Method for efficient, safe and reliable data entry by voice under adverse conditions
Abstract
A method and apparatus for data entry by voice under adverse conditions are disclosed. More specifically, they provide a way to fill in forms by voice efficiently and robustly. A form typically contains one or several fields that must be filled in. The user speaks to a speech recognition system, and word spotting is performed on the utterance. The spotted words of an utterance form a phrase that can contain field-specific values and/or commands. Recognized values are echoed back to the speaker via a text-to-speech system. Unreliable or unsafe inputs for which the confidence measure is found to be low (e.g. ill-pronounced speech or noises) are rejected by the spotter. Speaker adaptation is also performed transparently to improve speech recognition accuracy. Other input modalities (e.g. keyboard and touch-screen) can additionally be supported. The system maintains a dialogue history to enable editing and correction operations on all active fields.
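The rejection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the confidence measure is computed or what threshold is used, so the `SpottedWord` type, the 0.6 threshold, and the rule of scoring an utterance by its least confident word are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpottedWord:
    text: str          # word returned by a hypothetical word spotter
    confidence: float  # assumed to lie in [0.0, 1.0]


def accept_utterance(words: List[SpottedWord],
                     threshold: float = 0.6) -> Optional[str]:
    """Return the phrase to echo back, or None if the utterance is rejected.

    Assumption: the utterance-level confidence is the minimum of the
    per-word confidences; the patent text does not specify this.
    """
    if not words:
        return None  # nothing was spotted (e.g. pure noise)
    if min(w.confidence for w in words) < threshold:
        return None  # low confidence: reject rather than risk a wrong entry
    return " ".join(w.text for w in words)
```

In this sketch a rejected utterance simply yields `None`, so nothing is echoed and the form is left unchanged; a real system might instead prompt the speaker to repeat the input.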
23 Claims
1. A method of data entry by voice under adverse conditions for efficient and robust form filling, the method comprising:
communicating an input utterance from a speaker to a speech recognition means;
spotting a plurality of spotted words of at least two recognized spoken words within the input utterance, wherein the spotted words form a phrase containing at least one of field-specific values or commands;
obtaining blocks of text from input utterances separated from other input utterances by natural speech pauses, exactly one block of text per input utterance;
determining whether a block of text contains recognized values as opposed to commands;
populating a license plate number field of a form by concatenating, based on a sequence of input utterances from which the blocks of text are obtained, blocks of text determined to contain recognized values;
echoing blocks of text back to the speaker via a text-to-speech system, wherein audio feedback echoing the blocks of text is performed upon interpretation of each input utterance, and a sequence of recognized values echoed in the audio feedback reflects a sequence of spotted words within the input utterance from which the recognized values are obtained;
rejecting unreliable or unsafe inputs for which a confidence measure is found to be low; and
maintaining a dialogue history enabling editing operations and correction operations on all active fields.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
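The block-by-block filling recited in claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the `PlateField` class, the one-word `delete` command, joining blocks without a separator, and the snapshot-based history are hypothetical choices, not details taken from the claim.

```python
from typing import List

COMMANDS = {"delete"}  # hypothetical command vocabulary


class PlateField:
    """Licence plate field filled from one block of text per utterance."""

    def __init__(self) -> None:
        self.blocks: List[str] = []         # accepted value blocks, in order
        self.history: List[List[str]] = []  # snapshots taken before each change

    def handle(self, block: str) -> str:
        """Process one block of text and return the text to echo back."""
        if block.lower() in COMMANDS:
            # Command: here "delete" drops the most recent value block.
            if self.blocks:
                self.history.append(list(self.blocks))
                self.blocks.pop()
            return block
        # Value: record the previous state, then append the new block.
        self.history.append(list(self.blocks))
        self.blocks.append(block)
        return block

    @property
    def value(self) -> str:
        # Assumption: blocks are concatenated without a separator.
        return "".join(self.blocks)


field = PlateField()
for utterance_block in ["AB", "12", "delete", "34"]:
    field.handle(utterance_block)
print(field.value)  # AB34
```

Keeping the blocks separate, rather than storing only the concatenated string, is what lets a later command act on a single utterance's contribution.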
12. An article of manufacture for data entry by voice under adverse conditions enabling efficient and robust form filling, the article of manufacture comprising:
an operating system;
a memory in communication with said operating system;
a speech recognition means in communication with said operating system;
a speech generation means in communication with said operating system; and
a dialogue history maintenance means in communication with said operating system,
wherein said operating system manages said memory, said speech recognition means, said speech generation means, and said dialogue history maintenance means in a manner permitting the user to monitor speech recognition of an input utterance by means of a generated speech corresponding to at least one of field-specific values or commands contained within the phrase formed by spotted words within the input utterance, and to perform editing operations and correction operations on all active fields,
wherein audio feedback echoing at least one of recognized values or recognized commands is performed upon interpretation of each input utterance, and a sequence of recognized values echoed in the audio feedback reflects a sequence of spotted words within the input utterance from which the recognized values are obtained,
said article of manufacture further comprising dialogue management means for using a dialogue model that provides feedback to the speaker of each recognized block of text obtained from an input utterance separated from other input utterances by natural speech pauses, exactly one block of text per input utterance, thereby affording the speaker an opportunity to correct recognition errors by:
(a) speaking a command operable to designate a particular one of plural recognized blocks of text in a license plate field for replacement; and
(b) providing a subsequent input utterance containing field specific values to replace the particular recognized block of text in the license plate field without replacing at least one other recognized block of text in the license plate field.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
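Steps (a) and (b) of claim 12 describe replacing one designated block while leaving the others untouched. A minimal sketch of that correction flow is given below; the `BlockEditor` class, its method names, and the use of a 0-based index to designate a block are hypothetical, since the claim does not say how the command identifies the block.

```python
from typing import List, Optional


class BlockEditor:
    """Holds the recognized blocks of one field and applies corrections."""

    def __init__(self, blocks: List[str]) -> None:
        self.blocks = list(blocks)
        self.pending: Optional[int] = None  # block designated for replacement

    def designate(self, index: int) -> None:
        """Step (a): a spoken command marks one block (0-based) for replacement."""
        if not 0 <= index < len(self.blocks):
            raise IndexError("no such block")
        self.pending = index

    def enter(self, block: str) -> None:
        """Step (b): the next value utterance replaces only the marked block.

        If no block is marked, the value is appended as a new block.
        """
        if self.pending is not None:
            self.blocks[self.pending] = block
            self.pending = None
        else:
            self.blocks.append(block)


editor = BlockEditor(["AB", "12", "CD"])
editor.designate(1)   # e.g. the speaker asks to correct the second block
editor.enter("34")    # only that block changes
print(editor.blocks)  # ['AB', '34', 'CD']
```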
Specification