Method for interactive speech recognition and training
Abstract
A method for creating word models for a large-vocabulary, natural-language dictation system. A user with limited typing skills can create documents with little or no advance training of word models. While dictating, the user speaks a word that may or may not already be in the active vocabulary. The system displays a list of the words in the active vocabulary that best match the spoken word. By keyboard or voice command, the user may choose the correct word from the list or, if the correct word is not on the list, choose to edit a similar word. Alternatively, the user may type or speak the initial letters of the word. The recognition algorithm is then called again, restricted to words that begin with those initial letters, and the choices are displayed again. A word list from a large backup vocabulary is also displayed. The best words to display from the backup vocabulary are chosen using a statistical language model and, optionally, word models derived from a phonemic dictionary. When the user chooses the correct word, the speech sample is used to create or update an acoustic model for that word without further intervention by the user. As the system is used, it also continually updates its statistical language model. The system thus accumulates word models and keeps improving its performance the more it is used. The system may be used for connected speech as well as for discrete utterances.
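The choice-list behavior described in the abstract can be sketched in Python. This is a minimal illustration, not the patented implementation: it assumes acoustic match scores for the active vocabulary have already been computed (lower is better), simulates re-running recognition on the initial letters by filtering those scores, and ranks backup-vocabulary words by a plain usage count standing in for the statistical language model. The function name `choice_lists` and its parameters are hypothetical.

```python
def choice_lists(acoustic_scores, backup_counts, prefix="", n=5):
    """Return (active_choices, backup_choices) for one spoken word.

    acoustic_scores: word -> acoustic match score for active-vocabulary words
                     (lower is better; assumed precomputed by the recognizer)
    backup_counts:   word -> usage count, a stand-in for the language model
    prefix:          initial letters typed or spoken by the user
    """
    # Re-rank the active vocabulary, keeping only words with the given initial letters.
    active = sorted(
        (w for w in acoustic_scores if w.startswith(prefix)),
        key=lambda w: acoustic_scores[w],
    )[:n]
    # Backup words have no acoustic model here, so rank them by usage count alone.
    backup = sorted(
        (w for w in backup_counts if w.startswith(prefix) and w not in acoustic_scores),
        key=lambda w: (-backup_counts[w], w),
    )[:n]
    return active, backup


scores = {"recognize": 1.2, "wreck": 0.9, "record": 2.0}
counts = {"recently": 50, "recite": 3, "receive": 40, "table": 100}
print(choice_lists(scores, counts, prefix="rec"))
# (['recognize', 'record'], ['recently', 'receive', 'recite'])
```

Typing "rec" removes "wreck" from the active list and narrows the backup list to words with the same initial letters.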
48 Claims
1. A system for creating word models comprising:
means for making an acoustic model from one or more utterances of a word;
means for enabling a user to associate a sequence of textual characters with that acoustic model, said means including:
means for indicating to the user a menu of one or more sequences of textual characters;
means for enabling the user to select a given character sequence from the menu;
means for enabling the user to edit the selected character sequence to make it represent a different sequence of characters;
means for associating said edited character sequence with said acoustic model.
Dependent claims: 2, 3, 4, 5.
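A minimal sketch of claim 1 in Python, assuming each utterance has already been reduced to a frame-aligned feature vector of equal length, so that the acoustic model can be a simple average (a real system would use richer models). The user's edit is modeled as keeping the first `keep` characters of the selected menu entry and typing a new ending. All names here are hypothetical.

```python
def make_acoustic_model(utterances):
    """Average several equal-length feature vectors into one word model."""
    return [sum(column) / len(utterances) for column in zip(*utterances)]


def label_from_menu(menu, choice, keep=None, typed=""):
    """Pick a menu entry and optionally edit it: keep its first `keep`
    characters and append what the user types."""
    text = menu[choice]
    if keep is not None:
        text = text[:keep] + typed
    return text


def create_word_model(vocabulary, utterances, menu, choice, keep=None, typed=""):
    """Associate the (possibly edited) character sequence with a new model."""
    label = label_from_menu(menu, choice, keep, typed)
    vocabulary[label] = make_acoustic_model(utterances)
    return label


vocab = {}
# The user says "walking"; the menu offers "walked", which the user edits.
label = create_word_model(vocab, [[1.0, 2.0], [3.0, 4.0]], ["walked", "talked"], 0,
                          keep=4, typed="ing")
print(label, vocab[label])  # walking [2.0, 3.0]
```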

6. An adaptive speech recognition method for recognizing a plurality of spoken words over a period of time and for improving a set of acoustic models used during that recognition, said method comprising the steps of:
storing a set of said acoustic models, with each said model being stored in association with a word label;
forming an acoustic description of the sound of each of a succession of spoken words to be recognized substantially as each such word is spoken, each of said acoustic descriptions including acoustic data on the sound of the spoken word it describes;
performing automatic speech recognition upon the acoustic descriptions of said succession of spoken words by comparing the acoustic description of each such spoken word against a plurality of said acoustic models to select, substantially as each such word is spoken, which one or more of said models best match it, by indicating to a user of the system said one or more best matching models' associated word labels, and by responding to feedback from said user to select one of said indicated word labels, or another word label, as being associated with said acoustic description;
storing at least some of said acoustic descriptions, each in association with the word label associated with it by said speech recognition;
updating the acoustic models associated with individual word labels in conjunction with which one or more of said acoustic descriptions have been stored, by finding the one or more acoustic descriptions currently stored in association with that word label and by merging acoustic data from those one or more acoustic descriptions to make updated versions of those acoustic models;
storing the updated acoustic models in said set of models and using those updated acoustic models in the subsequent performance of said automatic speech recognition upon successive ones of said spoken words.
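The adaptive loop of claim 6 can be sketched as follows, under simplifying assumptions: each acoustic description is a fixed-length feature vector, matching uses squared Euclidean distance, and "merging acoustic data" is taken to mean averaging all descriptions stored under a label. The `choose` callback stands in for the user's feedback and may return a label that was not on the menu. Class and method names are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class AdaptiveRecognizer:
    def __init__(self, models):
        self.models = dict(models)        # word label -> acoustic model (vector)
        self.stored = defaultdict(list)   # word label -> stored acoustic descriptions

    def best_matches(self, description, n=3):
        """Labels of the n models closest to the description."""
        return sorted(self.models, key=lambda w: sq_dist(self.models[w], description))[:n]

    def recognize(self, description, choose):
        """Show the best matches, take the user's choice, and update that model."""
        label = choose(self.best_matches(description))
        self.stored[label].append(description)
        descriptions = self.stored[label]
        # Merge every description stored under this label into an updated model.
        self.models[label] = [sum(col) / len(descriptions) for col in zip(*descriptions)]
        return label


r = AdaptiveRecognizer({"yes": [0.0, 0.0], "no": [10.0, 10.0]})
r.recognize([1.0, 1.0], lambda menu: menu[0])   # user accepts the top choice, "yes"
r.recognize([3.0, 3.0], lambda menu: "yes")
print(r.models["yes"])  # [2.0, 2.0]
```

In this sketch the initial model is replaced as soon as one description is stored for its label; weighting the prior model into the average would be an equally plausible design.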

7. A speech recognition system comprising:
means for making an acoustic description of a given portion of speech to be recognized, as spoken by a given group of one or more speakers;
means for storing a plurality of individually trained acoustic word models, each of which is associated with a given word, and each of which is derived from acoustic data produced by having one or more speakers from said given group speak one or more utterances of its associated word;
means for storing a plurality of phonetic acoustic word models, each of which is associated with a given word, and none of which are derived from acoustic data produced by having any speakers from said given group speak its associated word;
recognition means for comparing both said individually trained and said phonetic acoustic word models against said acoustic description of a given portion of speech to be recognized and for selecting which one or more of said models best match said acoustic description.
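One way to picture claim 7 is a recognizer that scores speaker-trained models and dictionary-derived phonetic models in the same pass. The sketch below assumes one-dimensional frames, uses dynamic time warping (DTW) as the matching technique, and builds each phonetic model by concatenating hypothetical phoneme templates according to a pronouncing dictionary. None of these specifics come from the claim itself.

```python
def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D frame sequences."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def phonetic_models(pronunciations, phoneme_templates):
    """Build word models by concatenating phoneme templates (no speaker data)."""
    return {w: [f for p in phones for f in phoneme_templates[p]]
            for w, phones in pronunciations.items()}


def recognize(description, trained, phonetic, n=2):
    """Score both model sets against the description; return the n best."""
    scored = [(dtw(description, m), w, "trained") for w, m in trained.items()]
    scored += [(dtw(description, m), w, "phonetic") for w, m in phonetic.items()]
    scored.sort()
    return [(w, source) for _, w, source in scored[:n]]


templates = {"N": [1.0, 1.0], "OW": [5.0, 5.0], "T": [9.0]}
pronunciations = {"no": ["N", "OW"], "note": ["N", "OW", "T"]}
phonetic = phonetic_models(pronunciations, templates)
trained = {"yes": [7.0, 7.0, 3.0, 3.0]}   # e.g. averaged from the user's own utterances
print(recognize([1.0, 5.0, 5.0, 9.0], trained, phonetic))
# [('note', 'phonetic'), ('no', 'phonetic')]
```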

8. A speech recognition system comprising:
means for making an acoustic description of a given portion of speech to be recognized;
means for storing a first, acoustically selectable, set of machine responses, each of which is associated with a word;
means for storing an acoustic word model of the word associated with each of said acoustically selectable machine responses;
means for storing a second, non-acoustically selectable, set of machine responses, each of which is associated with a word;
recognition means for selecting which one or more of said acoustic models best match said acoustic description;
recognition indicating means for indicating to a user the corresponding one or more acoustically selectable machine responses associated with those best matching models;
filtering means for selecting a subset of said non-acoustically selectable machine responses, said filtering means making said selection without performing a match between said acoustic description and any acoustic word models;
filtering indicating means for indicating to the user the non-acoustically selectable machine responses selected by the filtering means;
means for enabling a user to select one of the indicated acoustically selectable or non-acoustically selectable machine responses as a desired machine response.
Dependent claims: 9, 10, 11, 12, 13, 14.

15. A system for enabling a user to create word models for use in speech recognition comprising:
means for storing a set of machine responses;
means for enabling a user to enter filtering information which does not uniquely identify a desired machine response, but which does specify a subset of machine responses to which the desired machine response belongs;
filtering means for responding to the entry of such filtering information by selecting a subset of said machine responses which is limited to the subset specified by the filtering information;
means for indicating to the user one or more machine responses from the subset selected by the filtering means;
means for enabling the user to select which of the indicated machine responses is the response to be associated with a word model to be trained, without requiring the user to enter all the information contained in the machine response;
means for making an acoustic description of a given portion of speech;
means for incorporating data from that acoustic description into an acoustic word model associated with the selected machine response.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25.

26. A system for enabling a user to create word models for use in speech recognition comprising:
means for storing a set of machine responses;
means for storing a word in association with each of said machine responses;
language model means for indicating the probability that a given word to be trained will be each of a plurality of said stored words, based on statistical information on the frequency of each such stored word's use;
filtering means for selecting a subset of said machine responses based on the probabilities, indicated by said language model means, of the stored word associated with each such machine response;
means for indicating to the user the one or more machine responses selected by the filtering means;
means for enabling the user to select which of the indicated machine responses is the response to be associated with a word model to be trained, without requiring the user to enter all the information contained in the machine response;
means for making an acoustic description of a given portion of speech spoken by the user;
means for incorporating data from that acoustic description into an acoustic model associated with the selected machine response.
Dependent claims: 27, 28, 29, 30.

31. A continuous speech recognition system comprising:
means for making an acoustic description of a given portion of speech to be recognized;
means for storing acoustic models of a plurality of words;
recognition means for matching sequences of two or more acoustic word models against said portion of speech and for selecting a plurality of the best matching word sequences, each representing a sequence of two or more words whose corresponding sequence of two or more acoustic models provides one of the best matches against said acoustic description;
means for indicating to a user each of said plurality of best matching word sequences; and
means for enabling the user to select one of the indicated best matching word sequences for use as an output, without requiring the user to enter each word in the selected sequence.
Dependent claims: 32.
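A brute-force sketch of the multi-sequence choice list in claim 31, assuming each word model is a short sequence of scalar frames and that a candidate word sequence is considered only if its concatenated frames have exactly the length of the description. Exhaustive enumeration is used purely for illustration; practical continuous recognizers search far more efficiently. The function name is hypothetical.

```python
from itertools import product


def n_best_sequences(description, models, n=3, max_words=3):
    """Return the n word sequences (2..max_words words) whose concatenated
    frames best match the description, by summed squared difference."""
    scored = []
    for k in range(2, max_words + 1):
        for words in product(models, repeat=k):
            frames = [f for w in words for f in models[w]]
            if len(frames) != len(description):
                continue  # this toy matcher requires an exact length match
            cost = sum((x - y) ** 2 for x, y in zip(frames, description))
            scored.append((cost, words))
    scored.sort()
    return [words for _, words in scored[:n]]


models = {"a": [1.0], "be": [2.0, 2.0], "see": [3.0, 3.0]}
choices = n_best_sequences([1.0, 3.0, 3.0], models, n=2)
print(choices)               # [('a', 'see'), ('a', 'be')]
print(" ".join(choices[0]))  # a see  (the user picks choice 0 as output)
```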

33. A continuous speech recognition system comprising:
means for making an acoustic description of a given portion of speech to be recognized;
means for storing acoustic models of a plurality of words;
recognition means for matching sequences of acoustic word models against said acoustic description and for selecting a best matching word sequence, representing a sequence of words whose sequence of corresponding acoustic models provides one of the best matches against said acoustic description;
means for indicating the words of said best matching word sequence to a user;
means for enabling the user to select an individual word from the indicated best matching sequence of words;
means for enabling the user to correct the indicated best matching word sequence by correcting the selected word;
means for using the indicated best matching word sequence, with the corrected selected word, as an output.
Dependent claims: 34, 35, 36, 37, 38, 39, 40.
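The single-word correction in claim 33 can be sketched with a toy frame representation: locate the frames covered by the selected word, rank other vocabulary words of the same length against that segment, and splice the user's choice into the sequence. The fixed-length segment lookup and all names are assumptions for illustration.

```python
def sq(a, b):
    """Summed squared difference between two equal-length frame lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def word_alternatives(description, models, sequence, index, n=3):
    """Rank replacement candidates for the word at `index` against the frames it covers."""
    start = sum(len(models[w]) for w in sequence[:index])
    segment = description[start:start + len(models[sequence[index]])]
    others = [w for w in models
              if w != sequence[index] and len(models[w]) == len(segment)]
    return sorted(others, key=lambda w: (sq(models[w], segment), w))[:n]


def correct(sequence, index, new_word):
    """Return the sequence with the selected word replaced."""
    fixed = list(sequence)
    fixed[index] = new_word
    return fixed


models = {"a": [1.0], "be": [2.0, 2.0], "see": [3.0, 3.0], "sea": [3.0, 2.5]}
description = [1.0, 3.0, 3.0]
best = ["a", "be"]                     # the recognizer's (wrong) best sequence
alts = word_alternatives(description, models, best, 1)
print(alts)                            # ['see', 'sea']
print(correct(best, 1, alts[0]))       # ['a', 'see']
```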

41. A speech recognition system comprising:
means for storing a list of machine responses, each of which has associated with it a spelling comprised of a sequence of characters and a pronunciation;
means for storing an acoustic model for the pronunciation associated with each of the machine responses;
means for making an acoustic description of a portion of speech to be recognized;
means for enabling a user to enter a string of one or more characters as filtering information;
means for enabling the user to edit said string of characters once it has been entered;
filtering means for responding to the entry and editing of said string by selecting a subset of machine responses associated with spellings which contain the string of one or more characters as entered and edited by said user; and
recognition means for making a filtered selection of which one or more of said acoustic models best match said acoustic description of said portion of speech to be recognized, including means for causing the selection by said recognition means to favor the selection of acoustic models whose associated machine responses are in said subset selected by said filtering means.
Dependent claims: 42, 43, 44, 45.
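A minimal sketch of the favoring filter in claim 41, assuming precomputed acoustic costs (lower is better). Rather than excluding words, it adds a fixed penalty to words whose spelling does not begin with the filter string, so matching words are favored. Reading "contain the string" as a prefix match follows the abstract's "initial letters" and is an interpretation; the penalty value and function name are hypothetical.

```python
def filtered_recognition(acoustic_costs, filter_string, penalty=10.0, n=3):
    """Rank words by acoustic cost, favoring (not requiring) spellings that
    start with filter_string by penalizing the others."""
    def score(word):
        return acoustic_costs[word] + (0.0 if word.startswith(filter_string) else penalty)
    return sorted(acoustic_costs, key=lambda w: (score(w), w))[:n]


costs = {"wreck": 1.0, "recognize": 1.5, "rev": 0.5, "record": 3.0}
typed = "rev"
print(filtered_recognition(costs, typed))  # ['rev', 'wreck', 'recognize']
typed = typed[:-1] + "c"                    # the user edits "rev" into "rec"
print(filtered_recognition(costs, typed))  # ['recognize', 'record', 'rev']
```

Because non-matching words are penalized rather than dropped, a mistyped filter string still leaves acoustically strong candidates on the list.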

46. A speech recognition system designed to recognize a series of spoken words, said system comprising:
language model means for indicating the probability that a given word to be recognized will be each of a plurality of vocabulary words, based on statistical information on the frequency of that word's use;
means for making an acoustic description of the utterance of each of said series of spoken words to be recognized;
means for storing acoustic models of a plurality of vocabulary words;
recognition means for selecting which one or more of said vocabulary words' acoustic models best match a given acoustic description of a word to be recognized, based both on the closeness of the match between said acoustic models and said given acoustic description and on the probability indications by said language model means;
means for using the selection by said recognition means of one or more vocabulary words as best matching said given acoustic description to update the statistical information on the frequency of said one or more vocabulary words in said language model means, and for causing said recognition means to use a probability based on said updated statistical information in the recognition of which vocabulary words best match acoustic descriptions of subsequent words.
Dependent claims: 47.
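Claim 46 can be sketched with a unigram language model whose counts are updated by each recognition result. The add-one smoothing, the choice of adding acoustic cost to the negative log probability, and the class and function names are assumptions, not details taken from the claim.

```python
import math
from collections import Counter


class AdaptiveUnigram:
    """Unigram language model whose counts grow as words are recognized."""

    def __init__(self, counts):
        self.counts = Counter(counts)  # every vocabulary word should appear, even with count 0

    def prob(self, word):
        total = sum(self.counts.values())
        # Add-one smoothing so no vocabulary word gets zero probability.
        return (self.counts[word] + 1) / (total + len(self.counts))

    def update(self, word):
        self.counts[word] += 1


def recognize(acoustic_costs, lm):
    """Best word by acoustic cost plus language-model cost (-log P)."""
    return min(acoustic_costs, key=lambda w: acoustic_costs[w] - math.log(lm.prob(w)))


def recognize_and_learn(acoustic_costs, lm):
    word = recognize(acoustic_costs, lm)
    lm.update(word)  # feed the recognition result back into the statistics
    return word


lm = AdaptiveUnigram({"two": 1, "too": 1})
ambiguous = {"two": 1.0, "too": 1.1}
print(recognize(ambiguous, lm))                        # two
for _ in range(3):
    recognize_and_learn({"two": 2.0, "too": 1.0}, lm)  # acoustically clear cases of "too"
print(recognize(ambiguous, lm))                        # too
```

After three recognitions of "too", its higher frequency outweighs the small acoustic advantage of "two" on the ambiguous utterance.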

48. An adaptive speech recognition method for recognizing each of a succession of spoken words over a period of time and for improving a set of acoustic models used during that recognition, said method comprising the steps of:
storing a set of said acoustic models, with each such model being stored in association with at least one word label;
forming an acoustic description of successive portions of sound containing the spoken words to be recognized, and doing so substantially as such words are spoken;
performing automatic speech recognition upon the acoustic description of successive portions of sound as the successive words are being spoken, by comparing a plurality of said acoustic models against successive portions of the acoustic description to select which one or more of said models best match successive portions of said acoustic description and to associate those one or more best matching models' associated word labels with the portion of the acoustic description which those models match;
indicating to the user, as the successive words are being spoken, the one or more word labels associated with each of successive portions of said acoustic descriptions by said recognition;
providing means for enabling a user to respond, during the speaking of said succession of words, to said indication of word labels by selecting individual word labels to be associated with certain portions of said acoustic description;
in response to such selections, recalculating acoustic models associated with said word labels selected by the user, using acoustic information from the portions of the acoustic descriptions selected by the user as being associated with those word labels; and
using the recalculated acoustic models in the recognition of successive ones of said spoken words.
Specification