Method for interactive speech recognition and training

US 5,027,406 A
Filed: 12/06/1988
Issued: 06/25/1991
Est. Priority Date: 12/06/1988
Status: Expired due to Term

8 Assignments

0 Petitions

Abstract

A method for creating word models for a large vocabulary, natural language dictation system. A user with limited typing skills can create documents with little or no advance training of word models. As the user is dictating, the user speaks a word which may or may not already be in the active vocabulary. The system displays a list of the words in the active vocabulary which best match the spoken word. By keyboard or voice command, the user may choose the correct word from the list or may choose to edit a similar word if the correct word is not on the list. Alternatively, the user may type or speak the initial letters of the word. The recognition algorithm is then called again, restricted to words that begin with those initial letters, and the choices are displayed again. A word list from a large backup vocabulary is also displayed. The best words to display from the backup vocabulary are chosen using a statistical language model and, optionally, word models derived from a phonemic dictionary. When the user chooses the correct word, the speech sample is used to create or update an acoustic model for that word without further intervention by the user. As the system is used, it also continually updates its statistical language model. The system thus accumulates word models and keeps improving its performance the more it is used. The system may be used for connected speech as well as for discrete utterances.
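
The workflow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and method names are hypothetical, each acoustic description is assumed to be a fixed-length feature vector, a word's model is assumed to be the mean of its confirmed samples, and a simple usage count stands in for the statistical language model.

```python
import math
from collections import Counter


class DictationVocabulary:
    """Toy model of the dictation workflow (all names are hypothetical).

    Assumptions: an acoustic description is a fixed-length feature vector,
    a word model is the mean of the samples confirmed for that word, and a
    usage count stands in for the statistical language model.
    """

    def __init__(self, backup_words):
        self.samples = {}                 # active vocabulary: word -> confirmed samples
        self.backup = list(backup_words)  # large backup vocabulary without acoustic models
        self.counts = Counter()           # usage counts, updated as the user dictates

    def model(self, word):
        """Average the confirmed samples element by element."""
        vecs = self.samples[word]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def acoustic_choices(self, sample, prefix="", n=3):
        """Best acoustic matches in the active vocabulary, optionally limited by initial letters."""
        active = [w for w in self.samples if w.startswith(prefix)]
        return sorted(active, key=lambda w: math.dist(sample, self.model(w)))[:n]

    def backup_choices(self, prefix="", n=3):
        """Backup-vocabulary words with the given initial letters, ranked by usage count."""
        pool = [w for w in self.backup if w.startswith(prefix) and w not in self.samples]
        return sorted(pool, key=lambda w: -self.counts[w])[:n]

    def choose(self, word, sample):
        """User confirmed `word`: create or update its model and its usage count."""
        self.samples.setdefault(word, []).append(list(sample))
        self.counts[word] += 1
```

For example, with a backup vocabulary containing "patent" and "pattern", typing the initial letters "pat" makes `backup_choices("pat")` offer both words before either has an acoustic model; once the user picks one, `choose` moves it into the active vocabulary, so later calls to `acoustic_choices` can match it directly.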

649 Citations

48 Claims

1. A system for creating word models comprising:
- means for making an acoustic model from one or more utterances of a word;
  
  means for enabling a user to associate a sequence of textual characters with that acoustic model, said means including:
  
  means for indicating to the user a menu of one or more sequences of textual characters;
  
  means for enabling the user to select a given character sequence from the menu;
  
  means for enabling the user to edit the selected character sequence to make it represent a different sequence of characters;
  
  means for associating said edited character sequence with said acoustic model.
- Dependent Claims (2, 3, 4, 5)
- - 2. A speech recognition system including the system for creating word models recited in claim 1, wherein:
    - said speech recognition system further includes:
      
      means for making an acoustic description of a given portion of speech to be recognized;
      
      means for temporarily storing said acoustic description;
      
      means for storing acoustic models of the type referred to in claim 1 for each of a plurality of words;
      
      means for storing a sequence of textual characters in association with each acoustic model; and
      
      recognition means for selecting which one or more of said acoustic models best matches said acoustic description and for producing a list of those best matching acoustic models;
      
      said means for indicating a menu includes means for providing the user a menu of the character sequences associated with the best matching acoustic models on the list produced by the recognition means; and
      
      said means for making an acoustic model includes means for using, as one of the utterances used in making said acoustic model, said acoustic description used by said recognition means in selecting the character sequences indicated in said menu.
  - 3. A speech recognition system including the system for creating word models recited in claim 1, wherein:
    - said system further includes:
      
      means for making an acoustic description of a menu selection command to be recognized;
      
      means for storing acoustic models of a plurality of menu selection commands each associated with one of the sequences of textual characters indicated by said menu; and
      
      recognition means for selecting which of said menu selection command models best matches said acoustic description of said menu selection command to be recognized; and
      
      said means for enabling the user to select a given character sequence from the menu includes means for using said means for making an acoustic description of said menu selection command to be recognized, said means for storing menu selection command models, and said recognition means to respond to the user's speaking of a menu selection command by selecting the character sequence from the menu corresponding to the best matching menu selection command model selected by said recognition means.
  - 4. A speech recognition system including the system for creating word models recited in claim 1, wherein:
    - said system further includes means for making an acoustic description of an editing command to be recognized;
      
      means for storing acoustic models of a plurality of editing commands each associated with a function for editing a sequence of textual characters;
      
      recognition means for selecting which one or more of said editing command acoustic models best matches said acoustic description of said editing command to be recognized;
      
      said means for enabling the user to edit the selected character sequence includes means for using said means for making an acoustic description of said editing command to be recognized, said means for storing editing command acoustic models, and said recognition means to respond to the user's speaking of an editing command by performing upon the selected character sequence the editing function corresponding to the best matching editing command acoustic model selected by said recognition means.
  - 5. A speech recognition system which includes the system for creating word models described in claim 1, said speech recognition system further including:
    - means for representing a body of text comprised of one or more words and for representing a word insertion location relative to said text;
      
      means for storing a plurality of acoustic models, each of which is associated with a word;
      
      recognition means for recognizing a spoken word by selecting one of said acoustic models which matches said spoken word; and
      
      means for inserting a representation of either the word associated with the acoustic model selected by said recognition means or said character sequence selected and edited by the user into said body of text at said word insertion location.

6. An adaptive speech recognition method for recognizing a plurality of spoken words over a period of time and for improving a set of acoustic models used during that recognition, said method comprising the steps of:
- storing a set of said acoustic models, with each said model being stored in association with a word label;
  
  forming an acoustic description of the sound of each of a succession of spoken words to be recognized substantially as each such word is spoken, each of said acoustic descriptions including acoustic data on the sound of the spoken word it describes;
  
  performing automatic speech recognition upon the acoustic descriptions of said succession of spoken words by comparing the acoustic description of each such spoken word against a plurality of said acoustic models to select, substantially as each such word is spoken, which one or more of said models best matches it, by indicating to a user of the system said one or more best matching model's one or more associated word labels, and by responding to feedback from said user to select one of said indicated word labels, or another word label, as being associated with said acoustic description;
  
  storing at least some of said acoustic descriptions in association with the word label associated with it by said speech recognition;
  
  updating the acoustic models associated with individual word labels in conjunction with which one or more of said acoustic descriptions have been stored by finding the one or more acoustic descriptions currently stored in association with that word label and by merging acoustic data from those one or more acoustic descriptions to make updated versions of those acoustic models;
  
  storing the updated acoustic models in said set of models and using those updated acoustic models in the subsequent performance of said automatic speech recognition upon successive ones of said spoken words.
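
Claim 6 does not say how acoustic data from several stored descriptions are merged. The Python sketch below assumes one simple possibility: each description is a list of feature frames, stored descriptions are linearly time-normalized to their mean length, and the model is their frame-by-frame average. The patent's own alignment method may differ, and all names here are hypothetical.

```python
import math


def resample(frames, length):
    """Linearly time-normalize a list of feature frames to `length` frames."""
    if len(frames) == 1 or length == 1:
        return [list(frames[0]) for _ in range(length)]
    out = []
    for i in range(length):
        pos = i * (len(frames) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        out.append([a + (b - a) * frac for a, b in zip(frames[lo], frames[hi])])
    return out


def merge(descriptions):
    """Merge stored descriptions: normalize to the mean length, then average frame by frame."""
    length = round(sum(len(d) for d in descriptions) / len(descriptions))
    aligned = [resample(d, length) for d in descriptions]
    return [[sum(vals) / len(aligned) for vals in zip(*frames)] for frames in zip(*aligned)]


def score(model, description):
    """Total frame distance after normalizing the description to the model's length."""
    return sum(math.dist(m, d) for m, d in zip(model, resample(description, len(model))))


class AdaptiveRecognizer:
    def __init__(self):
        self.stored = {}  # word label -> acoustic descriptions confirmed by the user
        self.models = {}  # word label -> current merged acoustic model

    def recognize(self, description, n=3):
        """Rank word labels by how well their current models match the description."""
        return sorted(self.models, key=lambda w: score(self.models[w], description))[:n]

    def confirm(self, description, label):
        """Store the description under the label fixed by user feedback and rebuild that model."""
        self.stored.setdefault(label, []).append(description)
        self.models[label] = merge(self.stored[label])
```

Each call to `confirm` changes the model that later calls to `recognize` compare against, which is the feedback loop the claim describes.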

7. A speech recognition system comprising:
- means for making an acoustic description of a given portion of speech to be recognized, as spoken by a given group of one or more speakers;
  
  means for storing a plurality of individually trained acoustic word models, each of which is associated with a given word, and each of which is derived from acoustic data produced by having one or more speakers from said given group speak one or more utterances of its associated word;
  
  means for storing a plurality of phonetic acoustic word models, each of which is associated with a given word, and none of which are derived from acoustic data produced by having any speakers from said given group speak its associated word;
  
  recognition means for comparing both said individually trained and said phonetic acoustic word models against said acoustic description of a given portion of speech to be recognized and for selecting which one or more of said models best match said acoustic description.

8. A speech recognition system comprising:
- means for making an acoustic description of a given portion of speech to be recognized;
  
  means for storing a first, acoustically selectable, set of machine responses, each of which is associated with a word;
  
  means for storing an acoustic word model of the word associated with each of said acoustically selectable machine responses;
  
  means for storing a second, non-acoustically selectable, set of machine responses, each of which is associated with a word;
  
  recognition means for selecting which one or more of said acoustic models best match said acoustic description;
  
  recognition indicating means for indicating to a user the corresponding one or more acoustically selectable machine responses associated with those best matching models;
  
  filtering means for selecting a subset of said non-acoustically selectable machine responses, said filtering means making said selection without performing a match between said acoustic description and any acoustic word models;
  
  filtering indicating means for indicating to the user the non-acoustically selectable machine responses selected by the filtering means;
  
  means for enabling a user to select one of the indicated acoustically selectable or non-acoustically selectable machine responses as a desired machine response.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. A speech recognition system as recited in claim 8, wherein said filtering means includes means for enabling the user to enter one or more textual characters, and for limiting the filtered subset to non-acoustically selectable machine responses whose associated words include the entered characters.
  - 10. A speech recognition system as recited in claim 8, wherein:
    - said system further includes language model means for indicating the probability that a given word to be recognized will be each of a plurality of words based on statistical information on the frequency of each such word's use;
      
      said filtering means includes means for selecting said filtered subset of said non-acoustically selectable machine responses based on the probabilities of their associated words being spoken as indicated by said language model means.
  - 11. A speech recognition system as recited in claim 10, wherein:
    - said system is designed to recognize a word spoken in the context of one or more other words;
      
      said system further includes means for representing the language context of a word to be recognized;
      
      said language model means includes means for responding to said representation of the language context of the word to be recognized by indicating the probabilities that the word to be recognized will be each of a plurality of words based on statistical information on the frequency of each such word's use in that language context.
  - 12. A speech recognition system as in claim 8, wherein said recognition indicating means and said filtering indicating means include means for indicating to the user which machine responses are selected by the recognition means and which are selected by the filtering means.
  - 13. A speech recognition system as in claim 8, wherein said system further includes means for responding to the selection of one of the indicated non-acoustically selectable machine responses as the desired machine response by making an acoustic word model for the word associated with that machine response using acoustic data from the acoustic description of the given portion of speech to be recognized, by making the selected machine response a member of the first, acoustically selectable, set of machine responses, and by storing the acoustic word model created for that machine response in said means for storing acoustic word models.
  - 14. A speech recognition system as in claim 8, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a response selection command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of response selection commands each associated with one of the machine responses indicated by said recognition indicating means or said filtering indicating means;
      
      said recognition means further includes means for selecting which one or more of said response selection command models best matches said acoustic description of said response selection command to be recognized; and
      
      said means for enabling the user to select one of said acoustically selectable or non-acoustically selectable machine responses includes means for using said means for making an acoustic description of said response selection command to be recognized, said means for storing response selection command models, and said recognition means to respond to the user's speaking of a machine response selection command by selecting the machine response corresponding to the best matching response selection command model selected by said recognition means.
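
Claims 12 to 14 describe a menu that mixes acoustically selected and filter-selected responses, records which list each entry came from, and lets a spoken selection command pick an entry. The sketch below assumes hypothetical commands of the form "choose 1", "choose 2", and so on, treats the spoken command as already recognized, and omits the acoustic model training that claim 13 attaches to promoting a filter-selected word.

```python
from dataclasses import dataclass


@dataclass
class MenuEntry:
    word: str
    source: str  # "acoustic" (recognition means) or "filter" (filtering means), as in claim 12


def build_menu(acoustic_words, filtered_words):
    """Number the acoustic matches first, then filter-selected words not already shown."""
    entries = [MenuEntry(w, "acoustic") for w in acoustic_words]
    shown = set(acoustic_words)
    entries += [MenuEntry(w, "filter") for w in filtered_words if w not in shown]
    # Hypothetical spoken selection commands, one per menu entry.
    return {f"choose {i}": entry for i, entry in enumerate(entries, start=1)}


def select(menu, command, acoustic_set):
    """Apply a recognized selection command; promote filter-selected words (claim 13)."""
    entry = menu[command]
    if entry.source == "filter":
        acoustic_set.add(entry.word)  # training its new acoustic model is not shown
    return entry.word
```

Keeping the `source` field on each entry is what lets a display mark the two kinds of choices differently, and it is also the condition that triggers promotion into the acoustically selectable set.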

15. A system for enabling a user to create word models for use in speech recognition comprising:
- means for storing a set of machine responses;
  
  means for enabling a user to enter filtering information which does not uniquely identify a desired machine response, but which does specify a subset of machine responses to which the desired machine response belongs; and
  
  filtering means for responding to the entry of such filtering information by selecting a subset of said machine responses which is limited to the subset specified by the filtering information;
  
  means for indicating to the user one or more machine responses from the subset selected by the filtering means;
  
  means for enabling the user to select which of the indicated machine responses is the response to be associated with a word model to be trained, without requiring the user to enter all the information contained in the machine response;
  
  means for making an acoustic description of a given portion of speech;
  
  means for incorporating data from that acoustic description into an acoustic word model associated with the selected machine response.
- Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
- - 16. A system as in claim 15, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a given portion of speech to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of words, each associated with one of said set of machine responses;
      
      said filtering means includes means for selecting from said acoustic models an active set of acoustic models which is limited to those acoustic models whose associated machine responses are within the subset specified by said filtering information entered by the user;
      
      said system also further includes recognition means for comparing said active set of acoustic models against said acoustic description to be recognized to select which one or more of said acoustic models in said active set best matches said acoustic description to be recognized; and
      
      said means for indicating to the user one or more machine responses from the subset selected by the filtering means includes means for indicating the one or more machine responses associated with the one or more acoustic models selected by said recognition means as best matching said acoustic description to be recognized.
  - 17. A system as in claim 15, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a response selection command to be recognized;
      
      said system further includes:
      
      means for storing acoustic models of a plurality of response selection commands each associated with one of the indicated machine responses; and
      
      recognition means for selecting which one or more of said machine response selection command models best matches said acoustic description of said response selection command to be recognized;
      
      said means for enabling the user to select an indicated machine response includes means for using said means for making an acoustic description of a response selection command to be recognized, said means for storing response selection command models, and said recognition means to respond to the user's speaking of a response selection command by selecting the indicated machine response corresponding to the best matching response selection command model selected by said recognition means.
  - 18. A system as in claim 15, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a filtering command to be recognized;
      
      said system further includes:
      
      means for storing acoustic models of a plurality of filtering commands, each of which is associated with filtering information which specifies a subset of said set of machine responses; and
      
      recognition means for selecting which one or more of said filtering command models best matches said acoustic description of said filtering command to be recognized;
      
      said means for enabling the user to enter filtering information includes means for using said means for making an acoustic description of said filtering command to be recognized, said means for storing filtering command models, and said recognition means to respond to the user's speaking of a filtering command by entering the filtering information associated with the best matching filtering command model selected by said recognition means.
  - 19. A system as in claim 15, wherein:
    - said machine responses each have associated with them a spelling comprised of a sequence of characters;
      
      said means for enabling the user to enter filtering information includes means for enabling the user to enter a string of one or more characters as said filtering information; and
      
      said filtering means includes means for selecting a subset of machine responses associated with spellings which contain the string of one or more characters entered by said user.
  - 20. A system as in claim 19, wherein:
    - said means for enabling the user to enter a string includes means for enabling the user to enter a string of one or more initial characters which specify the initial characters of the spellings of a desired subset of machine responses;
      
      said system further includes means for enabling the user to edit the string of initial characters once it has been entered;
      
      said filtering means includes means for responding to the editing of said string by selecting a subset of machine responses whose associated spellings start with the entered string of initial characters, as edited.
  - 21. A system as in claim 20, wherein:
    - said means for enabling the user to enter a string of one or more initial characters includes:
      
      means for indicating to the user one or more character strings;
      
      means for enabling the user to select one of the indicated character strings without having to enter all the characters in that string; and
      
      means for entering the selected character string as the string of one or more initial characters; and
      
      said means for enabling the user to edit the string of initial characters enables the user to edit the selected character string after it has been selected and entered.
  - 22. A system as in claim 21, wherein said means for indicating one or more character strings includes means for selecting said one or more character strings from a larger set of character strings without comparing acoustic models associated with the selected strings against an acoustic description of speech to be recognized.
  - 23. A system as in claim 21, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a given portion of speech to be recognized; and
      
      said means for indicating one or more character strings includes:
      
      means for storing acoustic word models;
      
      means for storing a character string to be associated with each acoustic word model;
      
      recognition means for comparing an active set of said acoustic word models against said acoustic description to be recognized to select which one or more of said active set of acoustic models best match said acoustic description to be recognized; and
      
      means for indicating to the user as said character strings said one or more character strings associated with the acoustic models selected by said recognition means.
  - 24. A system as in claim 23, wherein said means for indicating one or more character strings further includes:
    - means for enabling the user to enter one or more first round filtering characters; and
      
      a first round filtering means for responding to the entry of the first round filtering characters by limiting the active set of acoustic word models used by said recognition means to a subset of said acoustic models whose associated character string contains the one or more first round filtering characters.
  - 25. A speech recognition system which includes the system for enabling a user to create word models described in claim 15, said speech recognition system further including:
    - means for representing a body of text comprised of one or more words and for representing a word insertion location relative to said text;
      
      means for storing a word in association with each of said machine responses;
      
      recognition means for recognizing a spoken word by selecting a word which matches said spoken word; and
      
      means for inserting a representation of either the word selected by said recognition means or the word associated with the indicated machine response selected by the user into said body of text at said word insertion location.
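
Claims 18 to 20 let spoken filtering commands enter characters that narrow the set of machine responses without identifying one uniquely, and let the user edit that string afterwards. A minimal sketch, assuming a hypothetical table of spoken letter commands and a hypothetical "backspace" editing command, with recognition of the commands themselves taken as already done:

```python
# Hypothetical spoken filtering commands, each associated with one character.
LETTER_COMMANDS = {"alpha": "a", "bravo": "b", "echo": "e", "papa": "p", "sierra": "s", "tango": "t"}


def apply_command(prefix, command):
    """Extend the initial-character string, or edit it with "backspace" (claim 20)."""
    if command == "backspace":
        return prefix[:-1]
    return prefix + LETTER_COMMANDS[command]


def filter_responses(responses, prefix):
    """Keep the responses whose spellings start with the entered characters."""
    return [r for r in responses if r.startswith(prefix)]
```

Speaking "papa", "alpha", "tango" builds the string "pat", which keeps "patent" and "pattern" but drops "paper"; a following "backspace" widens the filter back to "pa".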

26. A system for enabling a user to create word models for use in speech recognition comprising:
- means for storing a set of machine responses;
  
  means for storing a word in association with each of said machine responses;
  
  language model means for indicating the probability that a given word to be trained will be each of a plurality of said stored words based on statistical information on the frequency of each such stored word's use;
  
  filtering means for selecting a subset of said machine responses based on the probabilities, indicated by said language model means, of the stored word associated with each such machine response;
  
  means for indicating to the user the one or more machine responses selected by the filtering means;
  
  means for enabling the user to select which of the indicated machine responses is the response to be associated with a word model to be trained, without requiring the user to enter all the information contained in the machine response;
  
  means for making an acoustic description of a given portion of speech spoken by the user;
  
  means for incorporating data from that acoustic description into an acoustic model associated with the selected machine response.
- Dependent Claims (27, 28, 29, 30)
- - 27. A system as in claim 26, wherein:
    - each machine response has associated with it a spelling comprised of a sequence of characters;
      
      said system further includes means, responsive to the selection by the user of one of said indicated machine responses, for enabling the user to edit the sequence of characters associated with said selected machine response; and
      
      said means for incorporating data from an acoustic description into an acoustic model associated with the selected machine response, includes means for associating the acoustic model with the edited sequence of characters.
  - 28. A system as in claim 26, wherein:
    - said system further includes means for representing a current language context, said current language context including a body of text containing one or more words and an insertion location representing a particular location relative to that body of text where a word could be inserted;
      
      said language model means includes means for responding to said representation of the current language context by indicating the probabilities that a word to be inserted at said insertion location would be each of a plurality of words based on statistical information on the frequency of each such word's use in the current language context; and
      
      said filtering means includes means for selecting said subset of machine responses based on the probabilities, indicated by said language model means, of the word associated with each such machine response given said current language context.
  - 29. A speech recognition system which includes the system for enabling a user to create word models described in claim 28, wherein:
    - said speech recognition system includes recognition means for recognizing a word, spoken in the current language context represented by said means for representing a language context, by selecting a word which matches said spoken word;
      
      means for inserting a representation of either the word selected by said recognition means or the word associated with the indicated machine response selected by the user into said body of text at said insertion location.
  - 30. A system as in claim 26, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a response selection command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of response selection commands each associated with one of said indicated machine responses; and
      
      recognition means for selecting which one or more of said response selection command models best matches said acoustic description of said response selection command to be recognized; and
      
      said means for enabling the user to select an indicated machine response includes means for using said means for making an acoustic description of said response selection command, said means for storing response selection command models, and said recognition means to respond to the user's speaking of a response selection command by selecting the indicated machine response corresponding to the best matching response selection command model selected by said recognition means.
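
Claims 26 and 28 rank candidate words by how often they are used in the current language context. One simple way to realize this, assumed here rather than taken from the patent, is a bigram model that conditions on the word immediately before the insertion location and falls back to overall word frequency when that word has not been seen. The class and function names and the "<s>" start-of-text marker are hypothetical.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Counts word pairs as text is dictated (a hypothetical stand-in language model)."""

    def __init__(self):
        self.pairs = defaultdict(Counter)  # previous word -> counts of following words
        self.unigrams = Counter()

    def observe(self, words):
        for prev, word in zip(["<s>"] + words, words):
            self.pairs[prev][word] += 1
            self.unigrams[word] += 1

    def prob(self, prev, word):
        context = self.pairs.get(prev)
        if context:  # seen context: relative frequency of `word` after `prev`
            return context[word] / sum(context.values())
        total = sum(self.unigrams.values())  # unseen context: overall word frequency
        return self.unigrams[word] / total if total else 0.0


def filter_by_context(model, text, insert_at, vocabulary, n=2):
    """Pick the n most probable words for insertion before text[insert_at]."""
    prev = text[insert_at - 1] if insert_at > 0 else "<s>"
    return sorted(vocabulary, key=lambda w: -model.prob(prev, w))[:n]
```

Calling `observe` on each dictated passage keeps the counts current, which matches the abstract's statement that the language model is updated as the system is used.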

31. A continuous speech recognition system comprising:
- means for making an acoustic description of a given portion of speech to be recognized;
  
  means for storing acoustic models of a plurality of words;
  
  recognition means for matching sequences of two or more acoustic word models against said portion of speech and for selecting a plurality of the best matching word sequences, each representing a sequence of two or more words whose corresponding sequence of two or more acoustic models provides one of the best matches against said acoustic description;
  
  means for indicating to a user each of said plurality of best matching word sequences; and
  
  means for enabling the user to select one of the indicated best matching word sequences for use as an output, without requiring the user to enter each word in the selected sequence.
- Dependent Claims (32)
- - 32. A continuous speech recognition system as in claim 31, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a word sequence selection command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of word sequence selection commands each associated with one of the indicated best matching word sequences;
      
      said recognition means further includes means for selecting which one or more of said word sequence selection command models best matches said acoustic description of said word sequence selection command to be recognized;
      
      said means for enabling the user to select one of the indicated best matching word sequences includes means for using said means for making an acoustic description of said word sequence selection command to be recognized, said means for storing word sequence selection commands, and said recognition means to respond to the user's speaking of a word sequence selection command by selecting that one of the indicated best matching word sequences which corresponds to the best matching word sequence selection command model selected by said recognition means.
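
Claims 31 and 32 present several best-matching word sequences and let the user pick one. The sketch below is a deliberate simplification: it assumes the speech has already been divided into word-sized segments, each with a small table of candidate words and match costs (lower is better), and it ranks whole sequences by total cost. Real continuous recognition must also search over word boundaries, which this sketch does not attempt; the function name and costs are hypothetical.

```python
import heapq
from itertools import product


def n_best_sequences(segment_costs, n=3):
    """Return the n lowest-cost word sequences as (text, total_cost) pairs.

    segment_costs: one dict per assumed word segment, mapping candidate word -> match cost.
    """
    combos = product(*(costs.items() for costs in segment_costs))
    best = heapq.nsmallest(n, combos, key=lambda combo: sum(cost for _, cost in combo))
    return [(" ".join(word for word, _ in combo), sum(cost for _, cost in combo)) for combo in best]
```

With costs {"their": 1.0, "there": 1.5} for the first segment and {"speech": 1.0, "peach": 2.0} for the second, the menu lists "their speech", "there speech" and "their peach", and a selection command picks one whole sequence without the user re-entering its words.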

33. A continuous speech recognition system comprising:
- means for making an acoustic description of a given portion of speech to be recognized;
  
  means for storing acoustic models of a plurality of words;
  
  recognition means for matching sequences of acoustic word models against said acoustic description and for selecting a best matching word sequence, representing a sequence of words whose sequence of corresponding acoustic models provides one of the best matches against said acoustic description;
  
  means for indicating the words of said best matching word sequence to a user;
  
  means for enabling the user to select an individual word from the indicated best matching sequence of words;
  
  means for enabling the user to correct the indicated best matching word sequence by correcting the selected word;
  
  means for using the indicated best matching word sequence, with the corrected selected word, as an output.
- Dependent Claims (34, 35, 36, 37, 38, 39, 40)
- - 34. A continuous speech recognition system as in claim 33, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a word selection command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of word selection commands each associated with one of the indicated words of the best matching word sequence;
      
      said recognition means further includes means for selecting which one or more of said word selection command models best matches said acoustic description of said word selection command to be recognized;
      
      said means for enabling the user to select an individual word from the indicated best matching sequence includes means for using said means for making an acoustic description of said word selection command to be recognized, said means for storing word selection commands, and said recognition means to respond to the user's speaking of a word selection command by selecting that one of the indicated words of said best matching word sequence which corresponds to the best matching word selection command model selected by said recognition means.
  - 35. A continuous speech recognition system as in claim 33, wherein said means for enabling the user to correct the best matching word sequence includes:
    - means for indicating to the user a plurality of alternate choice words for replacing the word selected by the user for correction;
      
      means for enabling the user to select one of said alternate choice words;
      
      means for using the selected alternate choice word to replace the word selected for correction in said best matching word sequence.
  - 36. A continuous speech recognition system as in claim 35, wherein:
    - said recognition means includes means for matching said sequences of acoustic word models against said acoustic description in a manner which associates individual acoustic word models of said best matching word sequence with corresponding individual portions of said acoustic description of said given portion of speech to be recognized;
      
      said recognition means further includes means for comparing individual word models against a given portion of said acoustic description to select a plurality of words whose corresponding acoustic models best match said given portion;
      
      said means for indicating a plurality of alternate choice words includes means for selecting, as said indicated alternate choice words, words which said recognition means has selected as having acoustic models which best match the individual portion of said acoustic description corresponding to the individual word selected by the user for correction.
  - 37. A continuous speech recognition system as in claim 36, wherein:
    - said system further includes means for enabling the user to enter filtering information which specifies a subset of words;
      
      said recognition means' means for comparing individual word models against a given portion of said acoustic description includes means for increasing the probability that each of the plurality of words which it selects as a result of that comparison will belong to the subset of words specified by the filtering information.
  - 38. A continuous speech recognition system as in claim 35, wherein:
    - said system further includes means for enabling the user to enter filtering information which specifies a subset of words;
      
      said means for indicating a plurality of alternate choice words includes means for selecting certain alternate choice words which fall within the subset of words specified by said filtering information independently without comparing any acoustic model of those certain alternate choice words against said acoustic description of speech to be recognized.
  - 39. A continuous speech recognition system as in claim 33, wherein:
    - said means for enabling the user to correct the best matching word sequence by correcting the selected word includes:
      
      means for enabling the user to correct the selected word by replacing it with a corrected word;
      
      rerecognition means for matching a plurality of sequences of one or more acoustic word models against at least a portion of the acoustic description of the given portion of speech to be recognized and for selecting one or more such word sequences which best match said acoustic description, said recognition means including means for enabling information about the corrected word to alter the outcome of the selection performed by said rerecognition means; and
      
      means for using one of the word sequences selected by said rerecognition means as at least part of said corrected best matching word sequence.
  - 40. A continuous speech recognition system as in claim 39, wherein:
    - said system further includes means for storing an acoustic model of said corrected word; and
      
      said rerecognition means includes means for time aligning the acoustic model of the corrected word against a portion of said acoustic description to be recognized and for using that time alignment to determine the boundary in said acoustic description of a word to be selected by said rerecognition means.
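
The correction loop of claims 33 through 38 can be sketched in Python under several stated assumptions: the recognizer has already time-aligned the utterance into segments (claim 36), each segment carries a score for every candidate word, and user-entered filtering information is treated as a spelling prefix that earns a fixed score bonus (one possible reading of claim 37's "increasing the probability"). `Segment`, `best_sequence`, `alternates`, `correct`, and the `bonus` value are hypothetical names chosen for illustration. Choosing the best word per segment independently is a simplification; a real continuous recognizer searches over whole word sequences, and the rerecognition of claims 39 and 40 is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """A time-aligned portion of the utterance with per-word scores (higher is better)."""
    start_frame: int   # hypothetical bookkeeping fields for the time alignment
    end_frame: int
    scores: Dict[str, float]


def best_sequence(segments: List[Segment]) -> List[str]:
    # Simplification: pick the best word for each segment independently.
    return [max(seg.scores, key=seg.scores.__getitem__) for seg in segments]


def alternates(
    segment: Segment, n: int = 3, filter_text: Optional[str] = None, bonus: float = 10.0
) -> List[str]:
    """Rank alternate choices for the word the user selected for correction.

    Words whose spelling starts with filter_text get a hypothetical score bonus,
    so they tend to rise to the top of the list (compare claim 37).
    """
    def adjusted(word: str) -> float:
        score = segment.scores[word]
        if filter_text is not None and word.startswith(filter_text):
            score += bonus
        return score

    return sorted(segment.scores, key=adjusted, reverse=True)[:n]


def correct(sequence: List[str], position: int, replacement: str) -> List[str]:
    """Return a copy of the word sequence with one word replaced (claim 35)."""
    fixed = list(sequence)
    fixed[position] = replacement
    return fixed


segments = [
    Segment(0, 30, {"wreck": -5.0, "recognize": -6.0, "reckon": -7.5}),
    Segment(30, 70, {"speech": -3.0, "beach": -4.0}),
]
words = best_sequence(segments)                       # ['wreck', 'speech']
choices = alternates(segments[0], filter_text="rec")  # ['recognize', 'reckon', 'wreck']
print(correct(words, 0, choices[0]))                  # ['recognize', 'speech']
```

Only the selected word is replaced; the rest of the recognized sequence is kept as output, which is the behavior claim 33 recites.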

41. A speech recognition system comprising:
- means for storing a list of machine responses, each of which has associated with it a spelling comprised of a sequence of characters and a pronunciation;
  
  means for storing an acoustic model for the pronunciation associated with each of the machine responses;
  
  means for making an acoustic description of a portion of speech to be recognized;
  
  means for enabling a user to enter a string of one or more characters as filtering information;
  
  means for enabling the user to edit said string of characters once it has been entered;
  
  filtering means for responding to the entry and editing of said string by selecting a subset of machine responses associated with spellings which contain the string of one or more characters as entered and edited by said user; and
  
  recognition means for making a filtered selection of which one or more of said acoustic models best match said acoustic description of said portion of speech to be recognized, including means for causing the selection by said recognition means to favor the selection of acoustic models whose associated machine responses are in said subset selected by said filtering means.
- - 42. A speech recognition system as in claim 41, wherein:
    - said means for enabling the user to enter a string of one or more characters includes:
      
      means for indicating to the user one or more character strings;
      
      means for enabling the user to select one of the indicated character strings without having to enter all the characters in that string; and
      
      means for using the selected character string as said string of one or more characters entered by the user; and
      
      said means for enabling the user to edit the string of initial characters enables the user to edit such a selected character string after it has been selected.
  - 43. A speech recognition system as in claim 42, wherein:
    - said recognition means includes means for performing a first selection of which one or more of said acoustic models best match said acoustic description of said portion of speech to be recognized;
      
      said means for indicating one or more character strings includes means for indicating as such character strings the spellings of each of the machine responses associated with the one or more best matching acoustic models selected by said recognition means in said first selection; and
      
      said recognition means performs said filtered selection as a rerecognition of said portion of speech to be recognized after the following:
      
      said first selection, said indicating of the spellings selected by said first selection, said selection of one of the indicated character strings, its editing, and the selection of said subset by said filtering means.
  - 44. A speech recognition system as in claim 41, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of a string entry command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of string entry commands each associated with a command to enter a string of the one or more characters as filtering information;
      
      said recognition means further includes means for selecting which one or more of said string entry command models best matches said acoustic description of said menu selection portion of speech;
      
      said means for enabling the user to enter said string of one or more characters includes means for using said means for making an acoustic description of said string entry command to be recognized, said means for storing string entry command models, and said recognition means to respond to the user's speaking of a string entry command by entering the string of one or more characters corresponding to the best matching string entry model selected by said recognition means.
  - 45. A speech recognition system as in claim 41, wherein:
    - said means for making an acoustic description includes means for making an acoustic description of an editing command to be recognized;
      
      said system further includes means for storing acoustic models of a plurality of editing commands each associated with a function for editing said string of one or more characters entered as filtering information;
      
      said recognition means further includes means for selecting which one or more of said editing command models best matches said acoustic description of said editing command portion of speech;
      
      said means for enabling the user to edit the selected character sequence includes means for using said means for making an acoustic description of said editing command to be recognized, said means for storing editing command models, and said recognition means to respond to the user's speaking of an editing command spoken by the user, by performing upon the selected character sequence the editing function corresponding to the best matching editing command model selected by said recognition means.
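
The filtering of claims 41 through 43 can be sketched the same way. In this hypothetical Python example, acoustic scores for the portion of speech are assumed to be precomputed, the filter selects spellings that contain the entered string (as claim 41 recites), and "favoring" the subset is modeled as a large additive bonus, an assumption rather than anything the claims specify. The usage lines walk through claim 43: a first selection, choosing a displayed spelling, editing it, and rerecognizing.

```python
from typing import Dict, List, Set


def filter_subset(spellings: List[str], typed: str) -> Set[str]:
    """Machine responses whose spelling contains the entered string (claim 41)."""
    return {word for word in spellings if typed in word}


def filtered_recognition(
    acoustic_scores: Dict[str, float], subset: Set[str], n: int = 3, bonus: float = 100.0
) -> List[str]:
    """Rank words by acoustic score, favoring members of the filtered subset.

    The additive bonus is a hypothetical way to "favor" the subset; a bonus
    this large effectively restricts the top choices to the subset.
    """
    def adjusted(word: str) -> float:
        return acoustic_scores[word] + (bonus if word in subset else 0.0)

    return sorted(acoustic_scores, key=adjusted, reverse=True)[:n]


scores = {"embrace": -10.0, "embassy": -11.0, "embroider": -14.0, "embroidery": -15.0}

# First selection (claim 43): show the top spellings with no filter applied.
first = filtered_recognition(scores, subset=set(), n=2)   # ['embrace', 'embassy']

# The user picks "embrace" from the menu and edits it: delete three
# characters, then type "oid", giving the filter string "embroid".
typed = first[0][:-3] + "oid"

# Rerecognition of the same speech, now favoring the filtered subset.
print(filtered_recognition(scores, filter_subset(list(scores), typed), n=2))
# ['embroider', 'embroidery']
```

Starting from a displayed spelling and editing it (claim 42) lets the user reach a long filter string with only a few keystrokes or spoken editing commands.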

46. A speech recognition system designed to recognize a series of spoken words, said system comprising:
- language model means for indicating the probability that a given word to be recognized will be each of a plurality of vocabulary words based on statistical information on the frequency of that word's use;
  
  means for making an acoustic description of the utterance of each of said series of spoken words to be recognized;
  
  means for storing acoustic models of a plurality of vocabulary words;
  
  recognition means for selecting which one or more of said vocabulary words' acoustic models best match a given acoustic description of a word to be recognized, based both on the closeness of the match between said acoustic models and said given acoustic description and on the probability indications by said language model means;
  
  means for using the selection by said recognition means of one or more vocabulary words as best matching said given acoustic description to update the statistical information on the frequency of said one or more vocabulary words in said language model means and for causing said recognition means to use a probability based on said updated statistical information in the recognition of which vocabulary words best match acoustic descriptions of subsequent words.
- - 47. A speech recognition system as in claim 46, wherein:
    - said system is designed to recognize a word spoken in the context of one or more other words;
      
      said system further includes means for representing the language context of the word to be recognized;
      
      said language model means includes means for responding to said representation of the language context of the word to be recognized by indicating the probabilities that the word to be recognized will be each of a plurality of words based on statistical information on the frequency of each word's use in that language context; and
      
      said means for using the selection of a given vocabulary word as best matching said given acoustic description includes means for using said selection to update statistical information on the frequency of said given vocabulary word in the language context in which that word was recognized.
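
Claims 46 and 47 describe a language model whose word-frequency statistics are updated from the recognizer's own selections. The Python sketch below is one minimal reading: unigram counts stand in for claim 46, counts conditioned on the previous word stand in for the "language context" of claim 47, and add-one smoothing plus adding log scores are assumptions. `AdaptiveLanguageModel` and `recognize` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Optional


class AdaptiveLanguageModel:
    """Word-frequency language model that learns from recognized words.

    Unigram counts model claim 46; previous-word (bigram) counts model the
    language context of claim 47. Add-one smoothing is an assumption.
    """

    def __init__(self, vocabulary: Iterable[str]) -> None:
        self.vocab_size = len(set(vocabulary))
        self.unigrams: Counter = Counter()
        self.bigrams: DefaultDict[str, Counter] = defaultdict(Counter)

    def log_prob(self, word: str, previous: Optional[str] = None) -> float:
        counts = self.unigrams if previous is None else self.bigrams[previous]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + self.vocab_size))

    def update(self, word: str, previous: Optional[str] = None) -> None:
        """Record a recognized (or user-confirmed) word in its context."""
        self.unigrams[word] += 1
        if previous is not None:
            self.bigrams[previous][word] += 1


def recognize(acoustic_log_scores: Dict[str, float], lm: AdaptiveLanguageModel,
              previous: Optional[str] = None) -> str:
    """Combine acoustic match and language-model probability, both as log scores."""
    return max(acoustic_log_scores,
               key=lambda w: acoustic_log_scores[w] + lm.log_prob(w, previous))


lm = AdaptiveLanguageModel(["write", "right"])
acoustic = {"write": -2.0, "right": -2.1}        # nearly homophonous candidates
print(recognize(acoustic, lm, previous="all"))   # write
for _ in range(2):
    lm.update("right", previous="all")           # "all right" recognized twice
print(recognize(acoustic, lm, previous="all"))   # right
```

Before any updates the acoustic score decides; after two observations of "right" following "all", the context statistics outweigh the small acoustic difference.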

48. An adaptive speech recognition method for recognizing each of a succession of spoken words over a period of time and for improving a set of acoustic models used during that recognition, said method comprising the steps of:
- storing a set of said acoustic models, with each such model being stored in association with at least one word label;
  
  forming an acoustic description of successive portions of sound containing the spoken words to be recognized, and doing so substantially as such words are spoken;
  
  performing automatic speech recognition upon the acoustic description of successive portions of sound as the successive words are being spoken by comparing a plurality of said acoustic models against successive portions of the acoustic description to select which one or more of said models best match successive portions of said acoustic description and to associate those one or more best matching models' associated word labels with the portion of the acoustic description which those models match;
  
  indicating to the user, as the successive words are being spoken, the one or more word labels associated with each of successive portions of said acoustic descriptions by said recognition;
  
  providing means for enabling a user to respond, during the speaking of said succession of words, to said indication of word labels by selecting individual word labels to be associated with certain portions of said acoustic description;
  
  in response to such selections, recalculating acoustic models associated with said word labels selected by the user, using acoustic information from the portions of the acoustic descriptions selected by the user as being associated with those word labels; and
  
  using the recalculated acoustic models in the recognition of successive ones of said spoken words.
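
Claim 48's adaptation step (recalculating a word's acoustic model from the portions of speech the user has associated with it, then using the recalculated model for later words) is sketched below. Representing each model as the running mean of fixed-length feature vectors, and matching by Euclidean distance, are toy assumptions made only to show where the recalculation happens; `AdaptiveWordModels` and its methods are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class AdaptiveWordModels:
    """Per-word acoustic models kept as running means of confirmed feature vectors.

    A single mean vector per word label is a toy stand-in for the acoustic
    models of claim 48.
    """

    def __init__(self) -> None:
        self.means: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def recognize(self, features: Sequence[float]) -> str:
        """Return the word label whose model is closest to the acoustic description."""
        return min(self.means, key=lambda label: math.dist(features, self.means[label]))

    def confirm(self, label: str, features: Sequence[float]) -> None:
        """The user associated `label` with this portion of speech: update its model."""
        if label not in self.means:
            self.means[label] = list(features)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
        self.means[label] = [m + (x - m) / n for m, x in zip(self.means[label], features)]


models = AdaptiveWordModels()
models.confirm("yes", [1.0, 0.0])
models.confirm("no", [0.0, 1.0])
sample = [0.6, 0.5]
print(models.recognize(sample))   # yes (a misrecognition)
models.confirm("no", sample)      # the user selects "no" for this portion
print(models.recognize(sample))   # no (the recalculated model now matches)
```

The incremental-mean update needs only a count per word, so the model can be recalculated immediately after each selection without storing earlier samples.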

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Dragon Systems, Inc. (Microsoft Corporation)
Inventors
Roberts, Jed; Baker, James K.; Porter, Edward W.
Primary Examiner(s)
Shaw, Dale M.
Assistant Examiner(s)
Merecki, John A.

Application Number
US07/280,700
Time in Patent Office
931 Days
Field of Search
381/41-46, 381/110, 364/513.5
US Class Current
704/244
CPC Class Codes
G10L 15/063 Training
G10L 15/22 Procedures used during a sp...

649 Citations

48 Claims
