Speech recognition apparatus and method

US 4,783,803 A
Filed: 11/12/1985
Issued: 11/08/1988
Est. Priority Date: 11/12/1985
Status: Expired due to Fees

Abstract

A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
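
The per-frame loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `recognize`, the toy acoustic model (one expected value per frame, scored by negative squared distance), and the fixed `threshold` value are all hypothetical and were introduced for this example only.

```python
import math

def recognize(frames, templates, lm_prob, threshold=-10.0):
    """Score each active word frame by frame and prune weak candidates.

    frames:    list of numbers standing in for acoustic frames (assumption).
    templates: word -> list of expected frame values, same length as frames
               (a toy stand-in for an acoustic model).
    lm_prob:   word -> language model probability given the prior words.
    """
    # Log-domain likelihood scores: 0.0 is a perfect match, lower is worse.
    active = {word: 0.0 for word in templates}
    for t, frame in enumerate(frames):
        for word in list(active):
            # Update the likelihood score with this frame (toy distance).
            active[word] -= (frame - templates[word][t]) ** 2
            # Combine it with the language model score before pruning.
            combined = active[word] + math.log(lm_prob[word])
            if combined < threshold:
                del active[word]  # drop the word from the active vocabulary
    # Report the combined score of every word that survived pruning.
    return {w: s + math.log(lm_prob[w]) for w, s in active.items()}
```

For example, `recognize([1.0, 2.0, 3.0], {"yes": [1.0, 2.0, 3.0], "no": [5.0, 5.0, 5.0]}, {"yes": 0.5, "no": 0.5})` drops "no" after the first frame and keeps only "yes".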

127 Claims

1. A speech recognition system for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said system comprising:
- means for storing an acoustic model of a given word;
  
  means for making a word match score corresponding to the probability that said given word corresponds to acoustic data generated by said utterance, including means for making successive partial comparisons between said acoustic model and said acoustic data and for successively updating said word match score after each such partial comparison;
  
  context storing means for storing a language context derived from one or more words spoken prior to said given utterance;
  
  language score introducing means, including means for generating a language score for said given word corresponding to a language model probability estimate of the probability that said given word would occur given the language context stored in said context storing means and means for separately altering said word match score after each of a plurality of said partial comparisons by an amount corresponding to said language score; and
  
  means for stopping said means for making a word match score from making further partial comparisons between said acoustic model of a given word and said acoustic data when the word match score for that given word is worse than a given threshold.
- - 2. A speech recognition system as described in claim 1, wherein said language score introducing means' means for altering said word match score includes means for altering said match score after each of a given number of said partial comparisons and for causing the total amount by which said language score introducing means alters said word match score over all of said given number of partial comparisons to substantially equal the value of said language score.
  - 3. A speech recognition system as described in claim 1, wherein said word match score is a logarithmic function of an estimate of the probability that said given word corresponds to the acoustic data generated by said utterance and said language score is a logarithmic function of said language model probability estimate, and said means for altering said word match score after each of a plurality of partial comparisons includes means for adding a fraction of said language score to said word match score after each of said plurality of partial comparisons.
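
The incremental scoring recited in claims 2 and 3 can be illustrated with a short sketch. It assumes log-domain scores and equal shares of the language score per partial comparison; the claims themselves only require that a fraction be added each time and that the total substantially equal the language score. The name `spread_language_score` and its parameters are hypothetical.

```python
import math

def spread_language_score(acoustic_increments, lm_probability, n_spread):
    """Return the running word match score after each partial comparison.

    acoustic_increments: log-likelihood change from each partial comparison.
    lm_probability:      language model probability of the word in context.
    n_spread:            number of comparisons over which the language
                         score is spread (equal shares are an assumption).
    """
    language_score = math.log(lm_probability)  # logarithmic language score
    share = language_score / n_spread
    score, history = 0.0, []
    for i, increment in enumerate(acoustic_increments):
        score += increment      # update from the partial acoustic comparison
        if i < n_spread:
            score += share      # add one fraction of the language score
        history.append(score)
    return history
```

After the first `n_spread` comparisons the accumulated adjustment equals the whole language score, so a word that is unlikely in context is penalized gradually rather than all at once.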

4. A speech recognition method for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said method comprising:
- storing an acoustic model of a given word;
  
  making a word match score corresponding to the probability that said given word corresponds to acoustic data generated by said utterance, including making successive partial comparisons between said acoustic model and said acoustic data and successively updating said word match score after each such partial comparison;
  
  storing a language context derived from one or more words spoken prior to said given utterance;
  
  generating a language score for said given word corresponding to a language model probability estimate of the probability that said given word would occur given said stored language context;
  
  separately altering said word match score after each of a plurality of said partial comparisons by an amount corresponding to said language score; and
  
  stopping further partial comparisons between said acoustic model of a given word and said acoustic data when the word match score for that given word is worse than a given threshold.
- - 5. A speech recognition method as described in claim 4, wherein said altering of said word match score includes altering said match score after each of a given number of said partial comparisons and causing the total amount by which said word match score is altered over all of said given number of partial comparisons to substantially equal the value of said language score.
  - 6. A speech recognition method as described in claim 4, wherein said word match score is a logarithmic function of an estimate of the probability that said given word corresponds to the acoustic data generated by said utterance and said language score is a logarithmic function of said language model probability estimate, and said altering of said word match score after each of a plurality of partial comparisons includes adding a fraction of said language score to said word match score after each of said plurality of partial comparisons.

7. A speech recognition system for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said system comprising:
- means for storing an acoustic model of each of a plurality of words belonging to an originally active vocabulary;
  
  currently active vocabulary means for indicating which words in said originally active vocabulary belong to a currently active vocabulary;
  
  means for causing said currently active vocabulary means to originally indicate that all the words in said originally active vocabulary are in said currently active vocabulary;
  
  means for making a word match score for each word in said originally active vocabulary corresponding to the probability that each such word corresponds to acoustic data generated by said utterance, including means for making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in each of which said word match score for said word is updated in response to said partial comparison;
  
  context storing means for storing a language context derived from one or more words spoken prior to said given utterance;
  
  language score introducing means for generating language scores for individual words for which said partial comparisons are made and for altering, during each of a plurality of said estimation cycles, the word match score of each such word by an amount corresponding to the language score of each such word, each of said language scores corresponding to a language model probability estimate of the probability that a given word would occur in the language context stored in said context storing means; and
  
  means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold.
- - 8. A speech recognition system as described in claim 7, wherein said language score introducing means' means for altering word match scores includes means for altering said match score for a given word during each of a given number of estimation cycles and for causing the amount by which said language score introducing means alters said word match score over all of said given number of estimation cycles to substantially equal the value of said language score.
  - 9. A speech recognition system as described in claim 7, wherein said word match score for a given word is a logarithmic function of an estimate of the probability that said given word corresponds to the acoustic data generated by said utterance and said language score for said given word is a logarithmic function of said language model probability estimate for said given word, and said means for altering said word match score after each of a plurality of partial comparisons includes means for adding a fraction of said language score to said word match score after each of said plurality of partial comparisons.
  - 10. A speech recognition system as described in claim 9, wherein said language score introducing means' means for altering word match scores includes means for altering said word match score for a given word for each of a given number of estimation cycles and for causing the amount by which said language score introducing means alters said word match score during each of said given number of estimation cycles to correspond to the value of said language score for said given word divided by said given number.
  - 11. A speech recognition system as described in claim 7, wherein said language score introducing means' means for altering said word match score for a given word during a given estimation cycle includes means for altering said match score by an amount which is normalized for each given language context stored in said context storing means so that the language scores corresponding to the lowest likelihood language model probability estimates in each language context have approximately the same effect on said word match scores.
  - 12. A speech recognition system as described in claim 7, further including means for receiving said acoustic data generated by said utterance as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time;
    - and wherein said means for making a succession of estimation cycles includes means for making a separate estimation cycle for each of a succession of said frames.
  - 13. A speech recognition system as described in claim 12, wherein:
    - said means for storing an acoustic model of each of a plurality of words includes means for storing for each such acoustic model a sequence of nodes, each of which has a vector of parameters corresponding to that of said frames;
      
      said means for making a word match score includes dynamic programming means for seeking to optimally match a succession of said frames against the sequence of nodes in the acoustic model of each word in said originally active vocabulary, and said means for making a succession of estimation cycles includes means for making, for each of a plurality of nodes in the acoustic model of each word in said currently active vocabulary, during each estimation cycle, a comparison between said node and a given frame, means for using said comparison to update a node match score associated with each such node, each such node match score being a function of the probability of an optimal dynamic programming path to said node through the previous nodes of its corresponding acoustic model, and means for selecting the most probable node match score for a given word as its word match score, and said language score introducing means including means for storing, for each of said plurality of nodes associated with each word in said currently active vocabulary, a word start value which indicates at what frame, according to the optimal path associated with said node, the utterance of said word began, means for altering, during each of a plurality of estimation cycles, the node match score of each of said plurality of nodes by an amount corresponding to the language score of each such node's associated word, and means for using said word start value to determine during which plurality of estimation cycles to alter the node match score associated with said node.
  - 14. A speech recognition system as described in claim 8, wherein said means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold includes means for varying the value of said threshold relative to said word match score as a function of changes in the word match score generated for one or more other words in said currently active vocabulary.
  - 15. A speech recognition system as described in claim 14, wherein said means for varying said threshold includes means for setting the value of said threshold relative to said word match scores as a function of the value of the word match score indicating the greatest likelihood of occurrence of any word in the currently active vocabulary.
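
The relative threshold of claims 14 and 15 can be sketched as a simple beam: the cutoff is set from the best word match score in the currently active vocabulary, so it moves as that score changes. This is an illustrative reading only; `prune_active_vocabulary` and `beam_width` are hypothetical names, and log-domain scores (higher is better) are assumed.

```python
def prune_active_vocabulary(word_scores, beam_width):
    """Keep words whose score is within beam_width of the best score.

    word_scores: word -> current word match score (log domain, assumed).
    beam_width:  how far below the best score a word may fall (assumed).
    """
    best = max(word_scores.values())
    threshold = best - beam_width  # the threshold tracks the best candidate
    return {w: s for w, s in word_scores.items() if s >= threshold}
```

Because the threshold is relative, a word scoring -35 is pruned when the best word scores -3 but survives when the best word scores -30, with the same `beam_width` of 10.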

16. A speech recognition method for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said method comprising:
- storing an acoustic model of each of a plurality of words belonging to an originally active vocabulary;
  
  indicating which words in said originally active vocabulary belong to a currently active vocabulary;
  
  originally indicating that all the words in said originally active vocabulary are in said currently active vocabulary;
  
  making a word match score for each word in said originally active vocabulary corresponding to the probability that each such word corresponds to acoustic data generated by said utterance, including making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in each of which said word match score for said word is updated in response to said partial comparison;
  
  storing a language context derived from one or more words spoken prior to said given utterance;
  
  generating language scores for individual words for which said partial comparisons are made;
  
  altering, during each of a plurality of said estimation cycles, the word match score of each such word by an amount corresponding to the language score of each such word, each of said language scores corresponding to a language model probability estimate of the probability that a given word would occur in said stored language context; and
  
  indicating that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold.
- - 17. A speech recognition method as described in claim 16, wherein said altering of word match scores includes altering said match score for a given word during each of a given number of estimation cycles and for causing the amount by which said language score alters said word match score over all of said given number of estimation cycles to substantially equal the value of said language score.
  - 18. A speech recognition method as described in claim 16, wherein said word match score for a given word is a logarithmic function of an estimate of the probability that said given word corresponds to the acoustic data generated by said utterance and said language score for said given word is a logarithmic function of said language model probability estimate for said given word, and said altering of said word match score after each of a plurality of partial comparisons includes adding a fraction of said language score to said word match score after each of said plurality of partial comparisons.
  - 19. A speech recognition method as described in claim 18, wherein said altering of word match scores includes altering said word match score for a given word for each of a given number of estimation cycles and for causing the amount by which said language score alters said word match score during each of said given number of estimation cycles to correspond to the value of said language score for said given word divided by said given number.
  - 20. A speech recognition method as described in claim 16, wherein said altering of said word match score for a given word during a given estimation cycle includes altering said match score by an amount which is normalized for each given stored language context so that the language scores corresponding to the lowest likelihood language model probability estimates in each language context have approximately the same effect on said word match scores.
  - 21. A speech recognition method as described in claim 16:
    - further including receiving said acoustic data generated by said utterance as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time; and
      
      wherein said making of a succession of estimation cycles includes making a separate estimation cycle for each of a succession of said frames.
  - 22. A speech recognition method as described in claim 21, wherein:
    - said storing of an acoustic model of each of a plurality of words includes storing for each such acoustic model a sequence of nodes, each of which has a vector of parameters corresponding to that of said frames;
      
      said making of a word match score includes seeking through the use of dynamic programming to optimally match a succession of said frames against the sequence of nodes in the acoustic model of each word in said originally active vocabulary, and said making of a succession of estimation cycles includes making, for each of a plurality of nodes in the acoustic model of each word in said currently active vocabulary, during each estimation cycle, a comparison between said node and a given frame, using said comparison to update a node match score associated with each such node, each such node match score being a function of the probability of an optimal dynamic programming path to said node through the previous nodes of its corresponding acoustic model, and selecting the most probable node match score for a given word as its word match score, and said altering of the word match score of each word includes storing, for each of said plurality of nodes associated with each word in said currently active vocabulary, a word start value which indicates at what frame, according to said optimal path to said node, the utterance of said word began;
      
      altering, during each of a plurality of estimation cycles, the node match score of each of said plurality of nodes by an amount corresponding to the language score of each such node's associated word; and
      
      using said word start value to determine during which plurality of estimation cycles to alter the node match score associated with said node.
  - 23. A speech recognition method as described in claim 16, wherein said indicating that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold includes varying the value of said threshold relative to said word match score as a function of changes in the word match score generated for one or more other words in said currently active vocabulary.
  - 24. A speech recognition method as described in claim 23, wherein said varying of said threshold includes setting the value of said threshold relative to said word match scores as a function of the value of the word match score indicating the greatest likelihood of occurrence of any word in the currently active vocabulary.
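
The node-level bookkeeping of claim 22 can be sketched for a single word as below. Several simplifications are assumptions of this example and are not taken from the patent: frames and nodes are single numbers compared by negative squared distance, a new path may enter the first node at any frame with score 0.0, and the language score is split into equal shares over the first `n_spread` frames after the stored word start. The name `match_word` is hypothetical.

```python
import math

def match_word(nodes, frames, lm_probability, n_spread=2):
    """Return (word match score, word start frame) for one word model."""
    lm_share = math.log(lm_probability) / n_spread
    scores = [float("-inf")] * len(nodes)  # node match scores (log domain)
    starts = [0] * len(nodes)              # word start value kept per node
    for t, frame in enumerate(frames):
        new_scores, new_starts = [], []
        for i, node in enumerate(nodes):
            # Dynamic programming step: stay in node i, or enter it from
            # node i - 1 (the first node may instead start a fresh path).
            stay = (scores[i], starts[i])
            enter = (scores[i - 1], starts[i - 1]) if i > 0 else (0.0, t)
            prev_score, start = max(stay, enter, key=lambda c: c[0])
            score = prev_score - (frame - node) ** 2  # toy comparison
            # The word start value decides whether this cycle still
            # receives a share of the language score.
            if t - start < n_spread:
                score += lm_share
            new_scores.append(score)
            new_starts.append(start)
        scores, starts = new_scores, new_starts
    # The most probable node match score serves as the word match score.
    best = max(range(len(nodes)), key=lambda i: scores[i])
    return scores[best], starts[best]
```

With `nodes=[1.0, 2.0]` and `frames=[9.0, 1.0, 2.0]`, the best path skips the first (mismatched) frame, so the returned word start is frame 1 and the language score shares are applied on frames 1 and 2 only.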

25. A pattern recognition system for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said system comprising:
- means for storing a model of a given pattern;
  
  means for making a pattern match score corresponding to the probability that said given pattern corresponds to said collection of data, including means for making successive partial comparisons between said pattern's model and said data and for successively updating said pattern match score after each such partial comparison;
  
  context storing means for storing a context derived from one or more patterns which have been previously identified;
  
  context score introducing means, including means for generating a context score for said given pattern corresponding to a context probability estimate of the probability that said given pattern would occur given the context stored in said context storing means and means for separately altering said pattern match score after each of a plurality of said partial comparisons by an amount corresponding to said context score; and
  
  means for stopping said means for making a pattern match score from making further partial comparisons between said model and said data when the pattern match score for that given pattern is worse than a given threshold.
- - 26. A pattern recognition system as described in claim 25, wherein said context score introducing means' means for altering said pattern match score includes means for altering said match score after each of a given number of partial comparisons and for causing the total amount by which said context score introducing means alters said pattern match score over all of said given number of partial comparisons to substantially equal the value of said context score.
  - 27. A pattern recognition system as described in claim 25, wherein said pattern match score is a logarithmic function of an estimate of the probability that said given pattern corresponds to said collection of data and said context score is a logarithmic function of said context probability estimate, and said means for altering said pattern match score after each of a plurality of partial comparisons includes means for adding a fraction of said context score to said pattern match score after each of said plurality of partial comparisons.

28. A pattern recognition method for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said method comprising:
- storing a model of a given pattern;
  
  making a pattern match score corresponding to the probability that said given pattern corresponds to said collection of data, including making successive partial comparisons between said pattern's model and said data and successively updating said pattern match score after each such partial comparison;
  
  storing a context derived from one or more patterns which have been previously identified;
  
  generating a context score for said given pattern corresponding to a context probability estimate of the probability that said given pattern would occur given said stored context;
  
  separately altering said pattern match score after each of a plurality of said partial comparisons by an amount corresponding to said context score; and
  
  stopping further partial comparisons between said model and said data when the pattern match score for that given pattern is worse than a given threshold.
- - 29. A pattern recognition method as described in claim 28, wherein said altering of said pattern match score includes altering said match score after each of a given number of partial comparisons and causing the total amount by which said pattern match score is altered over all of said given number of partial comparisons to substantially equal the value of said context score.
  - 30. A pattern recognition method as described in claim 28, wherein said pattern match score is a logarithmic function of an estimate of the probability that said given pattern corresponds to said collection of data and said context score is a logarithmic function of said context probability estimate, and said altering of said pattern match score after each of a plurality of partial comparisons includes adding a fraction of said context score to said pattern match score after each of said plurality of partial comparisons.

31. A pattern recognition system for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said system comprising:
- means for storing a model of each of a plurality of patterns belonging to an originally active set of patterns;
  
  currently active set means for indicating which patterns in said originally active set belong to a currently active set of patterns;
  
  means for causing said currently active set means to originally indicate that all the patterns in said originally active set are in said currently active set;
  
  means for making a pattern match score corresponding to the probability that each pattern in said originally active set occurs in said collection of data, including means for making a succession of estimation cycles, in each of which, for each pattern in said currently active set, a partial comparison is made between the model of said pattern and said collection of data and in each of which said pattern match score for said pattern is updated in response to said partial comparison;
  
  context storing means for storing a context derived from one or more patterns previously identified;
  
  context score introducing means for generating context scores for patterns for which said partial comparisons are made and for altering, during each of a plurality of said estimation cycles, the pattern match score of each such pattern by an amount which corresponds to the context score of each such pattern, each of said context scores corresponding to a context probability estimate of the probability that its associated pattern would occur in the context stored in said context storing means; and
  
  means for causing said currently active set means to indicate that a given pattern is no longer in said currently active set when the pattern match score for said pattern is worse than a given threshold.
- - 32. A pattern recognition system as described in claim 31, wherein said context score introducing means' means for altering pattern match scores includes means for altering said match score for a given pattern during each of a given number of estimation cycles and for causing the amount by which said context score introducing means alters said pattern match score over all of said given number of estimation cycles to substantially equal the value of said language score.
  - 33. A pattern recognition system as described in claim 31, wherein said match score for a given pattern is a logarithmic function of an estimate of the probability that said given pattern occurs in said collection of data and said context score for said given pattern is a logarithmic function of said context probability estimate for said given pattern, and said means for altering said pattern match score after each of a plurality of partial comparisons includes means for adding a fraction of said context score to said pattern match score after each of said plurality of partial comparisons.
  - 34. A pattern recognition system as described in claim 33, wherein said context score introducing means' means for altering pattern match scores includes means for altering said match score for a given pattern for each of a given number of estimation cycles and for causing the amount by which said context score introducing means alters said pattern match score during each of said given number of estimation cycles to correspond to the value of said context score for said given word divided by said given number.
  - 35. A pattern recognition system as described in claim 31, wherein said means for causing said currently active set means to indicate that a given pattern is no longer in said currently active set when the pattern match score for said pattern becomes worse than a given threshold includes means for varying the value of said threshold relative to said pattern match scores as a function of changes in the match score generated for one or more other patterns in said currently active set.
  - 36. A pattern recognition system as described in claim 35, wherein said means for varying said threshold includes means for setting the value of said threshold relative to said pattern match score as a function of the value of the pattern match score indicating the greatest likelihood of occurrence of any pattern in the currently active set.

37. A pattern recognition method for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said method comprising:
- storing a model of each of a plurality of patterns belonging to an originally active set of patterns;
  
  indicating which patterns in said originally active set belong to a currently active set of patterns;
  
  causing said currently active set means to originally indicate that all the patterns in said originally active set are in said currently active set;
  
  making a pattern match score corresponding to the probability that each pattern in said originally active set occurs in said collection of data, including making a succession of estimation cycles, in each of which, for each pattern in said currently active set, a partial comparison is made between the model of said pattern and said collection of data and in each of which said pattern match score for said pattern is updated in response to said partial comparison;
  
  storing a context derived from one or more patterns previously identified;
  
  generating context scores for patterns for which said partial comparisons are made;
  
  altering, during each of a plurality of said estimation cycles, the pattern match score of each such pattern by an amount which corresponds to the context score of each such pattern, each of said context scores corresponding to a context probability estimate of the probability that its associated pattern would occur in said stored context; and
  
  indicating that a given pattern is no longer in said currently active set when the pattern match score for said pattern is worse than a given threshold.
- - 38. A pattern recognition method as described in claim 37, wherein said altering of pattern match scores includes altering said match score for a given pattern during each of a given number of estimation cycles and for causing the amount by which said context score alters said pattern match score over all of said given number of estimation cycles to substantially equal the value of said language score.
  - 39. A pattern recognition method as described in claim 37, wherein said match score for a given pattern is a logarithmic function of an estimate of the probability that said given pattern occurs in said collection of data and said context score for said given pattern is a logarithmic function of said context probability estimate for said given pattern, and said altering said pattern match score after each of a plurality of partial comparisons includes adding a fraction of said context score to said pattern match score after each of said plurality of partial comparisons.
  - 40. A pattern recognition method as described in claim 39, wherein said altering of pattern match scores includes altering said match score for a given pattern for each of a given number of estimation cycles and for causing the amount by which said context score alters said pattern match score during each of said given number of estimation cycles to correspond to the value of said context score for said given word divided by said given number.
  - 41. A pattern recognition method as described in claim 37, wherein said indicating that a given pattern is no longer in said currently active set when the pattern match score for said pattern becomes worse than a given threshold includes varying the value of said threshold relative to said pattern match scores as a function of changes in the match score generated for one or more other patterns in said currently active set.
  - 42. A pattern recognition method as described in claim 41, wherein said varying of said threshold includes setting the value of said threshold relative to said pattern match score as a function of the value of the pattern match score indicating the greatest likelihood of occurrence of any pattern in the currently active set.
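
The method of claims 37 to 42 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that scores are negative log probabilities, so a lower score is better and "worse than a threshold" means numerically greater. The names `recognize`, `frame_costs`, `context_probs`, `spread_cycles` and `beam` are hypothetical and do not appear in the claims.

```python
import math

def recognize(frame_costs, context_probs, spread_cycles, beam):
    """Score patterns over successive estimation cycles with pruning.

    frame_costs:   list of dicts, pattern -> negative-log cost for one cycle
    context_probs: dict, pattern -> context probability estimate
    spread_cycles: number of cycles over which the context score is spread
    beam:          allowed distance from the best score (assumed parameter)
    """
    # Context score as a logarithmic function of the probability (claim 39).
    context_score = {p: -math.log(q) for p, q in context_probs.items()}
    scores = {p: 0.0 for p in context_probs}
    active = set(scores)  # every pattern starts in the currently active set

    for cycle, costs in enumerate(frame_costs):
        for p in active:
            scores[p] += costs[p]  # update from the partial comparison
            if cycle < spread_cycles:
                # Add an equal fraction of the context score (claim 40).
                scores[p] += context_score[p] / spread_cycles
        # Threshold set relative to the best active score (claims 41, 42).
        best = min(scores[p] for p in active)
        active = {p for p in active if scores[p] <= best + beam}

    winner = min(active, key=lambda p: scores[p])
    return winner, scores, active

costs = [{"a": 1.0, "b": 1.0, "c": 5.0}] * 3
winner, scores, active = recognize(
    costs, {"a": 0.5, "b": 0.25, "c": 0.25}, spread_cycles=2, beam=3.0)
```

In this example the pattern `c` is dropped after the first cycle and receives no further partial comparisons, while the context contribution added to `a` over the two spread cycles sums to its full context score.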

43. A speech recognition system for recognizing a word corresponding to a given spoken utterance, said system comprising:
- means for making, for each word of an initial vocabulary, a relatively rapid scoring, which produces a rapid match score corresponding to the probability that said word corresponds to a sequence of acoustic data generated by said utterance;
  
  means for placing a word from said initial vocabulary into an originally active vocabulary when said rapid match score for said word is better than a given threshold; and
  
  means for making, for each word of said originally active vocabulary, a more lengthy scoring, which produces a more accurate match score corresponding to the probability that such word corresponds to said sequence of acoustic data.
- - 44. A speech recognition system as described in claim 43, wherein both said means for making a rapid scoring and said means for making a lengthy scoring include means for calculating an acoustic likelihood score indicating the closeness of a match between an acoustic model of a given word and said sequence of acoustic data.
  - 45. A speech recognition system as described in claim 44, wherein both said means for making a rapid scoring and said means for making a lengthy scoring include means for calculating their scores for a given word based in part on a language score which corresponds to a language model probability estimate of the probability that said given word would occur given a language context derived from one or more words which have preceded said given word.
  - 46. A speech recognition system as described in claim 44, further including means for storing said sequence of acoustic data as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time.
  - 47. A speech recognition system as described in claim 46, wherein said means for making a rapid scoring includes means for making said scoring based on a number of said recorded frames which is less than the number of said frames which occur normally during the utterance of a majority of the words in said initial vocabulary.
  - 48. A speech recognition system as described in claim 46, wherein said means for making a rapid scoring includes means for averaging corresponding acoustic parameters in a plurality of said frames and means for comparing the closeness of the resulting set of averaged parameters with a corresponding set of averaged parameters associated with each word in said initial vocabulary.
  - 49. A speech recognition system as described in claim 48, wherein said means for making a rapid scoring includes means for causing said averaging means to separately perform its averaging for each of a plurality of sequences of frames and for causing said means for comparing to compare the closeness of each of the resulting sets of averaged parameters with each of a corresponding set of parameters associated with each word in said initial vocabulary.
  - 50. A speech recognition system as described in claim 49, wherein said means for causing said averaging means to perform its averaging separately for each of a plurality of sequences of frames including means for causing at least two of said sequences to include one or more of the same frames.
  - 51. A speech recognition system as described in claim 43, wherein said means for making a rapid scoring includes means for making said scoring for each word in said initial vocabulary, one word at a time and wherein said means for making a lengthy scoring includes the following:
    - means for storing an acoustic model of each of a plurality of words belonging to said originally active vocabulary;
      
      currently active vocabulary means for indicating which words in said originally active vocabulary belong to a currently active vocabulary;
      
      means for causing said currently active vocabulary means to originally indicate that all the words in said originally active vocabulary are in said currently active vocabulary;
      
      means for making a word match score corresponding to the probability that each word in said originally active vocabulary corresponds to acoustic data generated by said utterance, including means for making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in which said match score for said word is updated in response to said partial comparison; and
      
      means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold.
  - 52. A speech recognition system as described in claim 51, further including means for storing said sequence of acoustic data as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time;
    - and wherein said means for making a succession of estimation cycles includes means for making one such cycle for each of a succession of said frames and for making a comparison of the acoustic model of each of a plurality of words with said frame during said cycle.
  - 53. A speech recognition system as described in claim 52, further including:
    - means for producing a start score corresponding to an estimated probability that the utterance of a word has not begun at a given point in said sequence of recorded frames; and
      
      means for resetting said currently active vocabulary to include all the words contained in said originally active vocabulary when, after one or more estimation cycles, said means for producing a start score produces a start score which is better than a threshold.
  - 54. A speech recognition system as described in claim 53, wherein:
    - said currently active vocabulary means includes means for keeping all the words in the currently active vocabulary as a linked list;
      
      said means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary includes means for causing said word which is no longer to be in said currently active vocabulary to be un-linked from said linked list; and
      
      said means for resetting said currently active vocabulary includes means for causing all the words in said originally active vocabulary to be re-linked into said linked list.
  - 55. A speech recognition system as described in claim 54, further including means for representing said initial vocabulary as a list of word records, each of which has a means for storing a pointer to another word record in said list;
    - and said means for causing said currently active vocabulary means to originally indicate that all the words in said originally active vocabulary are in said currently active vocabulary includes means for pointing to a word record which represents a first word in said originally active vocabulary and for causing the means for storing a pointer in each word record associated with a word in said originally active vocabulary to point to the word record associated with the next word in said originally active vocabulary when there is such a next word.
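
The rapid scoring of claims 43 and 46 to 50 can be illustrated as follows. This is a sketch under stated assumptions, not the patented implementation: the match score is taken to be a sum of absolute differences (lower is better), and the names `window_averages`, `rapid_match`, `word_templates` and `windows` are hypothetical.

```python
def window_averages(frames, windows):
    """Average each acoustic parameter over each (start, end) run of frames."""
    averages = []
    for start, end in windows:
        chunk = frames[start:end]
        n_params = len(chunk[0])
        averages.append(
            [sum(f[i] for f in chunk) / len(chunk) for i in range(n_params)])
    return averages

def rapid_match(frames, word_templates, windows, threshold):
    """Return the words whose rapid match score is better than the threshold."""
    observed = window_averages(frames, windows)
    originally_active = []
    for word, template in word_templates.items():
        # Compare every averaged parameter with the stored word average.
        score = sum(abs(o - t)
                    for obs_w, tmpl_w in zip(observed, template)
                    for o, t in zip(obs_w, tmpl_w))
        if score < threshold:
            originally_active.append(word)
    return originally_active

frames = [[0, 0], [2, 2], [4, 4], [6, 6]]
windows = [(0, 2), (1, 3)]  # the two runs share frame 1 (claim 50)
templates = {"yes": [[1, 1], [3, 3]], "no": [[5, 5], [9, 9]]}
survivors = rapid_match(frames, templates, windows, threshold=5)
```

Only the surviving words would then go on to the more lengthy scoring, which is where the saving in computation comes from.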

56. A speech recognition method for recognizing a word corresponding to a given spoken utterance, said method comprising:
- making, for each word of an initial vocabulary, a relatively rapid scoring, which produces a rapid match score corresponding to the probability that said word corresponds to a sequence of acoustic data generated by said utterance;
  
  placing a word from said initial vocabulary into an originally active vocabulary when said rapid match score for said word is better than a given threshold; and
  
  making, for each word of said originally active vocabulary, a more lengthy scoring, which produces a more accurate match score corresponding to the probability that such word corresponds to said sequence of acoustic data.
- - 57. A speech recognition method as described in claim 56, wherein both said making of a rapid scoring and said making of a lengthy scoring include calculating an acoustic likelihood score indicating the closeness of a match between an acoustic model of a given word and said sequence of acoustic data.
  - 58. A speech recognition method as described in claim 57, wherein both said making of a rapid scoring and said making of a lengthy scoring include calculating their scores for a given word based in part on a language score which corresponds to a language model probability estimate of the probability that said given word would occur given a language context derived from one or more words which have preceded said given word.
  - 59. A speech recognition method as described in claim 57, further including storing said sequence of acoustic data as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time.
  - 60. A speech recognition method as described in claim 59, wherein said making of a rapid scoring includes making said scoring based on a number of said recorded frames which is less than the number of said frames which occur normally during the utterance of a majority of the words in said initial vocabulary.
  - 61. A speech recognition method as described in claim 59, wherein said making of a rapid scoring includes averaging corresponding acoustic parameters in a plurality of said frames and comparing the closeness of the resulting set of averaged parameters with a corresponding set of averaged parameters associated with each word in said initial vocabulary.
  - 62. A speech recognition method as described in claim 61, wherein said making of a rapid scoring includes causing said averaging means to separately perform its averaging for each of a plurality of sequences of frames and for causing said comparing to compare the closeness of each of the resulting sets of averaged parameters with each of a corresponding set of parameters associated with each word in said initial vocabulary.
  - 63. A speech recognition method as described in claim 62, wherein said causing of said averaging means to perform its averaging separately for each of a plurality of sequences of frames including causing at least two of said sequences to include one or more of the same frames.
  - 64. A speech recognition method as described in claim 56, wherein said making of a rapid scoring includes making said scoring for each word in said initial vocabulary, one word at a time and wherein said making of a lengthy scoring includes the following:
    - storing an acoustic model of each of a plurality of words belonging to said originally active vocabulary;
      
      indicating which words in said originally active vocabulary belong to a currently active vocabulary;
      
      originally indicating that all the words in said originally active vocabulary are in said currently active vocabulary;
      
      making a word match score corresponding to the probability that each word in said originally active vocabulary corresponds to acoustic data generated by said utterance, including making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in which said match score for said word is updated in response to said partial comparison; and
      
      indicating that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold.
  - 65. A speech recognition method as described in claim 64, further including storing said sequence of acoustic data as a sequence of frames, each of which includes a vector of parameters indicating acoustic properties of the utterance during a given period of time;
    - and wherein said making of a succession of estimation cycles includes making one such cycle for each of a succession of said frames and for making a comparison of the acoustic model of each of a plurality of words with said frame during said cycle.
  - 66. A speech recognition method as described in claim 65, further including:
    - producing a start score corresponding to an estimated probability that the utterance of a word has not begun at a given point in said sequence of recorded frames; and
      
      resetting said currently active vocabulary to include all the words contained in said originally active vocabulary when, after one or more estimation cycles, said producing of a start score produces a start score which is better than a threshold.
  - 67. A speech recognition method as described in claim 66, wherein:
    - said indicating of which words in said originally active vocabulary belong to a currently active vocabulary includes keeping all the words in the currently active vocabulary as a linked list;
      
      said indicating that a given word is no longer in said currently active vocabulary includes causing said word which is no longer to be in said currently active vocabulary to be unlinked from said linked list; and
      
      said resetting of said currently active vocabulary includes causing all the words in said originally active vocabulary to be re-linked into said linked list.
  - 68. A speech recognition method as described in claim 67, further including representing said initial vocabulary as a list of word records, each of which has a pointer to another word record in said list;
    - and said originally indicating that all the words in said originally active vocabulary are in said currently active vocabulary includes pointing to a word record which represents a first word in said originally active vocabulary and causing said pointer in each word record associated with a word in said originally active vocabulary to point to the word record associated with the next word in said originally active vocabulary when there is such a next word.
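
The linked list of word records recited in claims 67 and 68 can be sketched as below. The class and method names (`WordRecord`, `ActiveVocabulary`, `prune`, `reset`) are hypothetical, and keeping the originally active records in a separate Python list is an assumption made for brevity.

```python
class WordRecord:
    def __init__(self, word):
        self.word = word
        self.next = None  # pointer to the next record in the active list

class ActiveVocabulary:
    def __init__(self, originally_active):
        self.remembered = list(originally_active)  # records restored on reset
        self.head = None
        self.reset()

    def reset(self):
        """Re-link every originally active record into the list."""
        for rec, nxt in zip(self.remembered, self.remembered[1:] + [None]):
            rec.next = nxt
        self.head = self.remembered[0] if self.remembered else None

    def prune(self, is_worse):
        """Unlink each record whose word fails the threshold test."""
        prev, rec = None, self.head
        while rec is not None:
            if is_worse(rec.word):
                if prev is None:
                    self.head = rec.next   # unlink the first record
                else:
                    prev.next = rec.next   # bypass the unlinked record
            else:
                prev = rec
            rec = rec.next

    def words(self):
        out, rec = [], self.head
        while rec is not None:
            out.append(rec.word)
            rec = rec.next
        return out

vocab = ActiveVocabulary([WordRecord(w) for w in ["one", "two", "three"]])
before = vocab.words()
vocab.prune(lambda w: w in {"one", "three"})
after_prune = vocab.words()
vocab.reset()
after_reset = vocab.words()
```

Unlinking makes each estimation cycle visit only the words still active, and the reset restores the full originally active vocabulary without repeating the rapid match.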

69. A speech recognition system for recognizing a word in a sequence of acoustic data, said system comprising:
- means for storing an acoustic model of expected background noise;
  
  means for comparing said acoustic model with successive portions of said sequence of acoustic data and producing a silence score corresponding to the probability that such successive portions of data correspond to background noise and thus that said utterance has not yet begun;
  
  means for determining from said acoustic data the acoustic amplitude corresponding to successive portions of said data;
  
  means for producing an accumulated sound score which corresponds to a sum of the amount of said acoustic amplitude over said successive portions of data, said accumulated sound score inversely corresponding to the probability that an utterance has not yet begun; and
  
  means, responsive to both said silence score and said accumulated sound score for determining during which of said successive portions of data to look for the start of said utterance.
- - 70. A speech recognition system as described in claim 69, wherein said means for determining during which successive portions of said data to look for the start of said utterance further includes:
    - means for selecting during each successive portion of data the score from among said silence score and said accumulated sound score which indicates with the greatest probability that said utterance has not yet begun; and
      
      means for comparing said selected score with a threshold during each successive portion of data, and for causing said speech recognition system to look for the start of said utterance in the vicinity of said portion of data when said selected score for a given portion of said sequence is better than said threshold.
  - 71. A speech recognition system as described in claim 69, wherein said means for producing an accumulated sound score includes means for accumulating the amount by which said acoustic amplitude exceeds an amplitude threshold during each of said successive portions of data.

72. A speech recognition method for recognizing a word in a sequence of acoustic data, said method comprising:
- storing an acoustic model of expected background noise;
  
  comparing said acoustic model with successive portions of said sequence of acoustic data and producing a silence score corresponding to the probability that such successive portions of data correspond to background noise and thus that said utterance has not yet begun;
  
  determining from said acoustic data the acoustic amplitude corresponding to successive portions of said data;
  
  producing an accumulated sound score which corresponds to a sum of the amount of said acoustic amplitude over said successive portions of data, said accumulated sound score inversely corresponding to the probability that an utterance has not yet begun; and
  
  determining, in response to both said silence score and said accumulated sound score, during which of said successive portions of data to look for the start of said utterance.
- - 73. A speech recognition method as described in claim 72, wherein said determining during which successive portions of said data to look for the start of said utterance further includes:
    - selecting during each successive portion of data the score from among said silence score and said accumulated sound score which indicates with the greatest probability that said utterance has not yet begun; and
      
      comparing said selected score with a threshold during each successive portion of data, and for causing said speech recognition method to look for the start of said utterance in the vicinity of said portion of data when said selected score for a given portion of said sequence is better than said threshold.
  - 74. A speech recognition method as described in claim 72, wherein said producing of an accumulated sound score includes accumulating the amount by which said acoustic amplitude exceeds an amplitude threshold during each of said successive portions of data.
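
The start-of-utterance test of claims 72 to 74 can be sketched as follows. It assumes that both scores are expressed as costs, so a lower value indicates a greater probability that the utterance has not yet begun and "better than said threshold" means numerically smaller. The silence score is taken to be an absolute-difference distance from the noise model. Both choices, and the names `start_candidates`, `noise_model`, `amp_threshold` and `start_threshold`, are assumptions not found in the claims.

```python
def start_candidates(frames, amplitudes, noise_model,
                     amp_threshold, start_threshold):
    """Return the frame indices near which to look for the start of speech."""
    accumulated_sound = 0.0
    candidates = []
    for index, (frame, amplitude) in enumerate(zip(frames, amplitudes)):
        # Silence score: distance between the frame and the noise model.
        silence_score = sum(abs(x - m) for x, m in zip(frame, noise_model))
        # Accumulate only the amplitude above the threshold (claim 74).
        accumulated_sound += max(0.0, amplitude - amp_threshold)
        # Select whichever score most favors "not yet begun" (claim 73).
        selected = min(silence_score, accumulated_sound)
        if selected < start_threshold:
            candidates.append(index)
    return candidates

frames = [[1, 1], [1, 2], [8, 9], [9, 8], [1, 1]]
amplitudes = [1, 1, 10, 10, 1]
found = start_candidates(frames, amplitudes, noise_model=[1, 1],
                         amp_threshold=2, start_threshold=3)
```

Here the quiet frames at indices 0, 1 and 4 are flagged, while the two loud frames in the middle are not.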

75. A speech recognition system for recognizing a word in a sequence of acoustic data frames, said system comprising:
- means for storing an initial vocabulary of words;
  
  means for performing initial computations on said sequence of acoustic data frames and for producing, in response to said computations, an originally active vocabulary, containing those words from said initial vocabulary which are determined by said initial computations to be the most probable candidates for corresponding to said acoustic data frames;
  
  means for remembering the words contained in said originally active vocabulary;
  
  currently active vocabulary means for indicating which words in said originally active vocabulary belong to a currently active vocabulary;
  
  means for causing said currently active vocabulary means to initially indicate that all the words in said originally active vocabulary are in said currently active vocabulary;
  
  means for making a word match score corresponding to the probability that each word in said currently active vocabulary corresponds to said sequence of acoustic data frames, including means for making an estimation cycle for each successive data frame, in which for each word in said currently active vocabulary, a comparison is made between the acoustic model of said word and said frame and said word match score for said word is updated in response to said comparison;
  
  means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word becomes worse than a given threshold;
  
  means for producing a start score corresponding to the probability that an acoustic data frame precedes the beginning of the utterance of the word to be recognized; and
  
  means for resetting said currently active vocabulary to include all the words remembered by said means for remembering as belonging in said original vocabulary and for performing said resetting when said start score is better than a given threshold.
- - 76. A speech recognition system as described in claim 75, wherein:
    - said currently active vocabulary means includes means for keeping all the words in the currently active vocabulary as a linked list;
      
      said means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary includes means for causing said given word to be unlinked from said linked list; and
      
      said means for resetting said currently active vocabulary includes means for causing all the words remembered by said means for remembering as being in said originally active vocabulary to be relinked into said linked list.
  - 77. A speech recognition system as described in claim 76, further including means for representing said initial vocabulary as a list of word records, each of which has a means for storing a pointer to another word record in said list;
    - and said means for causing said currently active vocabulary means to originally indicate that all the words in said originally active vocabulary are in said currently active vocabulary includes means for causing said currently active vocabulary means to point to said word record representing a first word in said originally active vocabulary and to cause the pointer in the word record associated with each word in a said originally active vocabulary to point to the word record of the next word in said originally active vocabulary.

78. A speech recognition method for recognizing a word in a sequence of acoustic data frames, said method comprising:
- storing an initial vocabulary of words;
  
  performing initial computations on said sequence of acoustic data frames and for producing, in response to said computations, an originally active vocabulary, containing those words from said initial vocabulary which are determined by said initial computations to be the most probable candidates for corresponding to said acoustic data frames;
  
  remembering the words contained in said originally active vocabulary;
  
  indicating which words in said originally active vocabulary belong to a currently active vocabulary;
  
  initially indicating that all the words in said originally active vocabulary are in said currently active vocabulary;
  
  making a word match score corresponding to the probability that each word in said currently active vocabulary corresponds to said sequence of acoustic data frames, including making an estimation cycle for each successive data frame, in which for each word in said currently active vocabulary, a comparison is made between the acoustic model of said word and said frame and said word match score for said word is updated in response to said comparison;
  
  indicating that a given word is no longer in said currently active vocabulary when the word match score for said word becomes worse than a given threshold;
  
  producing a start score corresponding to the probability that an acoustic data frame precedes the beginning of the utterance of the word to be recognized; and
  
  resetting said currently active vocabulary to include all the words remembered by said remembering as belonging in said original vocabulary and for performing said resetting when said start score is better than a given threshold.
- - 79. A speech recognition method as described in claim 78, wherein:
    - said indicating which words belong to a currently active vocabulary includes keeping all the words in the currently active vocabulary as a linked list;
      
      said indicating that a given word is no longer in said currently active vocabulary includes causing said given word to be unlinked from said linked list; and
      
      said resetting of said currently active vocabulary includes causing all the words remembered by said remembering as being in said originally active vocabulary to be relinked into said linked list.
  - 80. A speech recognition method as described in claim 79, further including representing said initial vocabulary as a list of word records, each of which has a pointer capable of pointing to another word record in said list;
    - and said originally indicating that all the words in said originally active vocabulary are in said currently active vocabulary includes causing a pointer to point to said word record representing a first word in said originally active vocabulary and to cause the pointers in the word record associated with each word in a said originally active vocabulary to point to the word record of the next word in said originally active vocabulary when there is such a next word.
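
The interaction between pruning and the start-score reset in claims 78 to 80 can be sketched as below. Scores are again assumed to be costs (lower is better). Clearing the accumulated word scores at a reset is an assumption of this sketch, since the claims recite only the resetting of the currently active vocabulary. The function and parameter names are hypothetical.

```python
def score_frames(frame_costs, start_scores, originally_active,
                 prune_threshold, start_threshold):
    """Run one estimation cycle per frame, resetting when speech restarts."""
    active = set(originally_active)
    scores = {w: 0.0 for w in originally_active}
    for costs, start_score in zip(frame_costs, start_scores):
        if start_score < start_threshold:
            # The frame probably precedes the utterance: restore every
            # remembered word (score clearing is an assumption here).
            active = set(originally_active)
            scores = {w: 0.0 for w in originally_active}
        for w in active:
            scores[w] += costs[w]  # update from comparison with this frame
        # Drop words whose score has become worse than the threshold.
        active = {w for w in active if scores[w] <= prune_threshold}
    return active, scores

frame_costs = [{"go": 5, "stop": 1}, {"go": 1, "stop": 5}, {"go": 1, "stop": 5}]
active, scores = score_frames(frame_costs, start_scores=[9, 0, 9],
                              originally_active=["go", "stop"],
                              prune_threshold=4, start_threshold=1)
```

In this run `go` is pruned on the first frame, restored by the reset on the second frame, and ends as the only active word.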

81. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
- parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data;
  
  expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model;
  
  first hardware means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output;
  
  weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model;
  
  memory means, receiving addressing inputs from both the output of said first hardware means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said two addressing inputs, a parameter match score corresponding to a probability that said given parameter value corresponds to said expected value, considering said weighting value, and for producing said score as a digital output;
  
  a hardware adder having two inputs for receiving digital values and an output at which it produces a digital value corresponding to the sum of the values supplied to its inputs;
  
  means for connecting the output of said memory means to one input of said adder;
  
  latch means connected to the output of said adder for latching the value produced at said output; and
  
  means for supplying the value latched by said latch means to one of the inputs of said adder, whereby said hardware adder can produce a running total of said parameter match scores.
- - 82. A likelihood processor as described in claim 81, wherein said first hardware means includes means for limiting the numerical value of its output to a limited range which is less than the range of absolute values made possible by the range of values of its inputs, whereby the memory space required in said memory means to correspond to the address space formed by combining the output of said hardware means and said weighting value is reduced.
  - 83. A likelihood processor as described in claim 82, wherein said means for limiting the numerical value of the output of said first hardware means includes means for causing the numerical value of said output to be equal to a value at the high end of said limited range when the actual absolute value of the difference between the inputs of said hardware means exceeds said limited range.
  - 84. A likelihood processor as described in claim 81, wherein said memory means stores, at each of said addresses, a parameter match score which corresponds to the sum of two portions, a first portion which corresponds to the product of the output of said first hardware means and said digital weighting value and a second portion which corresponds to the logarithm of said weighting value.
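
The data path of claims 81 to 84 can be modeled in software as a table lookup followed by a running sum. This sketch assumes that the weighting value is the reciprocal of the expected deviation, so that the table entry is a negative log Laplacian likelihood, `d * w - log(w)`, with the constant term dropped. The clamp limit, the weight values and all names (`MAX_DIFF`, `WEIGHTS`, `TABLE`, `node_score`) are illustrative assumptions.

```python
import math

MAX_DIFF = 15                    # top of the limited range (claims 82, 83)
WEIGHTS = [0.25, 0.5, 1.0, 2.0]  # hypothetical weights, selected by a 2-bit code

# Precomputed table standing in for the memory means, addressed by the
# clamped absolute difference and the weight code (claim 84).
TABLE = [[d * w - math.log(w) for w in WEIGHTS] for d in range(MAX_DIFF + 1)]

def node_score(frame, expected, weight_codes):
    """Accumulate parameter match scores for one frame against one node."""
    total = 0.0  # plays the role of the latch feeding back into the adder
    for x, mu, code in zip(frame, expected, weight_codes):
        # Absolute difference, saturated at the high end of the range.
        diff = min(abs(x - mu), MAX_DIFF)
        total += TABLE[diff][code]
    return total

score = node_score([10, 200], [12, 20], [2, 1])
```

Clamping the difference keeps the table at 16 rows by 4 columns in this example, instead of one row for every possible difference between two parameter values.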

85. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
- parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data;
  
  expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model;
  
  subtracting means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output;
  
  weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model;
  
  memory means, receiving addressing inputs from both the output of said subtracting means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said addressing inputs, a parameter match score corresponding to a probability that said given parameter value corresponds to said expected value given said weighting value and for producing said score as a digital output, said parameter match score having a value which corresponds to the sum of two portions, a first portion which corresponds to the product of the output of said subtracting means and said digital weighting value and a second portion which corresponds to the logarithm of said weighting value; and
  
  accumulator means for accumulating said parameter match scores associated with a given frame for a given node.
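
The memory means of claim 85 can be pictured as a precomputed table addressed by the pair (absolute difference, weighting value). The following is a minimal Python sketch, not the patented circuit. It assumes 4-bit parameter values (so the absolute difference also fits in 4 bits), a 3-bit weight code, a hypothetical `weight_from_code` mapping, and a stored score of `diff * w - ln(w)`. The claim says only that the score corresponds to a product portion plus a logarithm portion, so the sign and scaling of the logarithm term are assumptions here.

```python
import math

DIFF_BITS = 4    # assumed width of the subtractor output
WEIGHT_BITS = 3  # assumed width of the weighting code


def weight_from_code(code):
    """Hypothetical mapping from a weight code to a positive weight."""
    return 0.25 * (code + 1)


def build_table():
    """Precompute one parameter match score per (weight code, diff) address."""
    table = [0.0] * (1 << (DIFF_BITS + WEIGHT_BITS))
    for code in range(1 << WEIGHT_BITS):
        w = weight_from_code(code)
        for diff in range(1 << DIFF_BITS):
            address = (code << DIFF_BITS) | diff  # combine both addressing inputs
            # product portion plus an assumed logarithm portion, -ln(w)
            table[address] = diff * w - math.log(w)
    return table


def node_score(table, frame, expected, weight_codes):
    """Accumulate the table scores over all parameters of one frame."""
    total = 0.0
    for x, mu, code in zip(frame, expected, weight_codes):
        diff = abs(x - mu)  # inputs are assumed to be 4-bit, so diff <= 15
        total += table[(code << DIFF_BITS) | diff]
    return total
```

With these assumptions the score behaves like a cost: a larger deviation from the expected value, or a smaller weight, gives a larger accumulated total.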

86. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value for said parameter and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
- parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data;
  
  expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model;
  
  subtracting means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output;
  
  weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model;
  
  memory means, receiving addressing inputs from both the output of said subtracting means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said addressing inputs, a parameter match score having a value corresponding to a probability that said given parameter value corresponds to said expected value given said weighting value and for producing said match score as a digital output; and
  
  accumulator means for accumulating said parameter match scores associated with a given frame for a given node;
  
  said subtracting means including means for limiting the numerical value of its output to a limited range which is less than the range of the absolute values made possible by the range of values of its inputs, whereby the memory space required in said memory means to correspond to the address space formed by combining the output of said subtracting means and said weighting value is reduced.
- View Dependent Claims (87)
- - 87. A likelihood processor as described in claim 86, wherein said means for limiting the numerical value of the output of said subtracting means includes means for causing the numerical value of said output to be equal to a value at the high end of said limited range when the actual absolute value of the difference between the inputs of said subtracting means exceeds said limited range.
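
The range limiting of claims 86 and 87 amounts to a saturating subtractor: any difference beyond the limited range is reported as the top value of that range, which shrinks the address space of the score table. The sketch below is illustrative only. The bit widths (`PARAM_BITS`, `DIFF_BITS`, `WEIGHT_BITS`) and the function names are assumptions, not figures taken from the claims.

```python
PARAM_BITS = 8   # assumed width of each subtractor input
DIFF_BITS = 4    # assumed width of the limited subtractor output
WEIGHT_BITS = 3  # assumed width of the weighting value


def limited_abs_diff(a, b, diff_bits=DIFF_BITS):
    """Absolute difference, pinned to the high end of the limited range."""
    limit = (1 << diff_bits) - 1
    return min(abs(a - b), limit)


def table_entries(diff_bits, weight_bits=WEIGHT_BITS):
    """Number of addresses formed by combining the diff and the weight."""
    return 1 << (diff_bits + weight_bits)


print(limited_abs_diff(200, 190))  # 10, inside the limited range
print(limited_abs_diff(200, 20))   # 15, saturated at the high end
print(table_entries(PARAM_BITS), table_entries(DIFF_BITS))  # 2048 128
```

Under these assumed widths, limiting the difference to 4 bits cuts the table from 2048 entries to 128.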

88. A speech recognition system for recognizing a sequence of separately spoken words, said system comprising:
- means for responding to a given utterance by producing a best guess word which is the word considered by the system most likely to correspond to said utterance; and
  
  means for responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance.
- View Dependent Claims (89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
- - 89. A speech recognition system as described in claim 88, wherein said means for responding to the utterance of another word as confirmation includes means for detecting an utterance and means for responding to any detection of another utterance by said means as confirmation that said best guess word is the word corresponding to said given utterance spoken before said another utterance.
  - 90. A speech recognition system as described in claim 88, further including means for displaying said best guess word so a user can see it before uttering another word.
  - 91. A speech recognition system as described in claim 90:
    - wherein said means for responding to an utterance includes means for producing in response to said given utterance one or more close call words which are considered by the system next most likely to correspond to said given utterance after said best guess word;
      
      further including means for displaying said one or more close call words; and
      
      further including means for responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given utterance.
  - 92. A speech recognition system as described in claim 91, wherein said means for displaying one or more close call words further includes means for indicating which input signal is associated with said given close call word.
  - 93. A speech recognition system as described in claim 91, wherein said means for responding to an input signal includes means for responding to an actuation of a keyboard key as said input signal.
  - 94. A speech recognition system as described in claim 93, further including means for responding to an input signal associated with the rejection of said best guess word and said one or more close call words as confirmation that neither said best guess word nor any of said close call words was the word corresponding to said given utterance.
  - 95. A speech recognition system as described in claim 94, wherein said means for responding to an input signal associated with the rejection of said best guess word and said one or more close call words includes means for removing the display of said best guess word and said one or more close call words in response to said input signal associated with said rejection.
  - 96. A speech recognition system as described in claim 94, wherein said means for responding to an input signal associated with the rejection of said best guess word and said one or more close call words includes means for responding to an actuation of a keyboard key as said input signal associated with rejection.
  - 97. A speech recognition system as described in claim 91, wherein said means for responding to the utterance of another word as confirmation that said best guess word was the intended word includes means for terminating the displaying of said one or more close call words.
  - 98. A speech recognition system as described in claim 88, further including a language model means for helping it predict the likelihood that a certain utterance corresponds to one or more certain words based at least in part on the previous word recognized by said system, wherein said means for responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance includes means for indicating to said language model means that said best guess word is the previous word for use of said language model means in helping to predict the likelihood that said another utterance corresponds to one or more certain words.
  - 99. A speech recognition system as described in claim 98:
    - wherein said means for responding to a given utterance includes means for producing in response to said given utterance one or more close call words which are considered by the system next most likely, after said best guess word, to correspond to said given utterance;
      
      further including means for displaying said one or more close call words;
      
      further including means for responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given utterance; and
      
      wherein said means for responding to an input signal associated with a given close call word includes means for indicating to said language model means that said given close call word is the previous word for use of said language model means in helping to predict the likelihood that said another utterance corresponds to one or more certain words.
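
The confirmation behaviour of claims 88 to 99 can be sketched as a small state holder. The names `Session`, `on_utterance` and `on_key`, the reject key `"x"`, and the use of digit keys for close call words are all hypothetical. The ranked candidate list is assumed to come from a recognizer that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    context: list = field(default_factory=list)  # confirmed words, i.e. the language context
    pending: list = field(default_factory=list)  # [best guess, close call 1, ...] on display

    def on_utterance(self, candidates):
        """A new utterance confirms the displayed best guess, then shows new candidates."""
        if self.pending:
            self.context.append(self.pending[0])
        self.pending = list(candidates)

    def on_key(self, key):
        """Digit n confirms the n-th close call; the assumed key 'x' rejects all."""
        if not self.pending:
            return
        if key == "x":
            self.pending = []  # neither the best guess nor a close call was right
        elif key.isdecimal() and 1 <= int(key) < len(self.pending):
            self.context.append(self.pending[int(key)])
            self.pending = []  # display of the remaining candidates ends
```

For example, after `on_utterance(["two", "to", "too"])` followed by `on_key("1")`, the context holds `"to"`. A later call to `on_utterance` with new candidates confirms whatever best guess is then displayed. In both cases the confirmed word becomes the previous word that a language model would use for the next utterance.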

100. A speech recognition method for recognizing a sequence of spoken words, said method comprising:
- responding to a given utterance by producing a best guess word which is the word considered by the method most likely to correspond to said utterance; and
  
  responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance.
- View Dependent Claims (101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)
- - 101. A speech recognition method as described in claim 100, wherein said responding to the utterance of another word as confirmation includes detecting an utterance and responding to any detection of another utterance by said means as confirmation that said best guess word is the word corresponding to said given utterance spoken before said another utterance.
  - 102. A speech recognition method as described in claim 100, further including displaying said best guess word so a user can see it before uttering another word.
  - 103. A speech recognition method as described in claim 102:
    - wherein said responding to an utterance includes producing in response to said given utterance one or more close call words which are considered by the method next most likely to correspond to said given utterance after said best guess word;
      
      further including displaying said one or more close call words; and
      
      further including responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given utterance.
  - 104. A speech recognition method as described in claim 103, wherein said displaying one or more close call words further includes indicating which input signal is associated with said given close call word.
  - 105. A speech recognition method as described in claim 103, wherein said responding to an input signal includes responding to an actuation of a keyboard key as said input signal.
  - 106. A speech recognition method as described in claim 105, further including responding to an input signal associated with the rejection of said best guess word and said one or more close call words as confirmation that neither said best guess word nor any of said close call words was the word corresponding to said given utterance.
  - 107. A speech recognition method as described in claim 106, wherein said responding to an input signal associated with the rejection of said best guess word and said one or more close call words includes removing the display of said best guess word and said one or more close call words in response to said input signal associated with said rejection.
  - 108. A speech recognition method as described in claim 106, wherein said responding to an input signal associated with the rejection of said best guess word and said one or more close call words includes responding to an actuation of a keyboard key as said input signal associated with rejection.
  - 109. A speech recognition method as described in claim 103, wherein said responding to the utterance of another word as confirmation that said best guess word was the intended word includes terminating the displaying of said one or more close call words.
  - 110. A speech recognition method as described in claim 100, further including using a language model to help in the prediction of a likelihood that a certain utterance corresponds to one or more certain words based at least in part on the previous word recognized by said method, wherein said responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance includes indicating that said best guess word is the previous word for said use of said language model in helping to predict the likelihood that said another utterance corresponds to one or more certain words.
  - 111. A speech recognition method as described in claim 110:
    - wherein said responding to a given utterance includes producing in response to said given utterance one or more close call words which are considered by the method next most likely, after said best guess word, to correspond to said given utterance;
      
      further including displaying said one or more close call words;
      
      further including responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given utterance; and
      
      wherein said responding to an input signal associated with a given close call word includes indicating that said given close call word is the previous word for said use of said language model in helping to predict the likelihood that said another utterance corresponds to one or more certain words.

112. A speech recognition system for recognizing a sequence of speech sounds, said system comprising:
- means for responding to said sequence of speech sounds by producing a best guess word, which is the word considered by the system most likely to correspond to said sequence, and one or more close call words, which are considered by the system next most likely to correspond to said given sequence after said best guess word; and
  
  means for displaying said best guess word so a user can see it;
  
  means for displaying said one or more close call words so a user can see them; and
  
  means for responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given sequence of speech sounds, rather than said best guess word.
- View Dependent Claims (113, 114, 115, 116, 117, 118, 119)
- - 113. A speech recognition system as described in claim 112, wherein said means for displaying said best guess word includes means for displaying said word in the context of words spoken before or after said sequence of speech sounds and said means for displaying said one or more close call words further includes means for displaying said one or more close call words in close proximity to said best guess word, so that said close call words are also displayed in the context of words spoken before or after a sequence of speech sounds.
  - 114. A speech recognition system as described in claim 112, wherein said means for displaying one or more close call words further includes means for displaying an indication which input signal is associated with said given close call word.
  - 115. A speech recognition system as described in claim 112, wherein said means for responding to an input signal includes means for responding to an actuation of a keyboard key as said input signal.
  - 116. A speech recognition system as described in claim 112, further including means for responding to an input signal associated with the rejection of both said best guess word and said one or more close call words as confirmation that neither said best guess word nor any of said close call words was the word corresponding to said given sequence of speech sounds.
  - 117. A speech recognition system as described in claim 116, wherein said means for responding to an input signal associated with the rejection of both said best guess word and said one or more close call words includes means for removing the display of said best guess word and said one or more close call words in response to said input signal associated with said rejection.
  - 118. A speech recognition system as described in claim 117, wherein said means for responding to an input signal associated with the rejection of both said best guess word and said one or more close call words includes means for responding to an actuation of a keyboard key as said input signal associated with rejection.
  - 119. A speech recognition system as described in claim 112:
    - further including means for using a language model to help in the prediction of a likelihood that a certain utterance corresponds to one or more certain words based at least in part on the previous word recognized by said system; and
      
      wherein said means for responding to an input signal associated with a given close call word includes means for indicating that said given close call word is the previous word for said use of said language model in helping to predict the likelihood that said another utterance corresponds to one or more certain words.

120. A speech recognition method for recognizing a sequence of speech sounds, said method comprising:
- responding to said sequence of speech sounds by producing a best guess word, which is the word considered by the method most likely to correspond to said sequence, and one or more close call words, which are considered by the method next most likely to correspond to said given sequence after said best guess word; and
  
  displaying said best guess word so a user can see it;
  
  displaying said one or more close call words so a user can see them; and
  
  responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given sequence of speech sounds, rather than said best guess word.
- View Dependent Claims (121, 122, 123, 124, 125, 126, 127)
- - 121. A speech recognition method as described in claim 120, wherein said displaying of said best guess word includes displaying said word in the context of words spoken before or after said sequence of speech sounds and said displaying of said one or more close call words further includes displaying said one or more close call words in close proximity to said best guess word, so that said close call words are also displayed in the context of words spoken before or after a sequence of speech sounds.
  - 122. A speech recognition method as described in claim 120, wherein said displaying of one or more close call words further includes displaying an indication which input signal is associated with said given close call word.
  - 123. A speech recognition method as described in claim 120, wherein said responding to an input signal includes responding to an actuation of a keyboard key as said input signal.
  - 124. A speech recognition method as described in claim 120, further including responding to an input signal associated with the rejection of both said best guess word and said one or more close call words as confirmation that neither said best guess word nor any of said close call words was the word corresponding to said given sequence of speech sounds.
  - 125. A speech recognition method as described in claim 124, wherein said responding to an input signal associated with the rejection of both said best guess word and said one or more close call words includes removing the display of said best guess word and said one or more close call words in response to said input signal associated with said rejection.
  - 126. A speech recognition method as described in claim 125, wherein said responding to an input signal associated with the rejection of both said best guess word and said one or more close call words includes responding to an actuation of a keyboard key as said input signal associated with rejection.
  - 127. A speech recognition method as described in claim 120:
    - further including using a language model to help in the prediction of a likelihood that a certain utterance corresponds to one or more certain words based at least in part on the previous word recognized by said method; and
      
      wherein said responding to an input signal associated with a given close call word includes indicating that said given close call word is the previous word for said use of said language model in helping to predict the likelihood that said another utterance corresponds to one or more certain words.
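
The display steps of claims 120 to 125 (best guess shown in the context of earlier words, close call words shown next to it with an indication of the associated input signal, and a rejection input) might be rendered as plain text along these lines. The layout, the bracket notation, the digit labels and the reject key `"x"` are assumptions made for illustration.

```python
def render(context, best_guess, close_calls, reject_key="x"):
    """Return a text display: context line, then one labelled line per choice."""
    # Best guess is shown in brackets after the previously confirmed words.
    line = " ".join(context + [f"[{best_guess}]"])
    # Each close call word is labelled with the key assumed to select it.
    menu = [f"  {i}: {word}" for i, word in enumerate(close_calls, start=1)]
    menu.append(f"  {reject_key}: none of these")
    return "\n".join([line] + menu)


print(render(["to", "be"], "or", ["oar", "ore"]))
```

Calling `render(["to", "be"], "or", ["oar", "ore"])` yields the line `to be [or]` followed by the choices `1: oar`, `2: ore` and `x: none of these`.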


Current Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Roth, Robert S.; Sidell, Mark F.; Baker, James K.; Bamberg, Paul G.
Primary Examiner(s): Salce, Patrick R.
Assistant Examiner(s): Ault, Anita M.
Application Number: US06/797,249
Time in Patent Office: 1,092 Days
Field of Search: 381/42, 381/43
US Class Current: 704/252
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
