Speech recognition apparatus and method
Abstract
A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
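The per-frame update, language-model combination and pruning summarized above can be sketched as follows. This is an illustration only, not the disclosed implementation. It assumes scores are negative log probabilities (lower is better), that each word model is a toy template with one expected value per frame, and that the pruning threshold is taken relative to the best combined score; the names recognize, prune_margin and lm_prob are hypothetical.

```python
import math

def recognize(frames, models, lm_prob, prune_margin=5.0):
    """Toy frame-by-frame scoring with language weighting and pruning.

    Assumptions (not from the source): scores are negative log
    probabilities, each model holds one expected value per frame, and
    a word is pruned when its combined score is worse than the best
    combined score by more than prune_margin.
    """
    # Language score for each word: -log P(word | context).
    lang = {w: -math.log(lm_prob[w]) for w in models}
    acoustic = {w: 0.0 for w in models}
    active = set(models)
    for t, frame in enumerate(frames):
        # Update the match score of every word still active.
        for w in active:
            acoustic[w] += abs(frame - models[w][t])
        # Combine with the language score, then prune poor candidates.
        combined = {w: acoustic[w] + lang[w] for w in active}
        best = min(combined.values())
        active = {w for w in active if combined[w] <= best + prune_margin}
    winner = min(active, key=lambda w: acoustic[w] + lang[w])
    return winner, active

frames = [1.0, 2.0, 3.0]
models = {"one": [1.0, 2.0, 3.0], "two": [1.0, 2.5, 3.0], "far": [9.0, 9.0, 9.0]}
lm_prob = {"one": 0.5, "two": 0.25, "far": 0.25}
winner, active = recognize(frames, models, lm_prob)
```

In this toy run the word "far" is dropped after the first frame, so no further comparisons are spent on it.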
127 Claims
-
1. A speech recognition system for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said system comprising:
-
means for storing an acoustic model of a given word; means for making a word match score corresponding to the probability that said given word corresponds to acoustic data generated by said utterance, including means for making successive partial comparisons between said acoustic model and said acoustic data and for successively updating said word match score after each such partial comparison; context storing means for storing a language context derived from one or more words spoken prior to said given utterance; language score introducing means, including means for generating a language score for said given word corresponding to a language model probability estimate of the probability that said given word would occur given the language context stored in said context storing means and means for separately altering said word match score after each of a plurality of said partial comparisons by an amount corresponding to said language score; and means for stopping said means for making a word match score from making further partial comparisons between said acoustic model of a given word and said acoustic data when the word match score for that given word is worse than a given threshold. - View Dependent Claims (2, 3)
-
-
4. A speech recognition method for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said method comprising:
-
storing an acoustic model of a given word; making a word match score corresponding to the probability that said given word corresponds to acoustic data generated by said utterance, including making successive partial comparisons between said acoustic model and said acoustic data and successively updating said word match score after each such partial comparison; storing a language context derived from one or more words spoken prior to said given utterance; generating a language score for said given word corresponding to a language model probability estimate of the probability that said given word would occur given said stored language; separately altering said word match score after each of a plurality of said partial comparisons by an amount corresponding to said language score; and stopping further partial comparisons between said acoustic model of a given word and said acoustic data when the word match score for that given word is worse than a given threshold. - View Dependent Claims (5, 6)
-
-
7. A speech recognition system for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said system comprising:
-
means for storing an acoustic model of each of a plurality of words belonging to an originally active vocabulary; currently active vocabulary means for indicating which words in said originally active vocabulary belong to a currently active vocabulary; means for causing said currently active vocabulary means to originally indicate that all the words in said originally active vocabulary are in said currently active vocabulary; means for making a word match score for each word in said originally active vocabulary corresponding to the probability that each such word corresponds to acoustic data generated by said utterance, including means for making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in each of which said word match score for said word is updated in response to said partial comparison; context storing means for storing a language context derived from one or more words spoken prior to said given utterance; language score introducing means for generating language scores for individual words for which said partial comparisons are made and for altering, during each of a plurality of said estimation cycles, the word match score of each such word by an amount corresponding to the language score of each such word, each of said language scores corresponding to a language model probability estimate of the probability that a given word would occur in the language context stored in said context storing means; and means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15)
-
-
16. A speech recognition method for recognizing a word corresponding to a given utterance spoken after one or more preceding words, said method comprising:
-
storing an acoustic model of each of a plurality of words belonging to an originally active vocabulary; indicating which words in said originally active vocabulary belong to a currently active vocabulary; originally indicating that all the words in said originally active vocabulary are in said currently active vocabulary; making a word match score for each word in said originally active vocabulary corresponding to the probability that each such word corresponds to acoustic data generated by said utterance, including making a succession of estimation cycles, in each of which, for each word in said currently active vocabulary, a partial comparison is made between the acoustic model of said word and said acoustic data and in each of which said word match score for said word is updated in response to said partial comparison; storing a language context derived from one or more words spoken prior to said given utterance; generating language scores for individual words for which said partial comparisons are made; altering, during each of a plurality of said estimation cycles, the word match score of each such word by an amount corresponding to the language score of each such word, each of said language scores corresponding to a language model probability estimate of the probability that a given word would occur in said stored language context; and indicating that a given word is no longer in said currently active vocabulary when the word match score for said word is worse than a given threshold. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24)
-
-
25. A pattern recognition system for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said system comprising:
-
means for storing a model of a given pattern; means for making a pattern match score corresponding to the probability that said given pattern corresponds to said collection of data, including means for making successive partial comparisons between said pattern's model and said data and for successively updating said pattern match score after each such partial comparison; context storing means for storing a context derived from one or more patterns which have been previously identified; context score introducing means, including means for generating a context score for said given pattern corresponding to a context probability estimate of the probability that said given pattern would occur given the context stored in said context storing means and means for separately altering said pattern match score after each of a plurality of said partial comparisons by an amount corresponding to said context score; and means for stopping said means for making a pattern match score from making further partial comparisons between said model and said data when the pattern match score for that given pattern is worse than a given threshold. - View Dependent Claims (26, 27)
-
-
28. A pattern recognition method for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said method comprising:
-
storing a model of a given pattern; making a pattern match score corresponding to the probability that said given pattern corresponds to said collection of data, including making successive partial comparisons between said pattern's model and said data and successively updating said pattern match score after each such partial comparison; storing a context derived from one or more patterns which have been previously identified; generating a context score for said given pattern corresponding to a context probability estimate of the probability that said given pattern would occur given said stored context; separately altering said pattern match score after each of a plurality of said partial comparisons by an amount corresponding to said context score; and stopping further partial comparisons between said model and said data when the pattern match score for that given pattern is worse than a given threshold. - View Dependent Claims (29, 30)
-
-
31. A pattern recognition system for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said system comprising:
-
means for storing a model of each of a plurality of patterns belonging to an originally active set of patterns; currently active set means for indicating which patterns in said originally active set belong to a currently active set of patterns; means for causing said currently active set means to originally indicate that all the patterns in said originally active set are in said currently active set; means for making a pattern match score corresponding to the probability that each pattern in said originally active set occurs in said collection of data, including means for making a succession of estimation cycles, in each of which, for each pattern in said currently active set, a partial comparison is made between the model of said pattern and said collection of data and in each of which said pattern match score for said pattern is updated in response to said partial comparison; context storing means for storing a context derived from one or more patterns previously identified; context score introducing means for generating context scores for patterns for which said partial comparisons are made and for altering, during each of a plurality of said estimation cycles, the pattern match score of each such pattern by an amount which corresponds to the context score of each such pattern, each of said context scores corresponding to a context probability estimate of the probability that its associated pattern would occur in the context stored in said context storing means; and means for causing said currently active set means to indicate that a given pattern is no longer in said currently active set when the pattern match score for said pattern is worse than a given threshold. - View Dependent Claims (32, 33, 34, 35, 36)
-
-
37. A pattern recognition method for recognizing an individual pattern in a collection of data given a context of one or more other patterns which have been previously identified, said method comprising:
-
storing a model of each of a plurality of patterns belonging to an originally active set of patterns; indicating which patterns in said originally active set belong to a currently active set of patterns; causing said currently active set means to originally indicate that all the patterns in said originally active set are in said currently active set; making a pattern match score corresponding to the probability that each pattern in said originally active set occurs in said collection of data, including making a succession of estimation cycles, in each of which, for each pattern in said currently active set, a partial comparison is made between the model of said pattern and said collection of data and in each of which said pattern match score for said pattern is updated in response to said partial comparison; storing a context derived from one or more patterns previously identified; generating context scores for patterns for which said partial comparisons are made; altering, during each of a plurality of said estimation cycles, the pattern match score of each such pattern by an amount which corresponds to the context score of each such pattern, each of said context scores corresponding to a context probability estimate of the probability that its associated pattern would occur in said stored context; and indicating that a given pattern is no longer in said currently active set when the pattern match score for said pattern is worse than a given threshold. - View Dependent Claims (38, 39, 40, 41, 42)
-
-
43. A speech recognition system for recognizing a word corresponding to a given spoken utterance, said system comprising:
-
means for making, for each word of an initial vocabulary, a relatively rapid scoring, which produces a rapid match score corresponding to the probability that said word corresponds to a sequence of acoustic data generated by said utterance; means for placing a word from said initial vocabulary into an originally active vocabulary when said rapid match score for said word is better than a given threshold; and means for making, for each word of said originally active vocabulary, a more lengthy scoring, which produces a more accurate match score corresponding to the probability that such word corresponds to said sequence of acoustic data. - View Dependent Claims (44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55)
-
-
56. A speech recognition method for recognizing a word corresponding to a given spoken utterance, said method comprising:
-
making, for each word of an initial vocabulary, a relatively rapid scoring, which produces a rapid match score corresponding to the probability that said word corresponds to a sequence of acoustic data generated by said utterance; placing a word from said initial vocabulary into an originally active vocabulary when said rapid match score for said word is better than a given threshold; and making, for each word of said originally active vocabulary, a more lengthy scoring, which produces a more accurate match score corresponding to the probability that such word corresponds to said sequence of acoustic data. - View Dependent Claims (57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
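The two-stage scoring recited in claims 43 and 56 can be illustrated with a short sketch. It is not the patented rapid match. It assumes a distance-style score (lower is better), toy templates of per-frame values, and a rapid pass that simply samples every fourth frame; rapid_score, full_score, two_pass and step are hypothetical names.

```python
def rapid_score(template, frames, step=4):
    # Cheap estimate: compare only every step-th frame (an assumed shortcut).
    return sum(abs(f - m) for f, m in zip(frames[::step], template[::step]))

def full_score(template, frames):
    # More lengthy scoring: compare every frame.
    return sum(abs(f - m) for f, m in zip(frames, template))

def two_pass(frames, initial_vocabulary, rapid_threshold):
    # Stage 1: keep words whose rapid score is better than the threshold.
    originally_active = {
        word: template
        for word, template in initial_vocabulary.items()
        if rapid_score(template, frames) <= rapid_threshold
    }
    # Stage 2: spend the expensive scoring only on the survivors.
    scores = {w: full_score(t, frames) for w, t in originally_active.items()}
    best = min(scores, key=scores.get) if scores else None
    return sorted(originally_active), best

frames = [0, 1, 2, 3, 4, 5, 6, 7]
vocabulary = {
    "a": [0, 1, 2, 3, 4, 5, 6, 7],
    "b": [0, 1, 2, 3, 4, 5, 6, 9],
    "c": [5, 5, 5, 5, 5, 5, 5, 5],
}
survivors, best = two_pass(frames, vocabulary, rapid_threshold=2)
```

Here "c" never reaches the lengthy scoring, and the rapid pass cannot tell "a" from "b"; only the full pass separates them.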
-
-
69. A speech recognition system for recognizing a word in a sequence of acoustic data, said system comprising:
-
means for storing an acoustic model of expected background noise; means for comparing said acoustic model with successive portions of said sequence of acoustic data and producing a silence score corresponding to the probability that such successive portions of data correspond to background noise and thus that said utterance has not yet begun; means for determining from said acoustic data the acoustic amplitude corresponding to successive portions of said data; means for producing an accumulated sound score which corresponds to a sum of the amount of said acoustic amplitude over said successive portions of data, said accumulated sound score inversely corresponding to the probability that an utterance has not yet begun; and means, responsive to both said silence score and said accumulated sound score for determining during which of said successive portions of data to look for the start of said utterance. - View Dependent Claims (70, 71)
-
-
72. A speech recognition method for recognizing a word in a sequence of acoustic data, said method comprising:
-
storing an acoustic model of expected background noise; comparing said acoustic model with successive portions of said sequence of acoustic data and producing a silence score corresponding to the probability that such successive portions of data correspond to background noise and thus that said utterance has not yet begun; determining from said acoustic data the acoustic amplitude corresponding to successive portions of said data; producing an accumulated sound score which corresponds to a sum of the amount of said acoustic amplitude over said successive portions of data, said accumulated sound score inversely corresponding to the probability that an utterance has not yet begun; and determining, in response to both said silence score and said accumulated sound score, during which of said successive portions of data to look for the start of said utterance. - View Dependent Claims (73, 74)
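A minimal sketch of the start-of-utterance decision in claims 69 and 72 follows. The claims say only that the decision is responsive to both scores; the rule used here (flag the first portion where either score crosses its limit) is an assumption, as are the scalar noise model, the choice to accumulate only amplitude above the noise level, and the names find_start, mismatch_limit and accumulated_limit.

```python
def find_start(amplitudes, noise_level, mismatch_limit, accumulated_limit):
    """Return the index of the first portion where speech may begin.

    Assumptions: the background-noise model is a single expected
    amplitude, the silence score is a distance from that level (small
    means the portion looks like noise), and the accumulated sound
    score sums amplitude in excess of the noise level.
    """
    accumulated = 0.0
    for i, amp in enumerate(amplitudes):
        mismatch = abs(amp - noise_level)           # silence score
        accumulated += max(0.0, amp - noise_level)  # accumulated sound score
        if mismatch > mismatch_limit or accumulated > accumulated_limit:
            return i
    return None  # nothing but background noise so far

abrupt = find_start([1.0, 1.1, 0.9, 1.0, 4.0, 5.0], 1.0, 2.0, 10.0)
gradual = find_start([1.0, 1.5, 1.5, 1.5, 1.5, 1.5], 1.0, 2.0, 2.0)
silent = find_start([1.0, 1.0, 1.0], 1.0, 2.0, 2.0)
```

The second call shows why an accumulated score is useful under these assumptions: no single portion departs far from the noise level, yet the running sum eventually signals that sound has been present.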
-
-
75. A speech recognition system for recognizing a word in a sequence of acoustic data frames, said system comprising:
-
means for storing an initial vocabulary of words; means for performing initial computations on said sequence of acoustic data frames and for producing, in response to said computations, an originally active vocabulary, containing those words from said initial vocabulary which are determined by said initial computations to be the most probable candidates for corresponding to said acoustic data frames; means for remembering the words contained in said originally active vocabulary; currently active vocabulary means for indicating which words in said originally active vocabulary belong to a currently active vocabulary; means for causing said currently active vocabulary means to initially indicate that all the words in said originally active vocabulary are in said currently active vocabulary; means for making a word match score corresponding to the probability that each word in said currently active vocabulary corresponds to said sequence of acoustic data frames, including means for making an estimation cycle for each successive data frame, in which for each word in said currently active vocabulary, a comparison is made between the acoustic model of said word and said frame and said word match score for said word is updated in response to said comparison; means for causing said currently active vocabulary means to indicate that a given word is no longer in said currently active vocabulary when the word match score for said word becomes worse than a given threshold; means for producing a start score corresponding to the probability that an acoustic data frame precedes the beginning of the utterance of the word to be recognized; and means for resetting said currently active vocabulary to include all the words remembered by said means for remembering as belonging in said original vocabulary and for performing said resetting when said start score is better than a given threshold. - View Dependent Claims (76, 77)
-
-
78. A speech recognition method for recognizing a word in a sequence of acoustic data frames, said method comprising:
-
storing an initial vocabulary of words; performing initial computations on said sequence of acoustic data frames and for producing, in response to said computations, an originally active vocabulary, containing those words from said initial vocabulary which are determined by said initial computations to be the most probable candidates for corresponding to said acoustic data frames; remembering the words contained in said originally active vocabulary; indicating which words in said originally active vocabulary belong to a currently active vocabulary; initially indicating that all the words in said originally active vocabulary are in said currently active vocabulary; making a word match score corresponding to the probability that each word in said currently active vocabulary corresponds to said sequence of acoustic data frames, including making an estimation cycle for each successive data frame, in which for each word in said currently active vocabulary, a comparison is made between the acoustic model of said word and said frame and said word match score for said word is updated in response to said comparison; indicating that a given word is no longer in said currently active vocabulary when the word match score for said word becomes worse than a given threshold; producing a start score corresponding to the probability that an acoustic data frame precedes the beginning of the utterance of the word to be recognized; and resetting said currently active vocabulary to include all the words remembered by said remembering as belonging in said original vocabulary and for performing said resetting when said start score is better than a given threshold. - View Dependent Claims (79, 80)
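The interplay of pruning and resetting in claims 75 and 78 can be sketched as below. It is illustrative only. It assumes per-frame costs are supplied already computed, that a higher start score means the frame more probably precedes the utterance, and that accumulated word scores are cleared on reset and the resetting frame is not scored (neither of which the claims state); run, prune_limit and start_threshold are hypothetical names.

```python
def run(per_frame_costs, start_scores, originally_active,
        prune_limit, start_threshold):
    # Remember the originally active vocabulary so it can be restored.
    remembered = frozenset(originally_active)
    active = set(remembered)
    totals = {w: 0.0 for w in remembered}
    for costs, start in zip(per_frame_costs, start_scores):
        if start > start_threshold:
            # Frame probably precedes the utterance: restore every
            # remembered word (clearing scores is an assumption here).
            active = set(remembered)
            totals = {w: 0.0 for w in remembered}
            continue
        # One estimation cycle: update each active word, then prune.
        for w in active:
            totals[w] += costs[w]
        active = {w for w in active if totals[w] <= prune_limit}
    return active, totals

costs = [{"a": 5.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}]
active, totals = run(costs, [0.1, 0.9, 0.1], ["a", "b"],
                     prune_limit=3.0, start_threshold=0.5)
```

Word "a" is pruned on the first frame, restored when the second frame is judged to precede the utterance, and survives the third.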
-
-
81. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
-
parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data; expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model; first hardware means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output; weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model; memory means, receiving addressing inputs from both the output of said first hardware means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said two addressing inputs, a parameter match score corresponding to a probability that said given parameter value corresponds to said expected value, considering said weighting value, and for producing said score as a digital output; a hardware adder having two inputs for receiving digital values and an output at which it produces a digital value corresponding to the sum of the values supplied to its inputs; means for connecting the output of said memory means to one input of said adder; latch means connected to the output of said adder for latching the value produced at said output; and means for supplying the value latched by said latch means to one of the inputs of said adder, whereby said hardware adder can produce a running total of said parameter match scores. - View Dependent Claims (82, 83, 84)
-
-
85. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
-
parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data; expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model; subtracting means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output; weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model; memory means, receiving addressing inputs from both the output of said subtracting means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said addressing inputs, a parameter match score corresponding to a probability that said given parameter value corresponds to said expected value given said weighting value and for producing said score as a digital output, said parameter match score having a value which corresponds to the sum of two portions, a first portion which corresponds to the product of the output of said subtracting means and said digital weighting value and a second portion which corresponds to the logarithm of said weighting value; and accumulator means for accumulating said parameter match scores associated with a given frame for a given node.
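The two-portion parameter match score of claim 85 has the shape of a negative log likelihood under a Laplacian density. For p(x) = (w/2) exp(-w |x - mu|), the negative logarithm is w |x - mu| - log w + log 2, that is, a product term plus a logarithm term and a constant. The sketch below computes that form in software. The sign convention and the dropped constant are assumptions, since the claim says only that the portions "correspond to" the product and the logarithm; the function names are hypothetical.

```python
import math

def parameter_match_score(value, expected, weight):
    # First portion: |difference| times the weighting value.
    # Second portion: logarithm of the weighting value (sign assumed).
    return abs(value - expected) * weight - math.log(weight)

def frame_node_score(values, expected_values, weights):
    # Accumulate the per-parameter scores for one frame and one node.
    return sum(
        parameter_match_score(v, e, w)
        for v, e, w in zip(values, expected_values, weights)
    )

single = parameter_match_score(3, 1, 1.0)
total = frame_node_score([3, 5], [1, 5], [1.0, 1.0])
```

Lower totals indicate a closer match under this convention.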
-
-
86. A likelihood processor for use in a speech recognition system to compute a score corresponding to the probabilistic match between a vector of parameters associated with a frame of acoustic data and a node model having for each parameter in said vector both an expected value for said parameter and a weighting value corresponding to the expected deviation from said expected value, said processor comprising:
-
parameter supplying means for supplying a digital value corresponding to a given parameter of a given frame of data; expected value supplying means for supplying a digital value corresponding to the expected value for said given parameter according to a given node model; subtracting means, receiving inputs from both said parameter supplying means and said expected value supplying means, for computing the absolute value of the difference between said two inputs and for producing said absolute value as a digital output; weighting value supplying means for supplying a digital weighting value corresponding to the expected deviation from said expected value for said given parameter according to said given node model; memory means, receiving addressing inputs from both the output of said subtracting means and said weighting value supplying means, for storing, at each of a plurality of addresses formed by combining said addressing inputs, a parameter match score having a value corresponding to a probability that said given parameter value corresponds to said expected value given said weighting value and for producing said match score as a digital output; and accumulator means for accumulating said parameter match scores associated with a given frame for a given node; said subtracting means including means for limiting the numerical value of its output to a limited range which is less than the range of the absolute values made possible by the range of values of its inputs, whereby the memory space required in said memory means to correspond to the address space formed by combining the output of said subtracting means and said weighting value is reduced. - View Dependent Claims (87)
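The table lookup with a range-limited difference described in claim 86 can be modeled in software as follows. This is a sketch, not the hardware of the patent. The bit widths, the four quantized weighting values, the stored score formula and the way the two address fields are combined are all assumptions chosen for illustration.

```python
import math

DIFF_BITS = 4                       # limited |difference| range: 0..15
WEIGHT_BITS = 2                     # index into four weighting values
MAX_DIFF = (1 << DIFF_BITS) - 1
WEIGHTS = [0.25, 0.5, 1.0, 2.0]     # hypothetical weighting values

def build_table():
    # One stored score per (limited difference, weight index) address.
    table = [0.0] * (1 << (DIFF_BITS + WEIGHT_BITS))
    for d in range(MAX_DIFF + 1):
        for wi, w in enumerate(WEIGHTS):
            table[(d << WEIGHT_BITS) | wi] = d * w - math.log(w)
    return table

def node_score(table, params, expected, weight_indices):
    total = 0.0  # plays the role of the accumulator
    for x, mu, wi in zip(params, expected, weight_indices):
        d = min(abs(x - mu), MAX_DIFF)           # limit the output range
        total += table[(d << WEIGHT_BITS) | wi]  # combined address lookup
    return total

table = build_table()
score = node_score(table, [200, 10], [0, 10], [2, 2])
```

With 8-bit parameters the unrestricted absolute difference could take 256 values, so the table would need 1024 entries for four weights; limiting the difference to 16 values brings that down to 64.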
-
-
88. A speech recognition system for recognizing a sequence of separately spoken words, said system comprising:
-
means for responding to a given utterance by producing a best guess word which is the word considered by the system most likely to correspond to said utterance; and means for responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance. - View Dependent Claims (89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
-
-
100. A speech recognition method for recognizing a sequence of spoken words, said method comprising:
-
responding to a given utterance by producing a best guess word which is the word considered by the method most likely to correspond to said utterance; and responding to the utterance of another word as confirmation that said best guess word is the word corresponding to said given utterance. - View Dependent Claims (101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)
-
-
112. A speech recognition system for recognizing a sequence of speech sounds, said system comprising:
-
means for responding to said sequence of speech sounds by producing a best guess word, which is the word considered by the system most likely to correspond to said sequence, and one or more close call words, which are considered by the system next most likely to correspond to said given sequence after said best guess word; and means for displaying said best guess word so a user can see it; means for displaying said one or more close call words so a user can see it; and means for responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given sequence of speech sounds, rather than said best guess word. - View Dependent Claims (113, 114, 115, 116, 117, 118, 119)
-
-
120. A speech recognition method for recognizing a sequence of speech sounds, said method comprising:
-
responding to said sequence of speech sounds by producing a best guess word, which is the word considered by the method most likely to correspond to said sequence, and one or more close call words, which are considered by the method next most likely to correspond to said given sequence after said best guess word; and displaying said best guess word so a user can see it; displaying said one or more close call words so a user can see it; and responding to an input signal associated with a given close call word as confirmation that said given close call word is the word corresponding to said given sequence of speech sounds, rather than said best guess word. - View Dependent Claims (121, 122, 123, 124, 125, 126, 127)
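The confirmation behavior of claims 88 through 120, together with the reject key mentioned in the abstract, can be sketched as a small state holder. The key bindings are hypothetical: digit keys select a close call word by position, and any other key is treated as the signal that no guess was correct. The class and method names are likewise illustrative.

```python
class Dictation:
    def __init__(self):
        self.confirmed = []   # words accepted so far
        self.pending = None   # (best_guess, close_calls) awaiting a decision

    def on_utterance(self, best_guess, close_calls):
        # Speaking another word confirms the previous best guess.
        if self.pending is not None:
            self.confirmed.append(self.pending[0])
        self.pending = (best_guess, list(close_calls))

    def on_key(self, key):
        if self.pending is None:
            return
        _, close_calls = self.pending
        if key.isdecimal() and 1 <= int(key) <= len(close_calls):
            # Input signal associated with a close call word selects it.
            self.confirmed.append(close_calls[int(key) - 1])
        # Any other key (assumed reject key) discards all guesses.
        self.pending = None

d = Dictation()
d.on_utterance("their", ["there", "they're"])
d.on_utterance("house", ["mouse"])   # confirms "their"
d.on_key("1")                        # picks "mouse" over "house"
d.on_utterance("is", [])
d.on_key("x")                        # rejects every guess for this utterance
```

After this sequence the confirmed words are "their" and "mouse", and nothing is pending.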
-
Specification