Speech recognition system
First Claim
1. A speech recognition apparatus, comprising:
a general-corpus statistical language model that provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence, based on the number of times the sequence of linguistic items occurs in text and phrases in general use;
a speech recognition decoder module that requests from a run-time correction module one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z is to follow a given sequence of linguistic items x followed by y, where x, y, and z are three variable linguistic items supplied from the decoder module, and the decoder module has an input to receive back the one or more domain-corrected probability estimates from the run-time correction module for one or more possible linguistic items z that follow the given sequence of linguistic items x followed by y;
a first input in the run-time correction module configured to receive requests from the decoder module to return the one or more domain-corrected probability estimates for the one or more possible linguistic items z that could follow the given sequence of linguistic items x followed by y, wherein the run-time correction module is trained on the linguistics of a specific domain and is located between the speech recognition decoder module and the statistical language model in order to adapt the probability estimates supplied by the general-corpus statistical language model to the specific domain when those probability estimates disagree with the linguistic probabilities in that domain by at least an established criterion based on a statistical test;
a second input in the run-time correction module configured to receive from the statistical language model one or more probability estimates P(z|xy) of how likely each of the possible linguistic items z is to follow the given sequence of linguistic items x followed by y;
an output in the run-time correction module to return to the decoder module one or more domain-corrected probability estimates P′(z|xy) of how likely each of the possible linguistic items z is to follow the given sequence of linguistic items x followed by y; and
an output module of the speech recognition system configured to provide a representation of what uttered sounds and words were input into the speech recognition system based on the domain-corrected probability estimates, wherein the modules and models making up the speech recognition apparatus are implemented in electronic circuits, software coding, or any combination of the two, and portions implemented in software coding are stored in a format that is executable by a processor.
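The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It assumes the general-corpus model is exposed as a plain function returning P(z|xy), and that the domain corrections were computed offline into two lookup tables: a corrected value for each trigram that failed the domain test, and a per-context scale factor for every other estimate. The names `RuntimeCorrectionModule`, `domain_overrides`, and `context_scale` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface to the general-corpus SLM: returns P(z | x y).
BackgroundLM = Callable[[str, str, str], float]


class RuntimeCorrectionModule:
    """Illustrative layer between the decoder and the general-corpus SLM."""

    def __init__(
        self,
        background: BackgroundLM,
        domain_overrides: Dict[Tuple[str, str, str], float],
        context_scale: Dict[Tuple[str, str], float],
    ) -> None:
        # domain_overrides: corrected P'(z|xy) for trigrams whose general-corpus
        # estimate failed the domain statistical test (assumed precomputed).
        # context_scale: factor applied to every other estimate in context (x, y).
        self.background = background
        self.domain_overrides = domain_overrides
        self.context_scale = context_scale

    def corrected_prob(self, x: str, y: str, z: str) -> float:
        """Handle one decoder request (first input) and return P'(z|xy)."""
        key = (x, y, z)
        if key in self.domain_overrides:
            return self.domain_overrides[key]
        # Second input: the general-corpus estimate P(z|xy), rescaled.
        p = self.background(x, y, z)
        return self.context_scale.get((x, y), 1.0) * p


# Usage with a toy background distribution over three words in context "x y".
toy = {("x", "y", "a"): 0.5, ("x", "y", "b"): 0.3, ("x", "y", "c"): 0.2}
module = RuntimeCorrectionModule(
    background=lambda x, y, z: toy.get((x, y, z), 0.0),
    domain_overrides={("x", "y", "b"): 0.6},
    context_scale={("x", "y"): 0.4 / 0.7},  # (1 - 0.6) / (1 - 0.3)
)
print({z: round(module.corrected_prob("x", "y", z), 4) for z in "abc"})
# prints {'a': 0.2857, 'b': 0.6, 'c': 0.1143}
```

Keeping the correction as a thin lookup layer means the general-corpus model is reused unchanged; in this sketch only the trigrams that failed the domain test carry their own stored value, and the scale factor keeps the corrected distribution summing to one.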
Abstract
Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence, based on the number of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests from a correction module one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z is to follow a given sequence of linguistic items x followed by y, where x, y, and z are three variable linguistic items supplied from the decoder module. The correction module is trained on the linguistics of a specific domain and is located between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those estimates significantly disagree with the linguistic probabilities in that domain.
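Because each request concerns a single next item z after the two-item history x y, a decoder can combine the returned estimates into a score for a whole word hypothesis w_1 … w_n. A minimal worked form, assuming the standard trigram chain-rule approximation (the abstract does not spell this out) with w_{-1} and w_0 taken as sentence-start padding symbols:

```latex
\log P(w_1 \dots w_n) \approx \sum_{i=1}^{n} \log P'(w_i \mid w_{i-2}\, w_{i-1}),
\qquad x = w_{i-2},\quad y = w_{i-1},\quad z = w_i
```

Under this assumption, the correction module influences the hypothesis score only through the individual factors P′(z|xy) it returns.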
20 Claims
1. A speech recognition apparatus, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
13. A method for a speech recognition system, comprising:
improving the accuracy of probability estimates, from an underlying statistical language model, of how linguistically likely a sequence of linguistic items is to occur in that sequence by adding a correction module trained on different or more extensive data than the underlying statistical language model, in which the correction module systematically corrects the statistical language model estimates where those probability estimates significantly disagree with evidence available to the correction module;
identifying all special N-grams Yz, namely those for which the probability estimate P(z|Y) from the statistical language model is implausible given the actual counts C(Yz) and C(Y) that have been identified and recorded in a count database, where z is a specific linguistic unit that occurs in a given context Y, and Y is a specific sequence of one or more other linguistic units;
establishing a threshold value t for determining whether the evidence available to the correction module, namely the difference between the observed count C(Yz) in the count database and the estimated count E(Yz) of the N-gram Yz in the background training data as derived from the underlying statistical language model, is significant;
in response to a request from a decoder module of the speech recognition system, the correction module returning estimates from the statistical language model with a normalization factor applied to those estimates when the actual count C(Yz) of the N-gram Yz in the corpus of specific text analyzed by the correction module is not significantly different, by being within the threshold value t, from the estimated count E(Yz) derived from the statistical language model; and
in response to a request from the decoder module of the speech recognition system, the correction module returning, for each special N-gram Yz, a corrected probability estimate associated with that special N-gram based on counts of the linguistic sequence found in the count database when the actual count C(Yz) of the N-gram Yz in the corpus of text analyzed by the correction module is significantly different from the estimated count E(Yz) derived from the statistical language model, and then discarding the statistical language model's returned probabilities.
Dependent claims: 14, 15, 16, 17.
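The test and correction steps of claim 13 can be sketched in Python as below. The sketch fills in several details the claim leaves open, all of them assumptions: the expected count is taken as E(Yz) = C(Y)·P(z|Y), the significance test is the plain comparison |C(Yz) − E(Yz)| > t, the corrected estimate for a special N-gram is the relative frequency C(Yz)/C(Y), and the normalization factor is chosen so that the corrected distribution for context Y still sums to one. The function name `correct_distribution` is hypothetical.

```python
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def correct_distribution(
    context: Ngram,
    slm_probs: Dict[str, float],
    counts: Dict[Ngram, int],
    t: float,
) -> Tuple[Dict[str, float], List[str]]:
    """Return corrected P'(z|Y) for each z in slm_probs, plus the special z's.

    Assumes slm_probs is the SLM's full distribution P(z|Y) (it sums to 1) and
    counts maps both Y and each Yz to its count in the domain count database.
    """
    c_y = counts.get(context, 0)
    corrected: Dict[str, float] = {}
    special: List[str] = []
    for z, p in slm_probs.items():
        observed = counts.get(context + (z,), 0)  # C(Yz)
        expected = c_y * p                        # E(Yz) = C(Y) * P(z|Y), assumed
        if c_y > 0 and abs(observed - expected) > t:
            special.append(z)                     # special N-gram Yz
            corrected[z] = observed / c_y         # count-based estimate
    # Normalization factor for the remaining N-grams, chosen here so that the
    # corrected distribution still sums to 1 (one possible choice).
    old_mass = sum(slm_probs[z] for z in special)
    new_mass = sum(corrected.values())
    alpha = (1.0 - new_mass) / (1.0 - old_mass) if old_mass < 1.0 else 0.0
    for z, p in slm_probs.items():
        if z not in corrected:
            corrected[z] = alpha * p
    return corrected, special


slm = {"cat": 0.5, "dog": 0.3, "patient": 0.2}
db = {("the",): 100, ("the", "cat"): 48, ("the", "dog"): 2, ("the", "patient"): 50}
probs, special = correct_distribution(("the",), slm, db, t=10.0)
print(special)  # ['dog', 'patient']
```

In this toy run, "cat" was expected 50 times and observed 48, which is within t, so it keeps its SLM estimate scaled by α = (1 − 0.52)/(1 − 0.5) = 0.96, giving 0.48. "dog" and "patient" fail the test and receive the count-based values 0.02 and 0.5, and the three corrected values sum to one.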
18. A continuous speech recognition system over a network, comprising:
a continuous speech recognition engine that includes front-end filters and sound data parsers configured to convert a supplied audio file of a continuous voice communication, as opposed to a paused voice command communication, into a time-coded sequence of sound feature frames for speech recognition;
a speech recognition decoder module having an input to receive the time-coded sequence of sound feature frames from the front-end filters, where the speech recognition decoder module applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames;
a user interface of the continuous speech recognition system having an input to receive the supplied audio files from a client machine over the wide area network and to supply the supplied audio files to the front-end filters;
a general-corpus statistical language model that provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence, based on the number of times the sequence of linguistic items occurs in text and phrases in general use, wherein the speech recognition decoder module requests from a run-time correction module one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z is to follow a given sequence of linguistic items x followed by y, where x, y, and z are three variable linguistic items supplied from the decoder module, and the decoder module has an input to receive back the one or more domain-corrected probability estimates from the run-time correction module for one or more possible linguistic items z that follow the given sequence of linguistic items x followed by y;
a first input in the run-time correction module configured to receive requests from the decoder module to return the one or more domain-corrected probability estimates for the one or more possible linguistic items z that could follow the given sequence of linguistic items x followed by y, wherein the run-time correction module is trained on the linguistics of a specific domain and is located between the speech recognition decoder module and the statistical language model in order to adapt the probability estimates supplied by the general-corpus statistical language model to the specific domain when those probability estimates significantly disagree with the linguistic probabilities in that domain by at least an established criterion based on a statistical test;
a second input in the run-time correction module configured to receive from the statistical language model one or more probability estimates P(z|xy) of how likely each of the possible linguistic items z is to follow the given sequence of linguistic items x followed by y;
an output in the run-time correction module to return to the decoder module one or more domain-corrected probability estimates P′(z|xy) of how likely each of the possible linguistic items z is to follow the given sequence of linguistic items x followed by y;
an output module of the speech recognition system configured to provide a representation of what uttered sounds and words were input into the speech recognition system based on the domain-corrected probability estimates; and
a server to host the continuous speech recognition engine.
Dependent claims: 19, 20.
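The front-end step of claim 18, turning audio into a time-coded sequence of sound feature frames, can be illustrated with a short Python sketch. It assumes mono samples already decoded from the audio file, the common 25 ms window and 10 ms hop, and a single log-energy value per frame as a stand-in for the richer feature vectors (for example MFCCs) a real front end would compute. The function name `frame_audio` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (frame start time in seconds, feature value). A single log-energy value is an
# illustrative stand-in for a real front end's feature vector.
Frame = Tuple[float, float]


def frame_audio(samples: Sequence[float], sample_rate: int,
                win_ms: float = 25.0, hop_ms: float = 10.0) -> List[Frame]:
    """Split audio into overlapping windows and return time-coded features."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames: List[Frame] = []
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        energy = sum(s * s for s in chunk) / win
        # A small floor avoids log(0) on digital silence.
        frames.append((start / sample_rate, math.log(energy + 1e-10)))
    return frames


one_second_of_silence = [0.0] * 16000
frames = frame_audio(one_second_of_silence, sample_rate=16000)
print(len(frames), frames[1][0])  # 98 0.01
```

The time stamp on each frame is what makes the sequence "time coded": the decoder can map each recognized word back to the span of audio it came from.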
Specification