Speech recognition system

US 8,229,743 B2
Filed: 06/23/2009
Issued: 07/24/2012
Est. Priority Date: 06/23/2009
Status: Active Grant

Abstract

Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items are to occur in that sequence based on an amount of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests a correction module for one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y, where (x, y, and z) are three variable linguistic items supplied from the decoder module. The correction module is trained to linguistics of a specific domain, and is located in between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those probability estimates from the SLM significantly disagree with the linguistic probabilities in that domain.
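The request flow in the abstract can be illustrated with a minimal sketch. It assumes the special N-grams, context counts, and per-context normalization values have already been computed and are held in plain dictionaries. The names `corrected_probability`, `toy_slm`, `special_counts`, `context_counts`, and `phi` are hypothetical, and raw counts are used in place of any smoothing for brevity.

```python
from typing import Callable, Dict, Tuple

Trigram = Tuple[str, str, str]
Context = Tuple[str, str]


def corrected_probability(
    x: str,
    y: str,
    z: str,
    slm_prob: Callable[[str, str, str], float],
    special_counts: Dict[Trigram, int],
    context_counts: Dict[Context, int],
    phi: Dict[Context, float],
) -> float:
    """Return a corrected estimate P'(z|xy) for one candidate z."""
    if (x, y, z) in special_counts:
        # The general model disagrees with the domain data for this N-gram:
        # ignore its estimate and use the domain counts directly.
        return special_counts[(x, y, z)] / context_counts[(x, y)]
    # Otherwise keep the general model's estimate, rescaled for this context.
    return phi.get((x, y), 1.0) * slm_prob(x, y, z)


def toy_slm(x: str, y: str, z: str) -> float:
    """Stand-in for the general-corpus model (assumed, flat estimate)."""
    return 0.01


# Illustrative domain data, invented for this sketch only.
special_counts = {("take", "two", "tablets"): 30}
context_counts = {("take", "two"): 40}
phi = {("take", "two"): 0.25}

p_special = corrected_probability(
    "take", "two", "tablets", toy_slm, special_counts, context_counts, phi
)
p_regular = corrected_probability(
    "take", "two", "steps", toy_slm, special_counts, context_counts, phi
)
```

The decoder only ever talks to `corrected_probability`; whether an answer came from domain counts or from the rescaled general model is hidden behind that one call, which mirrors the correction module sitting between the decoder and the SLM.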

20 Claims

1. A speech recognition apparatus, comprising:
- a general-corpus statistical language model that provides probability estimates of how linguistically likely a sequence of linguistic items are to occur in that sequence based on an amount of times the sequence of linguistic items occurs in text and phrases in general use;
  
  a speech recognition decoder module that requests a run-time correction module for one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z is to follow a given sequence of linguistic items x followed by y;
  
  where x, y, and z are three variable linguistic items supplied from the decoder module, and the decoder module has an input to receive back the one or more domain correct probability estimates from the run-time correction module for one or more possible linguistic items z that follow the given sequence of linguistic items x followed by y;
  
  a first input in the run-time correction module configured to receive requests from the decoder module to return the one or more domain correct probability estimates for the one or more possible linguistic items z that could follow the given sequence of linguistic items x followed by y, wherein the run-time correction module is trained to linguistics of a specific domain, and is located in between the speech recognition decoder module and the statistical language model in order to adapt the probability estimates supplied by the general-corpus statistical language model to the specific domain when those probability estimates from the general-corpus statistical language model disagree by at least an established criterion based on a statistical test with the linguistic probabilities in that domain;
  
  a second input in the run-time correction module configured to receive from the statistical language model one or more probability estimates P(z|xy) of how likely are each of the possible linguistic items z that could follow the given sequence of linguistic items x followed by y;
  
  an output in the run-time correction module to return to the decoder module one or more domain corrected probability estimates P′(z|xy) of how likely are each of the possible linguistic items z that could follow the given sequence of linguistic items x followed by y; and
  
  an output module of the speech recognition system configured to provide a representation of what uttered sounds and words were inputted into the speech recognition system based on the domain corrected probability estimates, wherein the modules and models making up the speech recognition apparatus are implemented in electronic circuits, software coding, and any combination of the two, where portions implemented in software coding are stored in a format that is executable by a processor.
  - 2. The apparatus of claim 1, further comprising:
    - a statistical language correction module coupled to the general-corpus statistical language model, where the statistical language correction module uses the established criterion based on the statistical test to determine whether a difference in the observed counts in a domain specific count database of sequences of linguistic items x, y and each of the possible linguistic items z are in fact significantly different from an estimated amount of counts of that same linguistic sequence xyz derived from the general-corpus statistical language model, wherein the units of the linguistic items xyz are words, word phrases, or a combination of both.
  - 3. The apparatus of claim 1, further comprising:
    - a special N-gram repository coupled to the run-time correction module, where the special N-gram repository acts as a repository to store all special N-grams, sequences of linguistic items xyz, that have significantly different counts/occurrences in the corpus of domain specific text analyzed than would be expected compared to a background training data from the general-corpus statistical language model indicative of text phrases in general use, where the special N-grams (xyz) are three or more linguistic items in that sequence and are stored along with the actual counts of the number of times that N-gram appeared in the corpus of domain specific text analyzed, and the special N-gram repository when queried with a linguistic sequence of xyz returns whether the N-gram xyz is included in the repository database and the observed counts associated with that special N-gram (xyz).
  - 4. The apparatus of claim 1, further comprising:
    - a count database coupled to the run-time correction module, where the count database is a pre-populated database specific to a linguistic domain that contains at least the number of counts that the sequence of linguistic items x followed by y occurs in overall corpus of domain-specific text from this domain analyzed C(xy), as well as the number of counts C(xyz) the N-grams (xyz), sequences of linguistic items of x followed by y followed by z, occurs in the overall corpus of domain-specific text from this analyzed domain, and where the count database returns the linguistic sequences of xy, the N-gram (xyz), and the observed counts of both C(xy) and C(xyz) in the corpus of domain-specific text analyzed when requested by the run-time correction module, but is not itself modified either during a training time or at run time.
  - 5. The apparatus of claim 1, further comprising:
    - a Phi database of normalization values (Phi) coupled to the run-time correction module, where the Phi database of normalization values stores normalization values Phi(xy) for contexts of given linguistic items x followed by y for each possible z, wherein the Phi database applies a mathematical factor to correct the count data in raw form from a domain specific count database to have normalized probabilities so the sum for the set of all returned possibilities for z is equal to 100 percent.
  - 6. The apparatus of claim 1, wherein the run-time correction module receives an input from the statistical language module, a Phi database of normalization values, a domain specific counts database, a special N-gram repository of sequences of linguistic items, and the speech recognition decoder module, and the speech recognition decoder module sends to the first input of the run-time correction module the given sequence of linguistic items xy that is assumed to be correct, and asks what is the probability of each linguistically possible word z for each of the individual block combinations of xy and z, where the run-time correction module first determines whether each possibility of the linguistic sequence xy and z is a special N-gram listed in the special N-gram repository or not, and when the sequence of linguistic items xyz is a special N-gram, then the correction module generates a corrected probability estimate directly from the observed counts in that domain and discards the predictions from the general-corpus statistical language module for that xyz possibility.
  - 7. The apparatus of claim 6, wherein the run-time correction module applies a normalization factor to estimates from the statistical language model when the observed counts from the domain training are consistent with an estimate from the general corpus statistical language model;
    - but when the observed counts are not consistent and thus significantly differ, then the run-time correction module discards the statistical language model's returned probabilities and substitutes an associated probability of smoothed count value C′(xyz) divided by smoothed count value C′(xy) from an adjusted counts database for each matching special N-gram of xy and z in the special N-gram repository as its corrected probability estimate P′(z|xy), where the smoothed counts mathematically readjust the probability estimate associated with that special N-gram so the returned corrected probabilities are neither assigned a zero percent probability nor have numeric value which was divided by the number zero.
  - 8. The apparatus of claim 1, further comprising:
    - a statistical language correction module coupled to the general-corpus statistical language model, where the statistical language correction module conducts a training phase for a specific linguistic domain of text analyzed, in which contexts xy and possible words z for which the probability of a word z follows the given sequence words x followed by y P(z|xy) is a poor estimate are identified, where the statistical language correction module queries the general-corpus statistical language model for estimates of the probabilities of any words z in the set of z that generally follow and/or are sequentially grouped with the words x followed by y, where the statistical language correction module then compares these with the domain counts of occurrences that the sequence of the words C(xy) and C(xyz) appear in the specific linguistic domain of text analyzed from a counts database, where the statistical language correction module queries a discrepancy detection module for whether the count ratio is significantly discrepant to the anticipated count ratio of the sequences of these words being found;
      
      if so, sends the special N-gram and its C(xyz) to a special N-gram repository to be recorded.
  - 9. The apparatus of claim 1, further comprising:
    - a statistical language correction module coupled to the general-corpus statistical language model, where the statistical language correction module is configured to conduct a training phase aimed at identifying and recording those sequences of linguistic items xyz (N-grams) for which the probability estimate P(z|xy) is not a plausible estimate of the probability of z following xy in the current domain of linguistics, given the domain-corpus counts C(xyz) and C(xy) in the count database, and where only N-grams xyz with associated positive counts C(xyz) greater than one are stored in the count database as identified sequences of linguistic items xyz for which the probability estimate P(z|xy) is not a plausible estimate.
  - 10. The apparatus of claim 1, further comprising:
    - a special N-gram decremented database coupled to the statistical language correction module, where the special N-gram decremented database stores special N-grams xyz and associated counts C(xyz) of the N-grams (xyz) that have been determined that P(z|xy) from the statistical language model is not a plausible estimate given the decremented counts C(xy)−1 and C(xyz)−1, and when queried with a first count amount, the special N-gram decremented database also returns to the statistical language correction module how many N-grams xyz in the database, which have that first count amount.
  - 11. The apparatus of claim 9, further comprising:
    - a discrepancy detection module coupled to the statistical language correction module, where the discrepancy detection module is a filter to eliminate words that should not be stored as special N-grams in a special N-gram repository, where the discrepancy detection module determines both (1) whether the probability estimate P(z|xy) from the statistical language model is a plausible probability estimate given the context count C(xy) and the N-gram count C(xyz), and also (2) whether it would be a plausible estimate if the counts were instead C(xy)−1 and C(xyz)−1, given a context count C(xy) and an N-gram count C(xyz) from the domain specific text stored in a count database and a probability estimate P(z|xy) from the statistical language model.
  - 12. The apparatus of claim 1, further comprising:
    - an adjusted counts database coupled to the run-time correction module, where the adjusted counts database stores values smoothed C′(xy) for contexts xy, where the counts in raw form for contexts xy are mathematically adjusted so the normalization the probability estimate is readjusted.
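Claims 5 and 7 describe a normalization value Phi(xy) that keeps the returned estimates for a context summing to 100 percent. The claims do not give a formula, so the sketch below assumes one plausible choice: special N-grams keep their count-based estimates, and Phi(xy) rescales the general model's estimates for all remaining z to fill whatever probability mass is left. The function names and the toy numbers are hypothetical.

```python
from typing import Dict


def normalization_factor(
    slm_probs: Dict[str, float], special_probs: Dict[str, float]
) -> float:
    """Phi(xy) under the stated assumption.

    slm_probs:     general-model estimates P(z|xy) for every candidate z.
    special_probs: count-based estimates P'(z|xy) for the special z only.
    """
    special_mass = sum(special_probs.values())
    # Mass the general model assigns to the z values that are NOT special.
    remaining_slm_mass = sum(
        p for z, p in slm_probs.items() if z not in special_probs
    )
    if remaining_slm_mass <= 0.0:
        raise ValueError("no non-special mass left to rescale")
    return (1.0 - special_mass) / remaining_slm_mass


def corrected_distribution(
    slm_probs: Dict[str, float], special_probs: Dict[str, float]
) -> Dict[str, float]:
    """Combine both sources into one distribution over z for a fixed xy."""
    phi_xy = normalization_factor(slm_probs, special_probs)
    return {
        z: special_probs[z] if z in special_probs else phi_xy * p
        for z, p in slm_probs.items()
    }


# Toy context xy with three candidate continuations (invented values).
slm_probs = {"tablets": 0.1, "steps": 0.5, "days": 0.4}
special_probs = {"tablets": 0.7}

phi_xy = normalization_factor(slm_probs, special_probs)
distribution = corrected_distribution(slm_probs, special_probs)
```

Because Phi(xy) depends only on the context, it can be computed once per xy during training and looked up at run time, which is consistent with it being stored in its own database.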

13. A method for a speech recognition system, comprising:
- improving an accuracy of probability estimates of how linguistically likely a sequence of linguistic items are to occur in that sequence from an underlying statistical language model by adding a correction module trained on different or more extensive data than the underlying statistical language model, in which the correction module systematically corrects the statistical language model estimates, where those probability estimates from the statistical language model significantly disagree with evidence available to the correction module;
  
  identifying all special N-grams (Yz) those for which the probability estimate P(z|Y) from the statistical language model is implausible given actual counts of C(Yz) and C(Y) that have been identified and recorded in a count database, where z is a specific linguistic unit that occurs in a given context of Y and Y is a specific sequence of one or more of other linguistic units;
  
  establishing a threshold value (t) to determine whether the evidence available to the correction module being a difference in the observed counts in the count database C(Yz) from an estimated amount of counts of the N-gram context of Yz in a background training data derived from the underlying statistical language model E(Yz);
  
  in response to a request from a decoder module of the speech recognition system, the correction module returning estimates from the statistical language model with a normalization factor applied to those estimates when an actual number of counts of the N-gram context of Yz in the corpus of specific text analyzed by the correction module C(Yz) are not significantly different, by being within the threshold value t, from the estimated amount of counts of the N-gram from the statistical language model; and
  
  in response to a request from a decoder module of the speech recognition system, the correction module returning for each special N-grams (Yz), a corrected probability estimate associated with that special N-gram based on counts of the linguistic sequence found in the count database when the actual number of counts of the N-gram sequence of Yz in the corpus of text analyzed by the correction module C(Yz) are significantly different from the estimated amount of counts of the N-gram from the statistical language model, and discarding then the statistical language model's returned probabilities.
  - 14. The method of claim 13, wherein the correction module returns corrected probability estimates for each special N-gram of Yz based on a smoothed count of C′(Yz) divided by a smoothed count of C′(Y), where the smoothed counts mathematically readjust the probability estimate associated with that special N-gram so the returned corrected probability estimate are neither assigned a zero percent probability nor have numeric value which was divided by the number zero, and the underlying statistical language model is a general corpus statistical language model in which the correction module trained to a specific domain systematically corrects the general corpus statistical language model estimates, where those probability estimates from the statistical language model significantly disagree with the linguistic probabilities in that domain.
  - 15. The method of claim 14, wherein the threshold value t is determined by using a 95% Poisson Distribution model and the smoothed counts are generated with a frequency of frequencies table.
  - 16. The method of claim 13, wherein during a training phase of a speech recognition system, generating the normalization factor because the correction module identified that certain sequences of linguistic items are more frequently used in the corpus of text analyzed by the correction module than in the background training data used for the underlying statistical language model and the probabilities coming from the statistical language model are normalized in light of the context of the corpus of text analyzed by the correction module, wherein the special N-grams Yz contains three or more linguistic items in a sequence and are stored along with the actual counts of the number of times that N-gram appeared in the corpus of domain specific text analyzed, and the units of the linguistic items are either words, word phrases, or a combination of both, and the correction module is trained to take account of larger contexts when estimating probabilities from the underlying statistical language model as well as compensate for inaccuracies in a probability-smoothing algorithm used by the underlying statistical language model.
  - 17. The method of claim 16, wherein the statistical language correction module merely identifies special N-gram of Yz by evaluating only sequences of Y which have counts C(Y) greater than one (>1) in addition to being greater in difference than threshold value (t).
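Claims 13, 15, and 17 compare the observed count C(Yz) with the count E(Yz) expected from the underlying model, using a 95% Poisson criterion and only contexts with C(Y) greater than one. The claims do not spell out the test, so the sketch below is one possible reading: it assumes E(Yz) = C(Y) · P(z|Y) and flags an N-gram when the observed count falls in either 2.5% tail of a Poisson distribution with that mean. All function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0); underflows for very large means
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def is_significantly_different(
    observed: int, context_count: int, slm_prob: float, level: float = 0.95
) -> bool:
    """True when the observed C(Yz) is implausible under the model estimate."""
    expected = context_count * slm_prob  # assumed form of E(Yz)
    tail = (1.0 - level) / 2.0
    p_at_most = poisson_cdf(observed, expected)             # P(X <= observed)
    p_at_least = 1.0 - poisson_cdf(observed - 1, expected)  # P(X >= observed)
    return p_at_most < tail or p_at_least < tail


def is_special_ngram(observed: int, context_count: int, slm_prob: float) -> bool:
    """Apply the C(Y) > 1 filter before running the statistical test."""
    if context_count <= 1:
        return False
    return is_significantly_different(observed, context_count, slm_prob)


# Context seen 200 times; the general model gives z a probability of 0.01,
# so about 2 occurrences of Yz are expected.
far_more_than_expected = is_special_ngram(30, 200, 0.01)
about_as_expected = is_special_ngram(2, 200, 0.01)
```

A production version would likely work in log space or use a statistics library for large means; the direct summation here is only meant to make the criterion concrete.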

18. A continuous speech recognition system over a network, comprising:
- a continuous speech recognition engine that includes
  
  front-end filters and sound data parsers configured to convert a supplied audio file of a continuous voice communication, as opposed to a paused voice command communication, into a time coded sequence of sound feature frames for speech recognition,
  
  a speech recognition decoder module having an input to receive the time coded sequence of sound feature frames from the front-end filters as an input, where the speech recognition decoder module applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames,
  
  a user interface of the continuous speech recognition system has an input to receive the supplied audio files from a client machine over the wide area network and supply the supplied audio files to the front end filters,
  
  a general-corpus statistical language model that provides probability estimates of how linguistically likely a sequence of linguistic items are to occur in that sequence based on an amount of times the sequence of linguistic items occurs in text and phrases in general use, wherein the speech recognition decoder module requests a run-time correction module for one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y;
  
  where x, y, and z are three variable linguistic items supplied from the decoder module, and the decoder module has an input to receive back the one or more domain correct probability estimates from the run-time correction module for one or more possible linguistic items z that follow the given sequence of linguistic items x followed by y,
  
  a first input in a run-time correction module configured to receive requests from the decoder module to return the one or more domain correct probability estimates for the one or more possible linguistic items z that could follow the given sequence of linguistic items x followed by y, wherein the run-time correction module is trained to linguistics of a specific domain, and is located in between the speech recognition decoder module and the statistical language model in order to adapt the probability estimates supplied by the general-corpus statistical language model to the specific domain when those probability estimates from the general-corpus statistical language model significantly disagree by at least an established criterion based on a statistical test with the linguistic probabilities in that domain,
  
  a second input in the run-time correction module configured to receive from the statistical language model one or more probability estimates P(z|xy) of how likely are each of the possible linguistic items z that could follow the given sequence of linguistic items x followed by y,
  
  an output in the run-time correction module to return to the decoder module one or more domain corrected probability estimates P′(z|xy) of how likely are each of the possible linguistic items z that could follow the given sequence of linguistic items x followed by y;
  
  an output module of the speech recognition system configured to provide a representation of what uttered sounds and words were inputted into the speech recognition system based on the domain corrected probability estimates; and
  
  a server to host the continuous speech recognition engine.
  - 19. The continuous speech recognition system of claim 18, further comprising:
    - a database to store each word from the output module with an assigned robust confidence level parameter and a start and stop time code from that word;
      
      an intelligence engine configured to assign a higher weight to recognized words with a robust confidence level above a threshold than recognized words below the threshold, and use the weight for the recognized words when queries are made with the user interface;
      
      a special N-gram repository coupled to the run-time correction module, where the special N-gram repository acts as a repository to store all special N-grams, sequences of linguistic items xyz, that have significantly different counts/occurrences in the corpus of domain specific text analyzed than would be expected compared to a background training data from the general-corpus statistical language model indicative of text phrases in general use, where the special N-grams (xyz) are three or more linguistic items in that sequence and are stored along with the actual counts of the number of times that N-gram appeared in the corpus of domain specific text analyzed, and the special N-gram repository when queried with a linguistic sequence of xyz returns whether the N-gram xyz is included in the repository database and the observed counts associated with that special N-gram (xyz);
      
      the domain specific count database coupled to the run-time correction module, where the count database returns the linguistic sequences of xy, the N-gram (xyz), and the observed counts of both C(xy) and C(xyz) in the corpus of domain-specific text analyzed when requested by the run-time correction module, but is not itself modified either during a training time or at run time; and
      
      a statistical language correction module that uses the established criterion based on the statistical test to determine whether a difference in the observed counts in the domain specific count database of sequences of linguistic items x, y and each of the possible linguistic items z are in fact significantly different from an estimated amount of counts of that same linguistic sequence xyz derived from the general use statistical language model, wherein the units of the linguistic items xyz are words, word phrases, or a combination of both.
  - 20. The continuous speech recognition system of claim 19, further comprising:
    - a Phi database of normalization values coupled to the run-time correction module, where the Phi database of normalization values stores normalization values Phi(xy) for contexts of given linguistic items x followed by y for each possible z, wherein the Phi database applies a mathematical factor to correct the count data in raw form from a domain specific count database to have normalized probabilities so the sum for the set of all returned possibilities for z is equal to one hundred percent, in response to a query from the decoder module, the run-time correction module first determines whether each possibility of the linguistic sequence xy and z is a special N-gram listed in the special N-gram repository or not, and when the sequence of linguistic items xyz is a special N-gram, then the correction module generates a corrected probability estimate directly from the observed counts in that domain and discards the predictions from the general-corpus statistical language module for that xyz possibility, and when the special N-gram xyz is not in the special N-gram repository, then the run-time correction module applies a normalization factor to estimates from the statistical language model.
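Claim 19 stores each recognized word with a confidence level and start and stop time codes, and weights words above a confidence threshold more heavily when queries are made. A minimal sketch of that idea follows. The record layout, the threshold of 0.8, and the two weight values are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedWord:
    text: str
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]
    start_s: float     # start time code, in seconds
    stop_s: float      # stop time code, in seconds


def query_weight(
    word: RecognizedWord, threshold: float = 0.8,
    high: float = 1.0, low: float = 0.25,
) -> float:
    """Weight a word more heavily when its confidence exceeds the threshold."""
    return high if word.confidence > threshold else low


def search(
    words: List[RecognizedWord], term: str, threshold: float = 0.8
) -> List[Tuple[float, float, float]]:
    """Return (weight, start, stop) for each match, highest weight first."""
    hits = [
        (query_weight(w, threshold), w.start_s, w.stop_s)
        for w in words
        if w.text == term
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Toy transcript, invented for this sketch.
transcript = [
    RecognizedWord("dosage", 0.55, 1.2, 1.7),
    RecognizedWord("of", 0.97, 1.7, 1.8),
    RecognizedWord("dosage", 0.93, 9.4, 9.9),
]
hits = search(transcript, "dosage")
```

Keeping the time codes alongside the weight lets a query result point back to the exact span of audio where the word was recognized.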

Current Assignee: Longsand Limited (Open Text Corporation)
Original Assignee: Autonomy Corp Ltd (HP Inc.)
Inventors: Carter, David; Kadirkamanathan, Mahapathy
Primary Examiner(s): Abebe, Daniel D
Application Number: US12/489,786
Publication Number: US 20100324901A1
Time in Patent Office: 1,127 Days
Field of Search: 704/251, 704/255
US Class Current: 704/251
CPC Class Codes: G10L 15/065 Adaptation; G10L 15/197 Probabilistic grammars, e.g...
