Adaptation of language models and context free grammar in speech recognition

US 7,925,505 B2
Filed: 04/10/2007
Issued: 04/12/2011
Est. Priority Date: 04/10/2007
Status: Active Grant

Abstract

Architecture is disclosed herewith for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized whether a user provides correction results or not based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization.

18 Claims

1. A computer-implemented system that facilitates speech recognition, comprising:
- a recognition component using a computer system for generating a recognized result based on an input phrase, the recognition component using a statistical language model (SLM) having an empirical error rate defined as an error rate accumulated over a set of training samples collected from at least one of actual speech input or generated pseudo samples from existing models, in order to reflect an ability of the SLM to differentiate different terms during recognition;
  
  an interaction component for interacting with the recognized result to create a corrected result;
  
  an adaptation component for receiving the recognized result and the corrected result and discriminatively adapting the SLM to the corrected result based on criteria to minimize the empirical error rate defined over a training corpus as an objective function, wherein the adaptation component facilitates discriminative adaptation and training of context-free grammars (CFG) to optimize the criteria; and
  
  a processor that executes computer-executable instructions associated with at least one of the recognition component, the interaction component, or the adaptation component.
  - 2. The system of claim 1, wherein the objective function for the criteria for minimizing the empirical error rate is defined by,
  - 3. The system of claim 2, wherein the objective function is minimized by a sequential gradient-based solution.
  - 4. The system of claim 3, wherein parameter values used for minimizing the objective function are employed in at least one of a dialog application or a dictation application.
  - 5. The system of claim 1, wherein the interaction component facilitates creation of the corrected result by user correction of the recognized result.
  - 6. The system of claim 1, wherein the adaptation component computes an approximate distance between a probability distribution of a correct word transcription and a probability distribution of an incorrect word transcription.
  - 7. The system of claim 1, wherein the adaptation component utilizes an acoustical model to generate a distance between a probability distribution of a correct word transcription and a probability distribution of an incorrect word transcription if there are insufficient training samples.
  - 8. The system of claim 1, wherein the adaptation component optimizes n-gram parameters when acoustical data is available.
  - 9. The system of claim 1, wherein the corrected result is weighted against the recognized result to create a score, the score fed back to the recognition component for recognition processing of additional input phrases.
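
Claims 1 and 3 describe minimizing an empirical error rate over training samples with a sequential gradient-based solution. The formula recited in claim 2 is not reproduced in this record, so the sketch below does not attempt to restate it. Instead it assumes a generic smoothed error (a sigmoid of the score gap between the recognized and the corrected transcription) and per-term language model weights. The names `score`, `smoothed_error`, `sequential_update`, `alpha`, and `eta` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def score(weights, counts):
    """Language model score of a transcription: weighted sum of its term counts."""
    return sum(weights.get(term, 0.0) * n for term, n in counts.items())

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def smoothed_error(weights, samples, alpha=1.0):
    """Average smoothed error over (corrected, recognized) pairs.

    ASSUMPTION: a sigmoid of the score gap stands in for the error count;
    this is a generic smoothing, not the formula of claim 2.
    """
    total = 0.0
    for corrected, recognized in samples:
        gap = score(weights, recognized) - score(weights, corrected)
        total += sigmoid(alpha * gap)
    return total / len(samples)

def sequential_update(weights, samples, eta=0.5, alpha=1.0, epochs=20):
    """Update the weights one sample at a time along the negative gradient."""
    w = dict(weights)
    for _ in range(epochs):
        for corrected, recognized in samples:
            gap = score(w, recognized) - score(w, corrected)
            loss = sigmoid(alpha * gap)
            slope = alpha * loss * (1.0 - loss)  # derivative of the sigmoid loss w.r.t. the gap
            for term in set(corrected) | set(recognized):
                diff = recognized.get(term, 0) - corrected.get(term, 0)
                w[term] = w.get(term, 0.0) - eta * slope * diff
    return w

# One hypothetical training pair: the user-corrected text and the misrecognition.
samples = [(Counter("recognize speech".split()),
            Counter("wreck a nice beach".split()))]
adapted = sequential_update({}, samples)
print(smoothed_error({}, samples), smoothed_error(adapted, samples))
```

With all weights starting at zero the two transcriptions tie, so the smoothed error starts at 0.5. Each update raises the weights of terms in the corrected result and lowers those that appear only in the recognized result, which lowers the error on this sample.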

10. A computer-implemented method of processing speech, comprising acts of:
- processing input terms using a computer system into first-time recognized results using a recognizer and statistical language model having an empirical error rate defined as an error rate accumulated over a set of training samples collected from at least one of actual speech input or generated pseudo samples from existing models, in order to reflect an ability of the statistical language model to differentiate different terms during recognition;
  
  generating user-corrected results based on the first-time recognized results;
  
  optimizing CFG weights to minimize the empirical error rate;
  
  discriminatively adapting the statistical language model to the user-corrected results based on criteria to minimize the empirical error rate defined over a training corpus as an objective function;
  
  generating new language model scores based on the user-corrected results and the first-time recognized results;
  
  inputting the new language model scores to a recognizer to process additional input terms; and
  
  utilizing a processor that executes instructions stored in memory to perform at least one of the acts of processing, generating, optimizing, discriminatively adapting, or inputting.
  - 11. The method of claim 10, further comprising weighting the first-time recognized results based on the new scores to reduce empirical error in the language model.
  - 12. The method of claim 10, further comprising discriminatively training CFG parameters in a dialog application.
  - 13. The method of claim 10, further comprising discriminatively adapting CFG parameters in a dialog application.
  - 14. The method of claim 10, further comprising discriminatively adapting the statistical language model in a dictation application.
  - 15. The method of claim 10, further comprising discriminatively adapting CFG parameters when there is no available acoustic data.
  - 16. The method of claim 15, further comprising employing information of at least one of a lexicon model or an acoustical model to approximate a difference between the first-time recognition results and the user-corrected results.
  - 17. The method of claim 10, further comprising batch processing the additional input terms based on the minimized empirical error rate.
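
The feedback loop recited in claim 10 (recognize, collect a user correction, derive new language model scores, feed them back to the recognizer) can be sketched as follows. This is a toy illustration under stated assumptions, not the claimed method: the recognizer simply rescores a fixed n-best list, and the score update is a plain perceptron-style step swapped in for the claimed error-minimizing criterion. `ToyRecognizer`, `new_lm_scores`, `step`, and the acoustic scores are all hypothetical.

```python
from collections import Counter

class ToyRecognizer:
    """Picks the best hypothesis from an n-best list using acoustic + LM scores."""

    def __init__(self, lm_scores=None):
        self.lm_scores = dict(lm_scores or {})

    def lm_score(self, hypothesis):
        return sum(self.lm_scores.get(word, 0.0) for word in hypothesis.split())

    def recognize(self, nbest):
        # nbest is a list of (hypothesis, acoustic_score) pairs.
        return max(nbest, key=lambda h: h[1] + self.lm_score(h[0]))[0]

def new_lm_scores(lm_scores, recognized, corrected, step=1.0):
    """Raise scores of words in the corrected result, lower those only in the
    recognized result. ASSUMPTION: a perceptron-style step, for illustration."""
    updated = dict(lm_scores)
    rec, cor = Counter(recognized.split()), Counter(corrected.split())
    for word in set(rec) | set(cor):
        updated[word] = updated.get(word, 0.0) + step * (cor[word] - rec[word])
    return updated

# Hypothetical n-best list with made-up acoustic scores.
nbest = [("wreck a nice beach", -9.0), ("recognize speech", -10.0)]

recognizer = ToyRecognizer()
first_pass = recognizer.recognize(nbest)        # first-time recognized result
user_corrected = "recognize speech"             # user-corrected result

# Generate new language model scores and input them back to the recognizer.
recognizer = ToyRecognizer(new_lm_scores(recognizer.lm_scores, first_pass, user_corrected))
second_pass = recognizer.recognize(nbest)
print(first_pass, "->", second_pass)
```

In this toy setup the first pass prefers the hypothesis with the better acoustic score, and after one correction the updated scores outweigh that acoustic advantage on the second pass.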

18. A computer-implemented system, comprising:
- computer-implemented means for processing input terms into first-time recognized results using a statistical language model having an empirical error rate defined as an error rate accumulated over a set of training samples collected from at least one of actual speech input or generated pseudo samples from existing models, in order to reflect an ability of the statistical language model to differentiate different terms during recognition;
  
  computer-implemented means for generating user-corrected results based on the first-time recognized results;
  
  computer-implemented means for discriminatively training or adapting CFG parameters in a dialog application;
  
  computer-implemented means for discriminatively adapting the statistical language model to the user-corrected results based on criteria to minimize the empirical error rate defined over a training corpus as an objective function;
  
  computer-implemented means for generating new language model scores based on the user-corrected results and the first-time recognized results;
  
  computer-implemented means for inputting the new language model scores back to a recognizer to process additional input terms; and
  
  processor means that executes computer-executable instructions associated with at least one of the means for processing, generating, discriminatively adapting, or inputting.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Wu, Jian
Primary Examiner(s): Lerner; Martin
Application Number: US11/784,896
Publication Number: US 20080255844A1
Time in Patent Office: 1,463 Days
Field of Search: 704/235, 704/236, 704/238, 704/240, 704/243, 704/244, 704/255, 704/256.3
US Class Current: 704/236
CPC Class Codes:
- G10L 15/193   Formal grammars, e.g. finit...
- G10L 15/197   Probabilistic grammars, e.g...
- G10L 2015/0631   Creating reference template...
