Discriminative training of language models for text and speech classification
Abstract
Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
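The quantity the abstract says is maximized, the conditional likelihood of a class given a word string, can be illustrated with a toy n-gram (here unigram) classifier. The sketch below is a minimal assumption-laden illustration: it builds per-class smoothed unigram models and computes P(class | word string) via Bayes' rule. The training data, smoothing, and model choices are invented for the example, and it does not implement the rational function growth transform itself.

```python
import math
from collections import Counter

# Hypothetical training data: (word string, class label).
DATA = [
    ("show me flights to boston", "travel"),
    ("book a flight from seattle", "travel"),
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
]
CLASSES = ["travel", "weather"]
VOCAB = {w for s, _ in DATA for w in s.split()}

def train_unigram(sentences, alpha=1.0):
    """Maximum-likelihood unigram model with add-alpha smoothing."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(VOCAB)
    return {w: (counts[w] + alpha) / total for w in VOCAB}

# One language model per class, plus class priors estimated from the data.
models = {c: train_unigram([s for s, lab in DATA if lab == c]) for c in CLASSES}
priors = {c: sum(1 for _, lab in DATA if lab == c) / len(DATA) for c in CLASSES}

def log_joint(sentence, c):
    """log P(c) + log P(W | c) under class c's unigram model."""
    return math.log(priors[c]) + sum(math.log(models[c][w]) for w in sentence.split())

def conditional_likelihood(sentence, c):
    """P(c | W) by Bayes' rule: the quantity the training procedure increases."""
    log_scores = {k: log_joint(sentence, k) for k in CLASSES}
    m = max(log_scores.values())
    denom = sum(math.exp(v - m) for v in log_scores.values())
    return math.exp(log_scores[c] - m) / denom

print(conditional_likelihood("show me flights to boston", "travel"))  # ≈ 0.96
```

Discriminative (CML) training would then adjust all of the per-class model parameters jointly so that this conditional likelihood rises on the correct class for each training sentence, rather than fitting each class's model to its own data in isolation.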
80 Citations
25 Claims
1. A computer-implemented method for estimating a set of parameters for each of a plurality of language models that correspond to a plurality of classes, the method comprising:
setting initial values for the sets of parameters; and
adjusting the sets of parameters jointly in relation to one another to increase a conditional likelihood of a class given a word string. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A computer-implemented method of estimating a set of parameters for each of a plurality of n-gram language models, the method comprising:
producing at least two of the sets of parameters jointly in relation to one another. - View Dependent Claims (12, 13, 14, 15, 16, 17)
18. A computer-implemented method of classifying a natural language input, comprising:
training a plurality of statistical classification components, which correspond to a plurality of classes, jointly in relation to one another to increase a conditional likelihood of a class given a word string, the plurality of statistical classification components being n-gram language model classifiers;
receiving a natural language input; and
applying the plurality of statistical classification components to the natural language input to classify the natural language input into one of the plurality of classes. - View Dependent Claims (19, 20, 21, 22, 23)
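At classification time, the recited steps reduce to scoring the received input under each class's n-gram language model plus its class prior, and selecting the best-scoring class. The bigram log-probabilities, priors, and back-off floor below are made-up stand-ins for trained models, not values from the patent.

```python
import math

# Hypothetical per-class bigram log-probabilities (stand-ins for trained models).
LOG_PROBS = {
    "travel": {("book", "flight"): -1.2, ("flight", "to"): -0.8},
    "weather": {("rain", "today"): -0.9, ("book", "flight"): -4.5},
}
LOG_PRIOR = {"travel": math.log(0.5), "weather": math.log(0.5)}
BACKOFF = -6.0  # crude floor for unseen bigrams (illustrative only)

def classify(words):
    """Score the input under each class's bigram model; return the best class."""
    bigrams = list(zip(words, words[1:]))
    def score(c):
        return LOG_PRIOR[c] + sum(LOG_PROBS[c].get(b, BACKOFF) for b in bigrams)
    return max(LOG_PRIOR, key=score)

print(classify(["book", "flight", "to", "boston"]))  # → travel under these toy numbers
```

Because the classifiers were trained jointly to increase the conditional likelihood of the correct class, the margin between the winning score and the runners-up is exactly what the training procedure is shaping.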
24. The method of claim 23, wherein training the plurality of statistical classification components further comprises:
pooling the main and held-out data to form a combined collection of training data; and
training the plurality of statistical classification components on the combined collection of training data using the optimal number of rational function growth transform iterations and the optimal CML weight βmax. - View Dependent Claims (25)
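The pooling recipe in claims 23-24 amounts to a two-stage procedure: select the iteration count and CML weight β on held-out data, then pool the main and held-out sets and retrain with those settings. In the sketch below, `train` and `accuracy` are trivial placeholders (a majority-class picker) standing in for the rational-function-growth-transform trainer; all names and data are illustrative assumptions.

```python
from collections import Counter

def train(data, n_iter, beta):
    """Stand-in trainer: returns a majority-class model, ignoring n_iter and
    beta. In the patent these control the discriminative-training iterations."""
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda text: majority

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def train_with_pooling(main_data, heldout_data, candidate_betas, max_iters):
    """Tune (n_iter, beta) on held-out data, then pool main + held-out
    (the "combined collection" in the claim) and retrain with the winners."""
    best = None
    for beta in candidate_betas:
        for n in range(1, max_iters + 1):
            acc = accuracy(train(main_data, n, beta), heldout_data)
            if best is None or acc > best[0]:
                best = (acc, n, beta)
    _, n_opt, beta_max = best
    pooled = main_data + heldout_data
    return train(pooled, n_iter=n_opt, beta=beta_max)

main = [("book a flight", "travel"), ("rain today", "weather"), ("fly to boston", "travel")]
held = [("flight to nyc", "travel")]
model = train_with_pooling(main, held, candidate_betas=[0.5, 1.0], max_iters=3)
print(model("any input"))  # → travel, the majority class of the pooled data
```

Retraining on the pooled collection recovers the held-out examples for final training once they have served their purpose of selecting the stopping iteration and βmax.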
Specification