DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR TEXT AND SPEECH CLASSIFICATION
First Claim
1. A computer-implemented method, comprising:
estimating a set of parameters for a plurality of n-gram language models that each correspond to a different class, wherein each class corresponds to a different category of subject matter, and wherein estimating comprises:
setting initial values for each n-gram language model's sets of parameters; and
adjusting each n-gram language model's sets of parameters jointly in relation to one another to increase a conditional likelihood of a class corresponding to a category of subject matter given a word string; and
utilizing the n-gram language models' sets of parameters as a basis for supporting a determination as to which of the plurality of classes represents the category of subject matter that is best correlated to a given natural language input.
Abstract
Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
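The objective described above can be illustrated with a small sketch. The corpus, vocabulary, and class names below are hypothetical, and plain gradient ascent on the conditional log-likelihood stands in for the patent's rational function growth transform; this is an illustration of the joint objective, not the patented update rule.

```python
import numpy as np

# Hypothetical two-class toy corpus (all names here are illustrative).
VOCAB = ["ball", "team", "score", "stock", "market", "price"]
W2I = {w: i for i, w in enumerate(VOCAB)}
CLASSES = ["sports", "finance"]
TRAIN = [("sports", "team score ball"), ("sports", "ball team"),
         ("finance", "stock market price"), ("finance", "price stock")]

def counts(sentence):
    c = np.zeros(len(VOCAB))
    for w in sentence.split():
        c[W2I[w]] += 1.0
    return c

# Set initial values: uniform unigram logits for every class LM.
theta = np.zeros((len(CLASSES), len(VOCAB)))

def posterior(c):
    # P(class | word string) from class-conditional unigram scores
    # (uniform class prior, which cancels in the softmax).
    logp = theta - np.log(np.exp(theta).sum(axis=1, keepdims=True))
    s = logp @ c
    e = np.exp(s - s.max())
    return e / e.sum()

# Adjust all class LMs *jointly* so the classifier discriminates the
# correct class from the incorrect ones on each training sentence.
for _ in range(200):
    grad = np.zeros_like(theta)
    for label, sent in TRAIN:
        c = counts(sent)
        n = c.sum()
        p = np.exp(theta)
        p /= p.sum(axis=1, keepdims=True)  # per-class unigram probabilities
        # Gradient of log P(label | sentence) w.r.t. every class's logits.
        err = (np.array(CLASSES) == label).astype(float) - posterior(c)
        grad += err[:, None] * (c[None, :] - n * p)
    theta += 0.5 * grad / len(TRAIN)

def classify(sentence):
    return CLASSES[int(np.argmax(posterior(counts(sentence))))]
```

Because the parameters of all class models appear together in the posterior, raising the correct class's score necessarily lowers the competitors', which is the discriminative coupling the abstract describes.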
29 Citations
17 Claims
11. A computer-implemented classification method that includes a method for estimating a set of parameters for each of a plurality of n-gram language models, each n-gram language model being associated with a different category of subject matter, the computer-implemented classification method comprising:
producing the sets of parameters for at least two of the n-gram language models jointly in relation to one another;
utilizing the sets of parameters of the plurality of n-gram language models that are produced at least partially jointly in relation to one another as a basis for supporting a determination as to which of the categories of subject matter is best correlated to a given natural language input, wherein the category of subject matter that is best correlated to the given natural language input is information other than a textual representation of the given natural language input itself.
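The determination step of claim 11 reduces, in a maximum a posteriori setup, to scoring the word string under each class's language model and selecting the best-scoring class. A minimal sketch, assuming hypothetical pre-estimated unigram log-probabilities and an illustrative unseen-word floor:

```python
import math

# Hypothetical pre-estimated unigram log-probabilities per class
# (in practice these come from the joint estimation step).
LOGPROB = {
    "sports":  {"ball": -0.7, "team": -1.0, "stock": -4.0, "price": -4.0},
    "finance": {"ball": -4.0, "team": -4.0, "stock": -0.8, "price": -1.1},
}

def classify(sentence, log_prior=None):
    # Score the input under each class LM; the best-correlated class wins.
    best, best_score = None, -math.inf
    for cls, lm in LOGPROB.items():
        score = (log_prior or {}).get(cls, 0.0)
        # -10.0 is an illustrative floor for words unseen in a class's model.
        score += sum(lm.get(w, -10.0) for w in sentence.split())
        if score > best_score:
            best, best_score = cls, score
    return best
```

For example, `classify("team ball")` scores -1.7 under the sports model and -8.0 under the finance model, so the sports category is returned; the output is the category label itself, not a textual representation of the input, as the claim requires.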