Discriminative training of language models for text and speech classification

US 7,379,867 B2
Filed: 06/03/2003
Issued: 05/27/2008
Est. Priority Date: 06/03/2003
Status: Expired due to Fees


Abstract

Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
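The quantity the abstract refers to, the conditional likelihood of a class given a word string, can be illustrated with one n-gram model per class combined through Bayes' rule. The following is a minimal sketch, not the patented training method: it assumes bigram models with add-one smoothing (a smoothing choice made here only for illustration), and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_models(labeled_sentences):
    """Collect per-class bigram counts, context counts, class priors, and the vocabulary."""
    bigrams, contexts, class_counts, vocab = {}, {}, Counter(), set()
    for label, words in labeled_sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        class_counts[label] += 1
        bigrams.setdefault(label, Counter()).update(zip(tokens, tokens[1:]))
        contexts.setdefault(label, Counter()).update(tokens[:-1])
        vocab.update(tokens)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    return bigrams, contexts, priors, vocab


def log_joint(words, label, model):
    """log P(C) + log P(W | C) under the class-specific bigram model (add-one smoothing)."""
    bigrams, contexts, priors, vocab = model
    tokens = ["<s>"] + list(words) + ["</s>"]
    score = math.log(priors[label])
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams[label][(prev, cur)] + 1
        denominator = contexts[label][prev] + len(vocab)
        score += math.log(numerator / denominator)
    return score


def class_posteriors(words, model):
    """P(C | W) for every class, normalized with the log-sum-exp trick."""
    priors = model[2]
    joint = {c: log_joint(words, c, model) for c in priors}
    peak = max(joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {c: math.exp(v - log_norm) for c, v in joint.items()}


def conditional_log_likelihood(labeled_sentences, model):
    """Sum over sentences of log P(correct class | word string)."""
    return sum(math.log(class_posteriors(w, model)[c]) for c, w in labeled_sentences)
```

Discriminative training as described in the abstract would adjust the parameters of all class models jointly to raise `conditional_log_likelihood` on the training data, whereas the counts above are estimated for each class independently.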


13 Claims

1. A computer implemented method of classifying a natural language input, comprising:
- training a plurality of statistical classification components jointly in relation to one another to maximize a conditional likelihood of a class given a word string using an application of the rational growth transform, the plurality of statistical classification components being n-gram language model classifiers that each correspond to a different class, wherein each class corresponds to a different category of subject matter, and wherein training the plurality of statistical classification components comprises:

  identifying an optimal number of rational function growth transform iterations and an optimal conditional maximum likelihood (CML) weight β_max to facilitate application of the rational function growth transform, wherein identifying comprises:

  splitting a collection of training data into a collection of main data and a collection of held-out data;

  using the main data to estimate a series of relative frequencies for the statistical classification components; and

  using the held-out data to tune the optimal number of rational function growth transform iterations and the optimal CML weight β_max;

  receiving a natural language input;

  applying the plurality of statistical classification components to the natural language input so as to classify the natural language input into a particular one or more of the plurality of classes that represent the category or categories of subject matter that is best correlated to the natural language input;

  wherein using the held-out data to tune comprises:

  fixing a preset number N of rational function growth transform iterations to be run;

  fixing a range of values to be explored for determining the optimal CML weight β_max;

  for each value β_max, running as many rational function growth transform functions as possible up to N such that the conditional likelihood of the main data increases at each iteration; and

  identifying as optimal the number of rational function growth transforms iterations; and

  the β_max value that maximizes the conditional likelihood of the held-out data; and

  wherein training the plurality of statistical classification components further comprises:

  pooling the main and held-out data to form a combined collection of training data; and

  training the plurality of statistical classification components on the combined collection of training data using the optimal number of rational function growth transform iterations and the optimal CML weight β_max.
2. The method of claim 1, wherein the plurality of statistical classification components are n-gram language models.
3. The method of claim 1, wherein the natural language input is text that is generated by a speech recognition engine.
4. The method of claim 1, wherein the natural language input is a textual representation of a recognized utterance.
5. The method of claim 1, wherein the natural language input is text generated as a textual representation of a handwritten input.
6. The method of claim 1, further comprising preprocessing the natural language input.
7. The method of claim 6, wherein preprocessing comprises removing an article of speech.
8. The method of claim 6, wherein preprocessing comprises transforming the natural language input into an indirect representation.
9. The method of claim 8, wherein transforming comprises transforming into a vector representation.
10. The method of claim 6, wherein preprocessing comprises reducing the natural language input to a select set of extracted features.
11. The method of claim 6, wherein preprocessing comprises reducing the natural language input to only include words included with a predefined vocabulary.
12. The method of claim 6, wherein preprocessing comprises adding an indication of the presence or absence of words previously determined to carry a predefined type of content.
13. The method of claim 6, wherein preprocessing comprises stemming.
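The held-out tuning procedure recited in claim 1 can be sketched as a small search loop. This is one possible reading under stated assumptions, not the patented implementation: the rational function growth transform update is not reproduced in this record, so a single iteration is represented by a caller-supplied `step` function. The names `estimate`, `step`, `cond_loglik`, `tune_on_held_out`, and `train_final` are hypothetical, and treating zero iterations as a valid candidate is an assumption.

```python
def tune_on_held_out(main, held_out, betas, max_iters, estimate, step, cond_loglik):
    """Pick the (beta, iteration count) pair with the best held-out conditional likelihood.

    estimate(data)            -> initial parameters (e.g. relative frequencies)
    step(params, data, beta)  -> parameters after one growth-transform iteration
    cond_loglik(params, data) -> conditional log-likelihood of data under params
    """
    best = None  # (beta, iterations, held-out score)
    for beta in betas:
        params = estimate(main)
        main_score = cond_loglik(params, main)
        # Assumption: zero iterations (the initial estimate) is a valid candidate.
        candidates = [(cond_loglik(params, held_out), 0)]
        for n in range(1, max_iters + 1):
            proposal = step(params, main, beta)
            proposal_score = cond_loglik(proposal, main)
            # Stop as soon as the main-data likelihood fails to increase.
            if proposal_score <= main_score:
                break
            params, main_score = proposal, proposal_score
            candidates.append((cond_loglik(params, held_out), n))
        for held_score, n in candidates:
            if best is None or held_score > best[2]:
                best = (beta, n, held_score)
    return best


def train_final(main, held_out, beta, iterations, estimate, step):
    """Pool main and held-out data, then rerun the tuned number of iterations."""
    pooled = list(main) + list(held_out)
    params = estimate(pooled)
    for _ in range(iterations):
        params = step(params, pooled, beta)
    return params
```

Passing the update as a callable keeps the search logic independent of the model, so the same loop applies whether `step` updates n-gram probabilities or some other parameterization.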


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
Primary Examiner(s): Richemond Dorvil
Assistant Examiner(s): Dorothy S. Siedler
Application Number: US10/453,349
Publication Number: US 20040249628A1
Time in Patent Office: 1,820 Days
Field of Search: 704/1, 704/225
US Class Current: 704/236
CPC Class Codes:
- G06F 40/216: using statistical methods
- G06F 40/44: Statistical methods, e.g. p...
- G10L 15/183: using context dependencies,...
- G10L 15/197: Probabilistic grammars, e.g...
