Method and apparatus for distribution-based language model adaptation
First Claim
1. A method of forming a language model, the method comprising:
selecting out-of-task training data having n-gram distributions;
selecting task-specific training data having n-gram distributions;
modifying an n-gram distribution in the out-of-task training data to form modified training data by applying a weight to an n-gram in the out-of-task training data, the weight formed as

weight = (Ptask-specific / Pout-of-task)^α

where Ptask-specific is the relative frequency of an n-gram in the task-specific training data, Pout-of-task is the relative frequency of the n-gram in the out-of-task training data, and α is an adaptation coefficient; and
identifying probabilities for the language model based on the modified training data.
identifying probabilities for the language model based on the modified training data.
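The claimed weight can be illustrated with a short sketch. Assuming the weight is the ratio of the two relative frequencies raised to the adaptation coefficient, per the symbols defined in the claim (the formula image itself is not reproduced here, so this form is an assumption):

```python
def ngram_weight(p_task_specific, p_out_of_task, alpha):
    """Weight applied to an out-of-task n-gram count.

    Assumed form: (P_task-specific / P_out-of-task) ** alpha,
    reconstructed from the symbol definitions in claim 1.
    """
    return (p_task_specific / p_out_of_task) ** alpha

# An n-gram twice as frequent in the task-specific data, with alpha = 0.5,
# is up-weighted by sqrt(2); equal frequencies leave the count unchanged.
w_boost = ngram_weight(0.002, 0.001, alpha=0.5)
w_same = ngram_weight(0.001, 0.001, alpha=0.5)
```

With alpha = 1 the weight is the full frequency ratio; smaller alpha values dampen the adaptation toward the out-of-task distribution.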
Abstract
A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
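The pipeline in the abstract can be sketched end to end: count n-grams in both sets, weight the large-set counts by the frequency ratio, and renormalize the weighted counts into probabilities. This is a minimal illustration, assuming the weight takes the form (P_small / P_large)^α and that n-grams absent from the small set keep their original counts; both assumptions go beyond what the abstract states.

```python
from collections import Counter

def bigrams(tokens):
    """Adjacent token pairs; bigrams stand in for general n-grams here."""
    return list(zip(tokens, tokens[1:]))

def relative_freq(counts):
    """Normalize a Counter of n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def adapt(large_counts, small_counts, alpha=0.5):
    """Weight large-set counts toward the small (task-specific) set.

    Assumed weight: (P_small / P_large) ** alpha; n-grams unseen in the
    small set are left unweighted (an illustrative choice, not from the
    source).
    """
    p_large = relative_freq(large_counts)
    p_small = relative_freq(small_counts)
    weighted = {}
    for g, c in large_counts.items():
        w = (p_small[g] / p_large[g]) ** alpha if g in p_small else 1.0
        weighted[g] = c * w
    # Identify probabilities for the language model from the weighted counts.
    return relative_freq(weighted)

large = Counter(bigrams("a b a b a c".split()))   # out-of-task data
small = Counter(bigrams("a c a c".split()))        # task-specific data
probs = adapt(large, small)
```

In this toy example the bigram ("a", "c") is far more frequent in the task-specific set, so its adapted probability rises above its original relative frequency of 0.2 in the large set.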
10 Claims
1. A method of forming a language model, the method comprising:
selecting out-of-task training data having n-gram distributions;
selecting task-specific training data having n-gram distributions;
modifying an n-gram distribution in the out-of-task training data to form modified training data by applying a weight to an n-gram in the out-of-task training data, the weight formed as

weight = (Ptask-specific / Pout-of-task)^α

where Ptask-specific is the relative frequency of an n-gram in the task-specific training data, Pout-of-task is the relative frequency of the n-gram in the out-of-task training data, and α is an adaptation coefficient; and
identifying probabilities for the language model based on the modified training data.

Dependent claims: 2, 3, 4, 5
6. A tangible computer-readable medium having computer-executable instructions for forming a language model through steps comprising:
determining a distribution of entities in a small set of training data;
changing a distribution of entities in a large set of training data based on the distribution of entities in the small set of training data to form a modified distribution of entities by applying a weight to a count of entities in the large set of training data, the weight being a function of

weight = (Psmall-set / Plarge-set)^α

where Psmall-set is a relative frequency of an entity in the small set of training data, Plarge-set is a relative frequency of the entity in the large set of training data, and α is an adaptation coefficient; and
using the modified distribution of entities to identify probabilities for the language model.

Dependent claims: 7, 8, 9, 10
Specification