Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models
First Claim
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:
(a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
(b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
(c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
(d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
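Steps (a) through (d) above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name `classify_ngrams`, the threshold values, and the toy counts are all hypothetical.

```python
from collections import Counter

def classify_ngrams(ngram_counts, t1, t2):
    """Sketch of steps (a)-(d): n-grams with counts above a first
    threshold form a first class; the rest are pooled into putative
    (n-1)-gram classes (same last n-1 words), and a class is kept
    if its summed ("complement") count exceeds a second threshold."""
    # (a) frequent n-grams go directly into the first class
    first_class = {g: c for g, c in ngram_counts.items() if c > t1}

    # (b)/(c) group the rest by their last n-1 words and sum counts
    complement = Counter()
    for gram, count in ngram_counts.items():
        if gram not in first_class:
            complement[gram[1:]] += count   # complement count

    # (d) keep putative (n-1)-gram classes above the second threshold
    second_class = {h: c for h, c in complement.items() if c > t2}
    return first_class, second_class

# Toy trigram counts (illustrative only)
counts = {("the", "big", "dog"): 7, ("a", "big", "dog"): 2,
          ("one", "big", "dog"): 2, ("the", "red", "car"): 1}
first, second = classify_ngrams(counts, t1=5, t2=3)
```

With these toy counts, ("the", "big", "dog") clears the first threshold on its own, while the two remaining ". . . big dog" trigrams pool into the ("big", "dog") class with a complement count of 4.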
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention is an n-gram language modeler which significantly reduces the memory storage requirement and convergence time for language modelling systems and methods. The present invention assigns each n-gram to one of "n" non-intersecting classes. A count is determined for each n-gram, representing the number of times that n-gram occurred in the training data. The n-grams are separated into classes and complement counts are determined. Using these counts and complement counts, factors are determined, one factor for each class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the previous two words, is determined using these factors.
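The factor training described in the abstract can be sketched as follows. This is a hedged approximation, assuming non-intersecting classes (each n-gram activates exactly one class) and a generalized-iterative-scaling-style multiplicative update of observed over expected class counts; the function `train_factors` and the toy data are illustrative, not the patent's algorithm.

```python
from collections import defaultdict

def train_factors(ngram_counts, class_of, n_iters=50):
    """Iterative scaling sketch: with non-intersecting classes, the
    model probability of an n-gram is proportional to the factor of
    its class, and each iteration scales every factor by the ratio of
    observed to expected class count."""
    total = sum(ngram_counts.values())
    observed = defaultdict(float)            # empirical count per class
    for gram, c in ngram_counts.items():
        observed[class_of(gram)] += c

    factors = {cls: 1.0 for cls in observed}
    for _ in range(n_iters):
        z = sum(factors[class_of(g)] for g in ngram_counts)  # normalizer
        expected = defaultdict(float)        # model's expected class count
        for gram in ngram_counts:
            expected[class_of(gram)] += total * factors[class_of(gram)] / z
        for cls in factors:                  # multiplicative scaling update
            factors[cls] *= observed[cls] / expected[cls]
    return factors

# Toy example: two unigram "classes", one observed 3 times, one once.
factors = train_factors({("a",): 3, ("b",): 1}, class_of=lambda g: g[0])
```

In this toy run the trained factors reproduce the empirical distribution: the model assigns ("a",) probability 3/4.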
Citations
10 Claims
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:
(a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
(b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
(c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
(d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
- View Dependent Claims (2, 3)
4. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for classifying, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, the n-grams into non-redundant classes, said computer readable program code means comprising:
computer readable program code means for causing a computer to effect a comparison of the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
computer readable program code means for causing a computer to effect an association of all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
computer readable program code means for causing a computer to effect an establishment of a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
computer readable program code means for causing a computer to effect a comparison of said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
5. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for determining a conditional probability of a predicted word given the previous (n-1) words, wherein an n-gram defines a series of "n" words, each n-gram having an associated count, and the history of an n-gram being represented by the initial n-1 words of the n-gram, said computer readable program code means comprising:
computer readable first program code means for causing the computer to effect an examination of each word within each n-gram and classifying each n-gram into one of a plurality of non-redundant classes;
computer readable second program code means for causing the computer to effect a determination of a factor for each of said plurality of non-redundant classes, said factor representing the relative strength of predicting said predicted word given the previous (n-1) words; and
computer readable third program code means for causing the computer to effect a determination of said conditional probability of the occurrence of said predicted word given that a particular sequence of (n-1) previous words has occurred using said factors.
- View Dependent Claims (6)
7. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . ,wn), each n-gram having an associated count, method steps for determining a conditional probability of a predicted word given the previous (n-1) words, said method steps comprising:
(1) examining each word within each n-gram and classifying each n-gram into one of a plurality of non-redundant classes;
(2) determining a factor for each of said plurality of non-redundant classes, said factor representing the relative strength of predicting said predicted word given the previous (n-1) words; and
(3) determining the conditional probability of the occurrence of said predicted word given that a particular sequence of (n-1) previous words has occurred using said factors.
- View Dependent Claims (8, 9, 10)
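Step (3) above can be sketched as a normalization over the vocabulary. This is an illustrative assumption, not the claimed method: it supposes each candidate n-gram's score is simply the factor of the class it falls into, with the function `conditional_prob` and the toy factors invented for the example.

```python
def conditional_prob(word, history, vocab, factor_of):
    """Sketch of step (3): p(word | history) is proportional to the
    factor of the class containing the full n-gram (history + word),
    normalized over all candidate words in the vocabulary."""
    z = sum(factor_of(history + (w,)) for w in vocab)  # per-history normalizer
    return factor_of(history + (word,)) / z

# Toy factors: assume any trigram ending in "dog" fell into a strong class.
factor_of = lambda gram: 3.0 if gram[-1] == "dog" else 1.0
p = conditional_prob("dog", ("the", "big"), ["dog", "cat"], factor_of)
```

With a two-word vocabulary and factors 3.0 versus 1.0, the predicted word "dog" receives probability 3/4 given the history ("the", "big").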
Specification