
Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models

  • US 5,640,487 A
  • Filed: 06/07/1995
  • Issued: 06/17/1997
  • Est. Priority Date: 02/26/1993
  • Status: Expired due to Term
First Claim

1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:

  • (a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
  • (b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
  • (c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
  • (d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
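The four claimed steps can be sketched in code. The following is a minimal illustration, not the patented implementation: the function name, the dict-of-tuples representation of n-gram counts, and the choice to key each putative class by the suffix tuple `g[1:]` are all assumptions made for clarity.

```python
from collections import defaultdict

def classify_ngrams(ngram_counts, first_threshold, second_threshold):
    """Sketch of the claimed classification method.

    ngram_counts: dict mapping an n-gram tuple (w1, ..., wn) to its count.
    Returns (first_class, second_class): a set of n-grams and a set of
    (n-1)-gram suffixes (w2, ..., wn).
    """
    # Step (a): n-grams whose count exceeds the first threshold
    # go into the first class.
    first_class = {g for g, c in ngram_counts.items() if c > first_threshold}

    # Step (b): associate each remaining n-gram with the putative
    # (n-1)-gram class defined by its last n-1 words.
    # Step (c): establish the complement count of each class by
    # summing the counts of its member n-grams.
    complement = defaultdict(int)
    for g, c in ngram_counts.items():
        if g not in first_class:
            complement[g[1:]] += c

    # Step (d): putative classes whose complement count exceeds
    # the second threshold go into the second class.
    second_class = {s for s, c in complement.items() if c > second_threshold}
    return first_class, second_class
```

With thresholds 3 and 2, a trigram seen 5 times lands in the first class in step (a), while low-count trigrams sharing the suffix ("big", "dog") are pooled in steps (b)–(c) and promoted in step (d) only if their summed count clears the second threshold.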
