System and method for providing lossless compression of n-gram language models in a real-time decoder

  • US 6,092,038 A
  • Filed: 02/05/1998
  • Issued: 07/18/2000
  • Est. Priority Date: 02/05/1998
  • Status: Expired due to Fees
First Claim

1. A method for losslessly compressing an n-gram language model for storage in a storage device, the n-gram language model comprising a plurality of n-gram records generated from a training vocabulary, each n-gram record comprising an n-gram in the form of a series of "n-tuple" words (w1, w2, . . . wn), a count and a probability associated therewith, each n-gram having a history represented by the initial n-1 words of the n-gram, said method comprising the steps of:

  • splitting said plurality of n-gram records into (i) a set of common history records comprising subsets of n-tuple words having a common history and (ii) sets of hypothesis records that are associated with the common history records, each set of hypothesis records including at least one hypothesis record comprising a word record-probability record pair;

  • partitioning said common history records into at least a first group and a second group, said first group comprising each common history record having a single hypothesis record associated therewith, said second group comprising each common history record having more than one hypothesis record associated therewith;

  • storing said hypothesis records associated with said second group of common history records in said storage device; and

  • storing, in an index portion of said storage device, (i) each common history record of said second group together with an address that points to a location in said storage device having corresponding hypothesis records and (ii) each common history record of said first group together with its corresponding single hypothesis record.
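
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `build_index` and `lookup`, the in-memory list standing in for the storage device, the dictionary standing in for the index portion, and the stored record length alongside each address are all assumptions made for illustration. The count field of each n-gram record is ignored here, since the claim describes a hypothesis record as a word-probability pair.

```python
from collections import defaultdict


def build_index(records):
    """Split n-gram records by history and build an index (illustrative only).

    records: iterable of (ngram, count, probability), where ngram is a tuple
    of words (w1, ..., wn) and its history is the first n-1 words.
    """
    # Splitting step: group (word, probability) hypothesis records
    # under their common history.
    by_history = defaultdict(list)
    for ngram, _count, prob in records:
        history, word = ngram[:-1], ngram[-1]
        by_history[history].append((word, prob))

    storage = []  # hypothetical stand-in for the storage device
    index = {}    # hypothetical stand-in for the index portion

    # Partitioning and storing steps.
    for history, hyps in by_history.items():
        if len(hyps) == 1:
            # First group: keep the single hypothesis record in the index.
            index[history] = ("inline", hyps[0])
        else:
            # Second group: store the hypotheses and keep only an address.
            # Storing the length next to the address is an assumption.
            index[history] = ("addr", len(storage), len(hyps))
            storage.extend(hyps)
    return index, storage


def lookup(index, storage, history):
    """Return the hypothesis records for a history, or an empty list."""
    entry = index.get(history)
    if entry is None:
        return []
    if entry[0] == "inline":
        return [entry[1]]
    _, offset, length = entry
    return storage[offset:offset + length]
```

In this sketch a history with a single hypothesis costs no separate storage entry and no address, which appears to be the saving the partitioning step is aimed at. Calling `lookup(index, storage, ("a", "b"))` after `build_index` returns every word-probability pair that follows the history `("a", "b")`.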
