System and method for joint optimization of language model performance and size

  • US 7,275,029 B1
  • Filed: 06/30/2000
  • Issued: 09/25/2007
  • Est. Priority Date: 11/05/1999
  • Status: Expired due to Fees
First Claim

1. A method of using a tuning set of information to jointly optimize the performance and size of a language model, comprising:

    providing a textual corpus comprising subsets wherein each subset comprises a plurality of items;

    creating a Dynamic Order Markov Model data structure by assigning each item of the plurality of items to a node in the data structure, wherein the nodes are logically coupled to denote dependencies of the items, and calculating a frequency of occurrence for each item of the plurality of items;

    segmenting at least a subset of a received textual corpus into segments by clustering every N-items of the received corpus into a training unit, wherein resultant training units are separated by gaps, and wherein N is an empirically derived value based, at least in part, on the size of the received corpus;

    creating the tuning set from application-specific information;

    (a) training a seed model via the tuning set;

    (b) calculating a similarity within a sequence of the training units on either side of each of the gaps;

    (c) selecting segment boundaries that maximize intra-segment similarity and inter-segment disparity;

    (d) calculating a perplexity value for each segment based on a comparison with the seed model;

    (e) selecting some of the segments based on their respective perplexity values to augment the tuning set;

    iteratively refining the tuning set and the seed model by repeating steps (a) through (e) with respect to a threshold;

    refining the language model based on the seed model;

    generating the language model that is representative of the textual corpus for use by a host of applications; and

    providing recognition of the textual corpus based on the language model.
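
The steps recited in claim 1 can be read as a data-selection loop. The following is a minimal Python sketch of that loop under stated assumptions, not the patented implementation. It assumes a unigram model with add-one smoothing in place of the Dynamic Order Markov Model, cosine similarity over item counts as the cross-gap similarity, a fixed similarity cutoff for choosing segment boundaries, and a perplexity ceiling as the threshold. Segmentation is done once rather than on every pass. All names and parameters (`make_units`, `segment`, `cutoff`, `max_perplexity`, and so on) are hypothetical.

```python
import math
from collections import Counter

def make_units(tokens, n):
    """Cluster every n items into a training unit; gaps lie between units."""
    return [tokens[i:i + n] for i in range(0, len(tokens), n)]

def cosine(a, b):
    """Cosine similarity between two bags of items (assumed similarity measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def segment(units, window=1, cutoff=0.5):
    """Steps (b)-(c): put a boundary at each gap whose cross-gap similarity is low."""
    if not units:
        return []
    segments, current = [], list(units[0])
    for g in range(1, len(units)):
        left = [w for u in units[max(0, g - window):g] for w in u]
        right = [w for u in units[g:g + window] for w in u]
        if cosine(left, right) < cutoff:
            segments.append(current)
            current = []
        current.extend(units[g])
    segments.append(current)
    return segments

def train_unigram(tokens, vocab):
    """Step (a): seed model; add-one smoothed unigram stands in for the real model."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity(model, tokens):
    """Step (d): perplexity of a segment under the seed model."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))

def refine(tuning, segments, vocab, max_perplexity, max_iters=10):
    """Repeat train/score/select until no remaining segment passes the threshold."""
    tuning = list(tuning)
    remaining = list(range(len(segments)))
    for _ in range(max_iters):
        model = train_unigram(tuning, vocab)
        picked = [i for i in remaining
                  if perplexity(model, segments[i]) < max_perplexity]
        if not picked:
            break
        for i in picked:  # step (e): augment the tuning set
            tuning.extend(segments[i])
        remaining = [i for i in remaining if i not in picked]
    return train_unigram(tuning, vocab), tuning

# Toy run: an a/b region followed by an x/y region, with N = 2.
corpus = "a b a b a b x y x y x y".split()
vocab = set(corpus)
segments = segment(make_units(corpus, 2))
model, tuning = refine(["a", "b"], segments, vocab, max_perplexity=4.0)
```

In this toy run the a/b segment is absorbed into the tuning set, while the x/y segment scores above the perplexity ceiling and is left out. A real system would choose N, the cutoff, and the threshold empirically.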
