Word classing for language modeling

  • US 9,367,526 B1
  • Filed: 07/26/2011
  • Issued: 06/14/2016
  • Est. Priority Date: 07/26/2011
  • Status: Active Grant
First Claim

1. In a language model employing a classing function defining classes of words, each of the classes grouping words sharing a similar likelihood of appearing in a production application context, a method of optimizing the classing function comprising:

  • identifying a language context corresponding to a production application, the language context based on usage encountered by a language model invoked by the production application;

    defining a training corpus having a set of clusters indicative of expected usage, the clusters being n-grams having a sequence of n words for defining a probability that the first n−1 words in the sequence is followed by word n in the sequence; and

    building a language model from a classing function applied to the training corpus, the classing function optimized to correspond to usage in the identified language context using class-based and word-based features by computing a likelihood of a word in an n-gram and a frequency of a word within a class of the n-gram, optimizing the classing function further comprising;

    employing a word based classing approach;

    backing off, if the word based approach indicates a null probability; and

    employing a class based approach;

    further comprising;

    determining seen and unseen clusters, the unseen clusters having a previously unoccurring sequence of words;

    employing the word based classification if the cluster has a previous occurrence, identifying a discount parameter, the discount parameter reducing a count of word occurrences of a particular cluster in favor of a class count of words of the cluster;

    backing off using the discount parameter and employing a class based approach if the cluster is unseen, unseen clusters based on occurrence of any of the words in the cluster, the unseen cluster having a nonzero probability if any word in the class of words has occurred;

    the discount parameter reducing a count of word occurrences of a particular cluster in favor of a class count of words of the cluster; and

    the discount parameter defining an absolute discounting model, further comprising;

    identifying a discount parameter indicative of a reduction of a word count of words in a cluster;

    determining if the cluster is to be pruned or retained in the corpus;

    subtracting the discount parameter from a maximum count of the observed word based count of the cluster to compute a discounted count;

    or defining the discount count of the cluster as zero if the cluster is pruned.
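
The steps recited in the claim can be illustrated with a small bigram model. This is a minimal sketch under stated assumptions, not the patented implementation: the names `discounted_count` and `ClassBackoffBigram`, the toy corpus, the hand-written word-to-class map, and the discount value of 0.5 are all hypothetical. The claim does not spell out how back-off probabilities are normalized, so the sketch assumes a conventional scheme in which the mass freed by discounting is spread over unseen words in proportion to their class-based probability.

```python
from collections import Counter, defaultdict


def discounted_count(count, discount, pruned=False):
    """Absolute discounting: subtract a fixed discount, floored at zero.
    A pruned n-gram is given a discounted count of zero."""
    if pruned:
        return 0.0
    return max(count - discount, 0.0)


class ClassBackoffBigram:
    """Hypothetical bigram model: word-based when seen, class-based when unseen."""

    def __init__(self, sentences, word_to_class, discount=0.5, pruned=frozenset()):
        self.cls = word_to_class          # assumed to cover every word used
        self.discount = discount
        self.pruned = pruned              # set of (prev, word) pairs to drop
        self.followers = defaultdict(Counter)        # prev -> word counts
        self.class_followers = defaultdict(Counter)  # class(prev) -> class counts
        self.word_counts = Counter()
        self.class_counts = Counter()
        for sent in sentences:
            for w in sent:
                self.word_counts[w] += 1
                self.class_counts[self.cls[w]] += 1
            for prev, w in zip(sent, sent[1:]):
                self.followers[prev][w] += 1
                self.class_followers[self.cls[prev]][self.cls[w]] += 1

    def class_prob(self, prev, word):
        """P(class(word) | class(prev)) * P(word | class(word))."""
        row = self.class_followers.get(self.cls[prev])
        cw = self.cls[word]
        if not row or self.class_counts[cw] == 0:
            return 0.0
        class_transition = row[cw] / sum(row.values())
        word_in_class = self.word_counts[word] / self.class_counts[cw]
        return class_transition * word_in_class

    def prob(self, prev, word):
        row = self.followers.get(prev)
        if not row:
            # History never observed: only the class-based estimate is available.
            return self.class_prob(prev, word)
        total = sum(row.values())
        kept = {}
        for w, c in row.items():
            d = discounted_count(c, self.discount, (prev, w) in self.pruned)
            if d > 0:
                kept[w] = d
        if word in kept:
            # Seen bigram: word-based estimate from the discounted count.
            return kept[word] / total
        # Unseen (or pruned) bigram: back off to the class-based estimate,
        # scaled so the freed mass is shared among the unseen words only.
        freed = 1.0 - sum(kept.values()) / total
        unseen_mass = 1.0 - sum(self.class_prob(prev, w) for w in kept)
        if unseen_mass <= 1e-12:
            return 0.0
        return freed * self.class_prob(prev, word) / unseen_mass


# Toy data, invented for illustration only.
sentences = [
    ["fly", "to", "boston"],
    ["fly", "to", "denver"],
    ["drive", "to", "boston"],
    ["visit", "austin"],
]
word_to_class = {
    "fly": "VERB", "drive": "VERB", "visit": "VERB",
    "to": "PREP",
    "boston": "CITY", "denver": "CITY", "austin": "CITY",
}
model = ClassBackoffBigram(sentences, word_to_class, discount=0.5)
print(model.prob("to", "boston"))  # seen bigram, word-based estimate
print(model.prob("to", "austin"))  # unseen bigram, class-based back-off
```

In this toy corpus "to austin" never occurs, yet it receives a nonzero probability because other words in the same class have followed "to", which is the behavior the claim describes for unseen clusters.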
