Clustering classes in language modeling

  • US 9,529,898 B2
  • Filed: 03/12/2015
  • Issued: 12/27/2016
  • Est. Priority Date: 08/26/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

    obtaining, by a computing system, a plurality of text samples that each includes a respective class term that belongs to a same pre-defined class of topically related terms;

    for each respective text sample among the plurality of text samples:

    (i) identifying that the respective class term of the respective text sample belongs to a particular sub-class, among a plurality of sub-classes, of the pre-defined class of topically related terms; and

    (ii) assigning the respective text sample to a particular group of text samples that corresponds to the particular sub-class to which the respective class term of the respective text sample belongs, such that the plurality of text samples are assigned among a plurality of groups of text samples and each respective group of text samples among the plurality of groups of text samples corresponds to a different one of a plurality of sub-classes of the pre-defined class of topically related terms;

    generating, by the computing system and for each respective sub-class among the plurality of sub-classes, a respective sub-class context model that represents probabilities of language sequences determined based on the text samples assigned to the corresponding group of text samples for the respective sub-class;

    merging, by the computing system, particular ones of the sub-class context models that are determined to be similar to generate a hierarchical set of context models;

    selecting, by the computing system, particular ones of the context models from among the hierarchical set of context models;

    generating, by the computing system, a class-based language model that includes, for each of the selected context models, a respective class that corresponds to the respective context model; and

    providing the class-based language model in a speech recognition system and transcribing speech characterized in an audio signal to text using the class-based language model in the speech recognition system.
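
The grouping, modeling, merging, and selecting steps of the claim can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not something the claim specifies: the sub-class context models are plain unigram counts of the words around the class term, similarity is cosine similarity, merging is greedy pairwise agglomeration, and selection uses a hypothetical `min_similarity` threshold. The sample sentences, the `$CLASS` placeholder, and all function names are likewise hypothetical. The final step (using the model in a speech recognition system) is not sketched.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math

PLACEHOLDER = "$CLASS"  # hypothetical stand-in for the class term

def group_by_subclass(samples, subclass_of):
    """Steps (i)-(ii): assign each (text, class_term) sample to its sub-class group."""
    groups = defaultdict(list)
    for text, term in samples:
        groups[subclass_of[term]].append(text.replace(term, PLACEHOLDER))
    return dict(groups)

def context_model(texts):
    """Assumed model form: unigram counts of the words surrounding the class term."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.split() if w != PLACEHOLDER)
    return counts

def cosine(a, b):
    """Assumed similarity measure between two context models."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_hierarchy(models):
    """Greedily merge the most similar pair until one model remains.

    Returns a list of (merge_similarity, clusters) levels, leaves first."""
    level = {frozenset([name]): model for name, model in models.items()}
    levels = [(None, level)]
    while len(level) > 1:
        a, b = max(combinations(level, 2), key=lambda p: cosine(level[p[0]], level[p[1]]))
        sim = cosine(level[a], level[b])
        merged = {k: v for k, v in level.items() if k not in (a, b)}
        merged[a | b] = level[a] + level[b]  # Counter addition pools the counts
        level = merged
        levels.append((sim, level))
    return levels

def select_models(levels, min_similarity):
    """Keep the deepest level whose merges were all at least min_similarity."""
    chosen = levels[0][1]
    for sim, clusters in levels[1:]:
        if sim < min_similarity:
            break
        chosen = clusters
    return chosen

# Hypothetical data: a pre-defined class of day names, one sub-class per term.
samples = [
    ("set an alarm for monday", "monday"),
    ("meeting on monday morning", "monday"),
    ("set an alarm for tuesday", "tuesday"),
    ("meeting on tuesday morning", "tuesday"),
    ("party on saturday night", "saturday"),
    ("brunch this saturday", "saturday"),
]
subclass_of = {"monday": "monday", "tuesday": "tuesday", "saturday": "saturday"}

groups = group_by_subclass(samples, subclass_of)
models = {name: context_model(texts) for name, texts in groups.items()}
levels = build_hierarchy(models)
selected = select_models(levels, min_similarity=0.5)

# One class per selected context model, as in the class-based language model step.
classes = {f"$CLASS_{i}": sorted(members) for i, members in enumerate(selected)}
print(classes)
```

In this toy data the "monday" and "tuesday" sub-classes share identical contexts and are merged into one class, while "saturday" appears in different contexts and keeps its own class. A real system would presumably use richer n-gram models and a more principled selection criterion than a fixed threshold.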
