Method and apparatus for constructing and using syllable-like unit language models

  • US 7,676,365 B2
  • Filed: 04/20/2005
  • Issued: 03/09/2010
  • Est. Priority Date: 12/26/2000
  • Status: Active Grant
  • ×
    • Pin Icon | RPX Insight
    • Pin
First Claim

1. A speech recognition system having a language model generated through a process comprising:

  • a processor breaking each word in a dictionary into units wherein breaking each word into units comprises:

    breaking each word in the dictionary into initial units by dividing each word into the largest units possible that each include at most one vowel sound;

    for each initial unit, setting a frequency for the initial unit by summing the unigram probabilities of the words in which the initial unit was identified;

    breaking at least one of the initial units into smaller units by preferring smaller units that occur more frequently in the dictionary over smaller units that occur less frequently and by preferring smaller units that group together sequences of phonetic units that appear in a word and where each sequence of phonetic units comprises phonetic units that the speech recognition system typically fails to recognize individually;

    for each word, grouping the smaller units of the word into n-grams;

    counting the total number of n-gram occurrences in the dictionary; and

    for each n-gram, counting the number of occurrences of the n-gram in the dictionary and dividing this count by the total number of n-gram occurrences to form a language model probability for the n-gram.
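The first two steps of claim 1 (initial segmentation and frequency assignment) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Words are given as phone sequences in an ARPAbet-style phone set. The vowel inventory `VOWELS` and the lexicon format `{word: (phones, unigram_probability)}` are hypothetical. The claim does not say how consonants between two vowels are divided, so the sketch simply lets each unit absorb phones left to right until the next phone would be a second vowel.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, ...]
# Hypothetical lexicon format: word -> (phone sequence, unigram probability).
Lexicon = Dict[str, Tuple[List[str], float]]

# Assumed ARPAbet-style vowel set; the claim does not fix a phone inventory.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def initial_units(phones: List[str]) -> List[Unit]:
    """Split one word into the longest left-to-right runs holding at most one vowel."""
    units: List[Unit] = []
    current: List[str] = []
    has_vowel = False
    for phone in phones:
        is_vowel = phone in VOWELS
        if is_vowel and has_vowel:
            # A second vowel would break the one-vowel limit: close the unit.
            units.append(tuple(current))
            current, has_vowel = [], False
        current.append(phone)
        has_vowel = has_vowel or is_vowel
    if current:
        units.append(tuple(current))
    return units


def unit_frequencies(lexicon: Lexicon) -> Dict[Unit, float]:
    """Frequency of an initial unit = sum of unigram probabilities of words containing it."""
    freq: Dict[Unit, float] = defaultdict(float)
    for phones, prob in lexicon.values():
        # Each distinct unit counts once per word (an assumption; the claim is ambiguous).
        for unit in set(initial_units(phones)):
            freq[unit] += prob
    return dict(freq)


if __name__ == "__main__":
    lexicon = {"baker": (["b", "ey", "k", "er"], 0.6),
               "bake": (["b", "ey", "k"], 0.4)}
    print(initial_units(lexicon["baker"][0]))
    print(unit_frequencies(lexicon))
```

Suppose `baker` is `b ey k er` and `bake` is `b ey k`. The initial units of `baker` are then `(b, ey, k)` and `(er,)`, and `(b, ey, k)` receives the summed probability of both words. Counting a unit once per word, rather than once per occurrence, is a further assumption. The claim's wording ("the words in which the initial unit was identified") suggests this reading but does not settle it.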
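The third step of claim 1 leaves its scoring open. It names two preferences: frequent smaller units, and units that keep together phone sequences the recognizer tends to miss individually. It does not say how either preference is measured or how they combine. The sketch below is one hypothetical way to realize them, and every name in it (`substring_frequencies`, `cuts_through`, `refine`, `keep_together`, `min_freq`) is invented for illustration. Smaller-unit frequency is approximated by summing word unigram probabilities over every contiguous phone substring. A unit whose own frequency falls below `min_freq` is split at the cut whose rarer half is most frequent, and cuts that would separate a `keep_together` sequence are never considered.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, List, Tuple

Unit = Tuple[str, ...]
# Hypothetical lexicon format: word -> (phone sequence, unigram probability).
Lexicon = Dict[str, Tuple[List[str], float]]


def substring_frequencies(lexicon: Lexicon) -> Dict[Unit, float]:
    """Sum word unigram probabilities over every contiguous phone substring."""
    freq: Dict[Unit, float] = defaultdict(float)
    for phones, prob in lexicon.values():
        subs = {tuple(phones[i:j])
                for i in range(len(phones))
                for j in range(i + 1, len(phones) + 1)}
        for sub in subs:  # each distinct substring counted once per word
            freq[sub] += prob
    return dict(freq)


def cuts_through(unit: Unit, cut: int, keep_together: AbstractSet[Unit]) -> bool:
    """True if cutting before index `cut` would separate a keep-together sequence."""
    for seq in keep_together:
        n = len(seq)
        for start in range(len(unit) - n + 1):
            if unit[start:start + n] == seq and start < cut < start + n:
                return True
    return False


def refine(unit: Unit, freq: Dict[Unit, float],
           keep_together: AbstractSet[Unit], min_freq: float) -> List[Unit]:
    """Recursively split a rare unit into smaller, more frequent units."""
    if len(unit) == 1 or freq.get(unit, 0.0) >= min_freq:
        return [unit]  # frequent enough (or a single phone): keep whole
    cuts = [c for c in range(1, len(unit))
            if not cuts_through(unit, c, keep_together)]
    if not cuts:
        return [unit]  # every cut would break a keep-together sequence
    # Hypothetical score: a cut is only as good as its rarer half.
    best = max(cuts, key=lambda c: min(freq.get(unit[:c], 0.0),
                                       freq.get(unit[c:], 0.0)))
    return (refine(unit[:best], freq, keep_together, min_freq)
            + refine(unit[best:], freq, keep_together, min_freq))


if __name__ == "__main__":
    lexicon = {"strike": (["s", "t", "r", "ay", "k"], 0.1),
               "street": (["s", "t", "r", "iy", "t"], 0.3),
               "bike": (["b", "ay", "k"], 0.4)}
    freq = substring_frequencies(lexicon)
    print(refine(("s", "t", "r", "ay", "k"), freq, frozenset(), 0.3))
```

Consider a toy lexicon of `strike`, `street`, and `bike`. Here the rare unit `s t r ay k` splits into `(s, t, r)` and `(ay, k)`, because both halves recur in other words. The min-of-halves score is a simplifying choice made here, not something the claim prescribes. Likewise, `keep_together` could be derived from recognizer error statistics, but the claim does not say how such sequences are identified.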
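The last three steps of claim 1 amount to relative-frequency estimation over within-word n-grams. The sketch assumes each word has already been segmented into its smaller units; the input format `{word: [unit, ...]}` is hypothetical. It forms overlapping n-grams without word-boundary padding. It also counts each dictionary entry once, since the claim does not mention weighting these counts by unigram probability. As the claim is worded, the result is each n-gram's share of all n-gram occurrences, not a conditional probability of the last unit given its history.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[str, ...]
NGram = Tuple[Unit, ...]


def word_ngrams(units: Sequence[Unit], n: int) -> List[NGram]:
    """Overlapping n-grams of one word's units; n-grams never cross word boundaries."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def ngram_probabilities(segmented: Dict[str, List[Unit]], n: int) -> Dict[NGram, float]:
    """Each n-gram's count divided by the total number of n-gram occurrences."""
    counts: Counter = Counter()
    for units in segmented.values():
        counts.update(word_ngrams(units, n))  # unweighted: one count per occurrence
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    segmented = {"baker": [("b", "ey"), ("k", "er")],
                 "bakers": [("b", "ey"), ("k", "er"), ("z",)],
                 "bake": [("b", "ey"), ("k",)]}
    for gram, p in ngram_probabilities(segmented, 2).items():
        print(gram, p)
```

For instance, segmenting `baker`, `bakers`, and `bake` as above yields four bigram occurrences. Of these, `((b, ey), (k, er))` accounts for two, giving it probability 0.5. In this sketch, words with fewer than n units contribute no n-grams; adding boundary markers would be one way to keep them.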
