
Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events

  • US 4,831,550 A
  • Filed: 03/27/1986
  • Issued: 05/16/1989
  • Est. Priority Date: 03/27/1986
  • Status: Expired due to Fees
First Claim

1. In a speech recognition system, a computer-implemented method of evaluating the likelihood of a word from a vocabulary of words occurring next after a string of known words, based on counts of word sequences occurring in a sample text which is sparse relative to possible word sequences, the method comprising the steps of:

  • (a) characterizing word sequences as m-grams, each m-gram occurring in the sample text representing a key of words followed by a word;

    (b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;

    (c) generating a freed probability mass value β_L for each key occurring in the sample text, the β_L for a key of length L being allocated to those m-grams which (i) include the subject key and (ii) have no respective discounted probabilities stored therefor;

    (d) generating γ_L factors, each γ_L factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known words;

    (e) storing for each key of length L, a value α_L = β_L γ_L; and

    (f) evaluating a likelihood of a selected word following a string of known words including the steps of:

    (i) searching successively shorter keys of the known words until a key is found which, when followed by the at least one selected word, represents an m-gram having a discounted probability P stored therefor, and retrieving P;

    (ii) retrieving the stored α_L value for each longer key searched before the stored m-gram is found; and

    (iii) computing a likelihood value of the selected word following the string of known words based on the retrieved α_L values and the retrieved P value.
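
The steps of the claim can be illustrated with a minimal Python sketch for the smallest non-trivial case: keys of length 1 (bigrams) backing off to single words. Several details are assumptions of this sketch and are not taken from the claim: the function names `build_tables` and `likelihood` are hypothetical, the discount is a fixed subtraction of 0.5 from each count rather than any particular discounting scheme, α_L is taken to be the product of β_L and γ_L, the shorter-key probabilities are plain relative frequencies, and a key never seen in the sample text is given a weight of 1.

```python
from collections import Counter, defaultdict


def build_tables(tokens, discount=0.5):
    """Steps (b)-(e) for keys of length 1. The fixed discount is an assumption."""
    unigrams = Counter(tokens)
    total = sum(unigrams.values())

    # followers[key][word] = how often `word` follows `key` in the sample text.
    followers = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        followers[prev][word] += 1

    # Shortest m-grams (empty key): plain relative frequencies (assumption).
    P = {(w,): c / total for w, c in unigrams.items()}
    alpha = {}

    for prev, counts in followers.items():
        key_total = sum(counts.values())
        # Step (b): discounted probability for each m-gram seen with this key.
        for word, c in counts.items():
            P[(prev, word)] = (c - discount) / key_total
        # Step (c): probability mass freed by discounting.
        beta = 1.0 - sum(P[(prev, w)] for w in counts)
        # Step (d): normalize over words NOT already covered by a stored m-gram.
        unseen_mass = 1.0 - sum(P[(w,)] for w in counts)
        gamma = 1.0 / unseen_mass if unseen_mass > 1e-12 else 0.0
        # Step (e): one stored weight per key (product form is an assumption).
        alpha[(prev,)] = beta * gamma
    return P, alpha


def likelihood(word, history, P, alpha):
    """Step (f): back off through successively shorter keys of the known words."""
    weight = 1.0
    key = tuple(history)
    while True:
        if key + (word,) in P:
            # (i) a stored m-gram was found; (iii) scale P by the collected weights.
            return weight * P[key + (word,)]
        if not key:
            return 0.0  # word never occurred in the sample text
        # (ii) collect the weight of each longer key searched along the way.
        weight *= alpha.get(key, 1.0)  # unseen key: weight 1 (assumption)
        key = key[1:]


tokens = "the cat sat on the mat the cat ran".split()
P, alpha = build_tables(tokens)
print(likelihood("cat", ["the"], P, alpha))  # stored bigram: discounted P is used
print(likelihood("sat", ["the"], P, alpha))  # unseen bigram: backs off to "sat"
```

In this toy corpus "the cat" has a stored discounted probability, so it is returned directly, while "the sat" was never observed and its likelihood is the weight stored for the key "the" multiplied by the probability of "sat" alone. Because γ renormalizes only over the words not already covered by a stored m-gram, the likelihoods for each key sum to 1 over the vocabulary.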
