Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events
Abstract
Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences--or m-grams--include a key and a subsequent event. For each m-gram is stored a discounted probability generated by applying modified Turing's estimate, for example, to a count-based probability. For a key occurring in the sample data there is stored a normalization constant which preferably (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter keys (of known events) followed by the selected event (representing m-grams) are searched until an m-gram is found having a discounted probability stored therefor. The normalization constants of the longer searched keys--for which the corresponding m-grams have no stored discounted probability--are combined together with the found discounted probability to produce the likelihood of the selected event being next.
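The discounting step in the abstract can be sketched in code. The sketch below uses a simple absolute discount on bigram counts as a stand-in for the modified Turing's estimate; the toy corpus, the discount value, and the function name are assumptions for illustration only, not the patented procedure.

```python
from collections import Counter

def discounted_probabilities(bigram_counts, discount=0.5):
    """Replace count-based probabilities with discounted ones,
    freeing probability mass for m-grams absent from the sample.
    The flat `discount` here stands in for the modified Turing's
    estimate named in the abstract (an assumption for illustration)."""
    probs = {}             # (key, word) -> discounted probability
    freed = {}             # key -> freed probability mass (the beta of the claims)
    key_totals = Counter()
    for (key, word), c in bigram_counts.items():
        key_totals[key] += c
    for (key, word), c in bigram_counts.items():
        probs[(key, word)] = (c - discount) / key_totals[key]
    for key in key_totals:
        seen = sum(p for (k, w), p in probs.items() if k == key)
        freed[key] = 1.0 - seen   # mass left over for unseen continuations
    return probs, freed

counts = Counter({("the", "cat"): 3, ("the", "dog"): 1})
probs, freed = discounted_probabilities(counts)
# discounted P(cat | the) = (3 - 0.5) / 4 = 0.625
# discounted P(dog | the) = (1 - 0.5) / 4 = 0.125
# freed mass for key "the"  = 1 - (0.625 + 0.125) = 0.25
```

The freed mass per key is exactly what the claims later allocate, via the normalization constants, to m-grams with no stored discounted probability.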
14 Claims
1. In a speech recognition system, a computer-implemented method of evaluating the likelihood of a word from a vocabulary of words occurring next after a string of known words, based on counts of word sequences occurring in a sample text which is sparse relative to possible word sequences, the method comprising the steps of:
(a) characterizing word sequences as m-grams, each m-gram occurring in the sample text representing a key of words followed by a word;
(b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;
(c) generating a freed probability mass value β_L for each key occurring in the sample text, the β_L for a key of length L being allocated to those m-grams which (i) include the subject key and (ii) have no respective discounted probabilities stored therefor;
(d) generating γ_L factors, each γ_L factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known words;
(e) storing for each key of length L, a value α_L = β_L γ_L; and
(f) evaluating a likelihood of a selected word following a string of known words including the steps of:
(i) searching successively shorter keys of the known words until a key is found which, when followed by the at least one selected word, represents an m-gram having a discounted probability P stored therefor, and retrieving P;
(ii) retrieving the stored α_L value for each longer key searched before the stored m-gram is found; and
(iii) computing a likelihood value of the selected word following the string of known words based on the retrieved α_L values and the retrieved P value.
- View Dependent Claims (2)
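The evaluation of step (f) can be sketched as a short back-off loop. The table layout (tuples as keys) and the example P and α values are assumptions for illustration; the logic follows steps (f)(i)-(iii).

```python
def backoff_likelihood(history, word, P, alpha):
    """Back off through successively shorter keys until an m-gram
    with a stored discounted probability is found, multiplying in
    the stored alpha of every longer key searched on the way.
    P maps (key, word) -> discounted probability; alpha maps
    key -> alpha_L = beta_L * gamma_L (both built as in steps (a)-(e))."""
    likelihood = 1.0
    key = tuple(history)
    while key and (key, word) not in P:
        likelihood *= alpha.get(key, 1.0)  # (f)(ii): alpha of each longer key searched
        key = key[1:]                      # (f)(i): try the next shorter key
    if (key, word) in P:
        likelihood *= P[(key, word)]       # (f)(iii): combine with the retrieved P
    else:
        likelihood = 0.0                   # no stored m-gram at any key length
    return likelihood

# Hypothetical stored tables: only the bigram ("b", "c") has a stored P,
# so evaluating trigram context ("a", "b") backs off once through alpha.
P = {(("b",), "c"): 0.2}
alpha = {("a", "b"): 0.5}
like = backoff_likelihood(["a", "b"], "c", P, alpha)
# 0.5 (alpha of the longer key) * 0.2 (retrieved P) = 0.1
```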
3. In a speech recognition system, apparatus for estimating probability distributions of word sequences of length j from sparse sample data in which only some of the possible word sequences of length j occur in the sample text, the apparatus comprising:
(a) means for storing discounted probabilities, where each discounted probability replaces a count-based probability for an m-gram of less than or equal to j words in a sequence;
(b) means for storing a normalization constant α for each of at least some keys occurring in the sample text, wherein a key represents a sequence of less than j words and wherein the normalization constant for a key is based on (i) a freed probability mass which represents the difference from one of the sum of discounted probabilities of m-grams formed of the key and a subsequent word and (ii) a factor which avoids multiple counting among m-grams; and
(c) processing means, having a sequence of known words and a selected word as input, for evaluating the likelihood of the word sequence including the selected word following (j-1) known words, said processing means including:
means for retrieving a discounted probability for the longest m-gram that has a discounted probability stored therefor and that includes a key of (j-k) known words (where k is a positive integer less than j) followed by the selected word;
means for retrieving the normalization constant for each key of length greater than (j-k) which includes the known words; and
means for multiplying the retrieved normalization constants and the retrieved discounted probability; the product of the multiplying means indicating the likelihood of the word sequence including the selected word followed by the (j-1) known words.
- View Dependent Claims (4)
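The normalization constant of element (b) combines the freed mass with the multiple-counting factor. A minimal sketch, assuming the same (key, word) -> probability table layout as above and toy values chosen only to exercise the arithmetic:

```python
def normalization_constant(key, P, shorter_key):
    """Sketch of claim element (b): the freed mass (beta) of `key`,
    times a factor (gamma) that renormalizes the shorter-key
    distribution over only the words NOT already seen after `key`,
    avoiding multiple counting. Names and layout are illustrative."""
    seen_after_key = {w for (k, w) in P if k == key}
    # (i) freed mass: the difference from one of the sum of
    # discounted probabilities of m-grams formed of the key
    beta = 1.0 - sum(P[(key, w)] for w in seen_after_key)
    # (ii) mass the shorter-key distribution gives to already-seen
    # words; excluding it avoids counting those m-grams twice
    overlap = sum(p for (k, w), p in P.items()
                  if k == shorter_key and w in seen_after_key)
    gamma = 1.0 / (1.0 - overlap)
    return beta * gamma            # the stored constant alpha

P = {(("a", "b"), "c"): 0.4,     # seen trigram continuation
     (("b",), "c"): 0.5,         # shorter-key distribution
     (("b",), "d"): 0.3}
alpha_ab = normalization_constant(("a", "b"), P, ("b",))
# beta = 1 - 0.4 = 0.6; overlap = 0.5; gamma = 1 / 0.5 = 2; alpha = 1.2
```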
5. In a symbol recognition system, a computer-implemented method of evaluating the likelihood of a symbol from a vocabulary of symbols occurring next after a string of known symbols, based on counts of symbol sequences occurring in a sample text which is sparse relative to possible symbol sequences, the method comprising the steps of:
(a) characterizing symbol sequences as m-grams, each m-gram occurring in the sample text representing a key of symbols followed by a symbol;
(b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;
(c) generating a freed probability mass value β_L for each key occurring in the sample text, the β_L for a key of length L being allocated to those m-grams which (i) include the key and (ii) have no respective discounted probabilities stored therefor;
(d) generating γ_L factors, each γ_L factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known symbols;
(e) storing for each key of length L, a value α_L = β_L γ_L; and
(f) evaluating a likelihood of a selected symbol following a string of known symbols including the steps of:
(i) searching successively shorter keys of the known symbols until a key is found which, when followed by the at least one selected symbol, represents an m-gram having a discounted probability P stored therefor, and retrieving P;
(ii) retrieving the stored α_L value for each longer key searched before the stored m-gram is found; and
(iii) computing a likelihood value of the selected symbol following the string of known symbols based on the retrieved α_L values and the retrieved P value.
- View Dependent Claims (6, 7, 8, 9)
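In standard back-off notation, the stored quantities of steps (c)-(e) satisfy the relations below. This is a reconstruction consistent with the claim language, not text quoted from the patent; P̂ denotes a stored discounted probability, c(k, s) the sample-text count of key k followed by symbol s, and k' the key k shortened by one symbol.

```latex
% freed probability mass of a key k of length L: whatever the
% discounted probabilities \hat{P} leave unassigned
\beta_L(k) = 1 - \sum_{s \,:\, c(k,s) > 0} \hat{P}(s \mid k)

% \gamma_L rescales the shorter-key distribution over only the
% symbols s NOT seen after k, avoiding multiple counting
\gamma_L(k) = \Bigl( \sum_{s \,:\, c(k,s) = 0} P(s \mid k') \Bigr)^{-1}

% the stored constant, and the backed-off estimate it yields
\alpha_L(k) = \beta_L(k)\,\gamma_L(k), \qquad
P(s \mid k) = \alpha_L(k)\, P(s \mid k') \quad \text{when } c(k,s) = 0
```

With these definitions the distribution P(· | k) sums to one over the vocabulary, which is exactly the normalization the γ_L factors are "valued" to achieve.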
10. In a speech recognition system, apparatus for estimating probability distributions of symbol sequences of length j from sparse sample data in which only some of the possible symbol sequences of length j occur in the sample text, the apparatus comprising:
(a) means for storing discounted probabilities, wherein each discounted probability replaces a count-based probability for an m-gram of less than or equal to j symbols in a sequence;
(b) means for storing a normalization constant α for each of at least some keys occurring in the sample text, wherein a key represents a sequence of less than j symbols and wherein the normalization constant of a key is based on (i) a freed probability mass which represents the difference from one of the sum of discounted probabilities of m-grams formed of the key and a subsequent symbol and (ii) a factor which avoids multiple counting among m-grams; and
(c) processing means, having a sequence of known symbols and a selected symbol as input, for evaluating the likelihood of the symbol sequence including the selected symbol following (j-1) known symbols, said processing means including:
means for retrieving a discounted probability for the longest m-gram that has a discounted probability stored therefor and that includes a key of (j-k) known symbols (where k is a positive integer less than j) followed by the selected symbol;
means for retrieving the normalization constant for each key of length greater than (j-k) which includes the known symbols; and
means for multiplying the retrieved normalization constants and the retrieved discounted probability; the product of the multiplying means indicating the likelihood of the symbol sequence including the selected symbol followed by the (j-1) known symbols.
- View Dependent Claims (11, 12, 13, 14)
Specification