Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events
Abstract
Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences--or m-grams--include a key and a subsequent event. For each m-gram is stored a discounted probability generated by applying modified Turing's estimate, for example, to a count-based probability. For a key occurring in the sample data there is stored a normalization constant which preferably (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter keys (of known events) followed by the selected event (representing m-grams) are searched until an m-gram is found having a discounted probability stored therefor. The normalization constants of the longer searched keys--for which the corresponding m-grams have no stored discounted probability--are combined together with the found discounted probability to produce the likelihood of the selected event being next.
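The discounting step in the abstract can be sketched in code. The sketch below uses a simple absolute discount on bigram counts as a stand-in for the modified Turing's estimate; the toy corpus, the discount value, and the function name are assumptions for illustration only, not the patented procedure.

```python
from collections import Counter

def discounted_probabilities(bigram_counts, discount=0.5):
    """Replace count-based probabilities with discounted ones,
    freeing probability mass for m-grams absent from the sample.
    The flat `discount` here stands in for the modified Turing's
    estimate named in the abstract (an assumption for illustration)."""
    probs = {}             # (key, word) -> discounted probability
    freed = {}             # key -> freed probability mass (the beta of the claims)
    key_totals = Counter()
    for (key, word), c in bigram_counts.items():
        key_totals[key] += c
    for (key, word), c in bigram_counts.items():
        probs[(key, word)] = (c - discount) / key_totals[key]
    for key in key_totals:
        seen = sum(p for (k, w), p in probs.items() if k == key)
        freed[key] = 1.0 - seen   # mass left over for unseen continuations
    return probs, freed

counts = Counter({("the", "cat"): 3, ("the", "dog"): 1})
probs, freed = discounted_probabilities(counts)
# discounted P(cat | the) = (3 - 0.5) / 4 = 0.625
# discounted P(dog | the) = (1 - 0.5) / 4 = 0.125
# freed mass for key "the"  = 1 - (0.625 + 0.125) = 0.25
```

The freed mass per key is exactly what the claims later allocate, via the normalization constants, to m-grams with no stored discounted probability.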
14 Claims
1. In a speech recognition system, a computer-implemented method of evaluating the likelihood of a word from a vocabulary of words occurring next after a string of known words, based on counts of word sequences occurring in a sample text which is sparse relative to possible word sequences, the method comprising the steps of:
(a) characterizing word sequences as m-grams, each m-gram occurring in the sample text representing a key of words followed by a word;
(b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;
(c) generating a freed probability mass value β_L for each key occurring in the sample text, the β_L for a key of length L being allocated to those m-grams which (i) include the subject key and (ii) have no respective discounted probabilities stored therefor;
(d) generating γ_L factors, each γ_L factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known words;
(e) storing for each key of length L, a value α_L = β_L γ_L; and
(f) evaluating a likelihood of a selected word following a string of known words including the steps of:
(i) searching successively shorter keys of the known words until a key is found which, when followed by the at least one selected word, represents an m-gram having a discounted probability P stored therefor, and retrieving P;
(ii) retrieving the stored α_L value for each longer key searched before the stored m-gram is found; and
(iii) computing a likelihood value of the selected word following the string of known words based on the retrieved α_L values and the retrieved P value.
- View Dependent Claims (2)
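The evaluation of step (f) can be sketched as a short back-off loop. The table layout (tuples as keys) and the example P and α values are assumptions for illustration; the logic follows steps (f)(i)-(iii).

```python
def backoff_likelihood(history, word, P, alpha):
    """Back off through successively shorter keys until an m-gram
    with a stored discounted probability is found, multiplying in
    the stored alpha of every longer key searched on the way.
    P maps (key, word) -> discounted probability; alpha maps
    key -> alpha_L = beta_L * gamma_L (both built as in steps (a)-(e))."""
    likelihood = 1.0
    key = tuple(history)
    while key and (key, word) not in P:
        likelihood *= alpha.get(key, 1.0)  # (f)(ii): alpha of each longer key searched
        key = key[1:]                      # (f)(i): try the next shorter key
    if (key, word) in P:
        likelihood *= P[(key, word)]       # (f)(iii): combine with the retrieved P
    else:
        likelihood = 0.0                   # no stored m-gram at any key length
    return likelihood

# Hypothetical stored tables: only the bigram ("b", "c") has a stored P,
# so evaluating trigram context ("a", "b") backs off once through alpha.
P = {(("b",), "c"): 0.2}
alpha = {("a", "b"): 0.5}
like = backoff_likelihood(["a", "b"], "c", P, alpha)
# 0.5 (alpha of the longer key) * 0.2 (retrieved P) = 0.1
```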
3. In a speech recognition system, apparatus for estimating probability distributions of word sequences of length j from sparse sample data in which only some of the possible word sequences of length j occur in the sample text, the apparatus comprising:
(a) means for storing discounted probabilities, where each discounted probability replaces a count-based probability for an m-gram of less than or equal to j words in a sequence;
(b) means for storing a normalization constant α for each of at least some keys occurring in the sample text, wherein a key represents a sequence of less than j words and wherein the normalization constant for a key is based on (i) a freed probability mass which represents the difference from one of the sum of discounted probabilities of m-grams formed of the key and a subsequent word and (ii) a factor which avoids multiple counting among m-grams; and
(c) processing means, having a sequence of known words and a selected word as input, for evaluating the likelihood of the word sequence including the selected word following (j-1) known words, said processing means including:
means for retrieving a discounted probability for the longest m-gram that has a discounted probability stored therefor and that includes a key of (j-k) known words (where k is a positive integer less than j) followed by the selected word;
means for retrieving the normalization constant for each key of length greater than (j-k) which includes the known words; and
means for multiplying the retrieved normalization constants and the retrieved discounted probability; the product of the multiplying means indicating the likelihood of the word sequence including the selected word followed by the (j-1) known words.
- View Dependent Claims (4)
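The normalization constant of element (b) combines the freed mass with the multiple-counting factor. A minimal sketch, assuming the same (key, word) -> probability table layout as above and toy values chosen only to exercise the arithmetic:

```python
def normalization_constant(key, P, shorter_key):
    """Sketch of claim element (b): the freed mass (beta) of `key`,
    times a factor (gamma) that renormalizes the shorter-key
    distribution over only the words NOT already seen after `key`,
    avoiding multiple counting. Names and layout are illustrative."""
    seen_after_key = {w for (k, w) in P if k == key}
    # (i) freed mass: the difference from one of the sum of
    # discounted probabilities of m-grams formed of the key
    beta = 1.0 - sum(P[(key, w)] for w in seen_after_key)
    # (ii) mass the shorter-key distribution gives to already-seen
    # words; excluding it avoids counting those m-grams twice
    overlap = sum(p for (k, w), p in P.items()
                  if k == shorter_key and w in seen_after_key)
    gamma = 1.0 / (1.0 - overlap)
    return beta * gamma            # the stored constant alpha

P = {(("a", "b"), "c"): 0.4,     # seen trigram continuation
     (("b",), "c"): 0.5,         # shorter-key distribution
     (("b",), "d"): 0.3}
alpha_ab = normalization_constant(("a", "b"), P, ("b",))
# beta = 1 - 0.4 = 0.6; overlap = 0.5; gamma = 1 / 0.5 = 2; alpha = 1.2
```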
5. In a symbol recognition system, a computer-implemented method of evaluating the likelihood of a symbol from a vocabulary of symbols occurring next after a string of known symbols, based on counts of symbol sequences occurring in a sample text which is sparse relative to possible symbol sequences, the method comprising the steps of:
(a) characterizing symbol sequences as m-grams, each m-gram occurring in the sample text representing a key of symbols followed by a symbol;
(b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;
(c) generating a freed probability mass value β_L for each key occurring in the sample text, the β_L for a key of length L being allocated to those m-grams which (i) include the key and (ii) have no respective discounted probabilities stored therefor;
(d) generating γ_L factors, each γ_L factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known symbols;
(e) storing for each key of length L, a value α_L = β_L γ_L; and
(f) evaluating a likelihood of a selected symbol following a string of known symbols including the steps of:
(i) searching successively shorter keys of the known symbols until a key is found which, when followed by the at least one selected symbol, represents an m-gram having a discounted probability P stored therefor, and retrieving P;
(ii) retrieving the stored α_L value for each longer key searched before the stored m-gram is found; and
(iii) computing a likelihood value of the selected symbol following the string of known symbols based on the retrieved α_L values and the retrieved P value.
- View Dependent Claims (6, 7, 8, 9)
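In standard back-off notation, the stored quantities of steps (c)-(e) satisfy the relations below. This is a reconstruction consistent with the claim language, not text quoted from the patent; P̂ denotes a stored discounted probability, c(k, s) the sample-text count of key k followed by symbol s, and k' the key k shortened by one symbol.

```latex
% freed probability mass of a key k of length L: whatever the
% discounted probabilities \hat{P} leave unassigned
\beta_L(k) = 1 - \sum_{s \,:\, c(k,s) > 0} \hat{P}(s \mid k)

% \gamma_L rescales the shorter-key distribution over only the
% symbols s NOT seen after k, avoiding multiple counting
\gamma_L(k) = \Bigl( \sum_{s \,:\, c(k,s) = 0} P(s \mid k') \Bigr)^{-1}

% the stored constant, and the backed-off estimate it yields
\alpha_L(k) = \beta_L(k)\,\gamma_L(k), \qquad
P(s \mid k) = \alpha_L(k)\, P(s \mid k') \quad \text{when } c(k,s) = 0
```

With these definitions the distribution P(· | k) sums to one over the vocabulary, which is exactly the normalization the γ_L factors are "valued" to achieve.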
10. In a speech recognition system, apparatus for estimating probability distributions of symbol sequences of length j from sparse sample data in which only some of the possible symbol sequences of length j occur in the sample text, the apparatus comprising:
(a) means for storing discounted probabilities, wherein each discounted probability replaces a count-based probability for an m-gram of less than or equal to j symbols in a sequence;
(b) means for storing a normalization constant α for each of at least some keys occurring in the sample text, wherein a key represents a sequence of less than j symbols and wherein the normalization constant of a key is based on (i) a freed probability mass which represents the difference from one of the sum of discounted probabilities of m-grams formed of the key and a subsequent symbol and (ii) a factor which avoids multiple counting among m-grams; and
(c) processing means, having a sequence of known symbols and a selected symbol as input, for evaluating the likelihood of the symbol sequence including the selected symbol following (j-1) known symbols, said processing means including:
means for retrieving a discounted probability for the longest m-gram that has a discounted probability stored therefor and that includes a key of (j-k) known symbols (where k is a positive integer less than j) followed by the selected symbol;
means for retrieving the normalization constant for each key of length greater than (j-k) which includes the known symbols; and
means for multiplying the retrieved normalization constants and the retrieved discounted probability; the product of the multiplying means indicating the likelihood of the symbol sequence including the selected symbol followed by the (j-1) known symbols.
- View Dependent Claims (11, 12, 13, 14)
Specification