Method and apparatus for finding the best splits in a decision tree for a language model for a speech recognizer
First Claim
1. A method of automatic speech recognition comprising the steps of:
converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;
providing a set of M predictor feature signals, each predictor feature signal having a predictor feature value Xm, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;
generating a decision set which contains a subset of the M predictor feature signals representing the words;
comparing the predictor word signal with the predictor feature signals in the decision set;
outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Yn, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and
outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;
characterized in that the contents of the decision set are generated by the steps of:
providing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values Xm, each Xm representing a different predictor word, said category feature having one of N possible values Yn, each Yn representing a different predicted word;
(a) measuring the predictor feature value Xm and the category feature value Yn of each event in the set of events;
(b) estimating, from the measured predictor feature values and the measured category feature values, the probability P(Xm, Yn) of occurrence of an event having a category feature value Yn and a predictor feature value Xm, for each Yn and each Xm;
(c) selecting a starting set SXopt (t) of predictor feature values Xm, where t has an initial value;
(d) calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SXopt (t)|Yn) that the predictor feature has a value in the set SXopt (t) when the category feature has a value Yn, for each Yn;
(e) defining a number of pairs of sets SYj (t) and S̅Y̅j (t) of category feature values Yn, where j is an integer greater than zero and less than or equal to (N-1), each set SYj (t) containing only those category feature values Yn having the j lowest values of P(SXopt (t)|Yn), each set S̅Y̅j (t) containing only those category feature values Yn having the (N-j) highest values of P(SXopt (t)|Yn);
(f) finding a pair of sets SYopt (t) and S̅Y̅opt (t) from among the pairs of sets SYj (t) and S̅Y̅j (t) such that the pair of sets SYopt (t) and S̅Y̅opt (t) have the lowest uncertainty in the value of the predictor feature;
(g) calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SYopt (t)|Xm) that the category feature has a value in the set SYopt (t) when the predictor feature has a value Xm, for each Xm;
(h) defining a number of pairs of sets SXi (t+1) and S̅X̅i (t+1) of predictor feature values Xm, where i is an integer greater than zero and less than or equal to (M-1), each set SXi (t+1) containing only those predictor feature values Xm having the i lowest values of P(SYopt (t)|Xm), each set S̅X̅i (t+1) containing only those predictor feature values Xm having the (M-i) highest values of P(SYopt (t)|Xm);
(i) finding a pair of sets SXopt (t+1) and S̅X̅opt (t+1) from among the pairs of sets SXi (t+1) and S̅X̅i (t+1) such that the pair of sets SXopt (t+1) and S̅X̅opt (t+1) have the lowest uncertainty in the value of the category feature; and
(l) setting the decision set equal to the set SXopt (t+1).
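The training steps (a) through (l) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: "uncertainty" is taken to be the average conditional entropy of one feature given which side of the split the other feature falls on (the claim does not name a measure), probabilities are plain relative frequencies, and the function names `joint_probabilities`, `refine`, and `find_decision_set` are hypothetical. The claim describes a single pass from t to t+1; repeating the pass until the set stops changing is an added assumption.

```python
import math
from collections import Counter, defaultdict


def joint_probabilities(events):
    """Steps (a)-(b): relative-frequency estimate of P(x, y) from (x, y) events."""
    counts = Counter(events)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def transpose(p):
    """Swap the roles of the two features in a joint table."""
    return {(b, a): v for (a, b), v in p.items()}


def conditional_entropy(p, b_set):
    """H(A | B in b_set or not), in bits, for a joint table p over (a, b).

    Assumption: entropy stands in for the claim's unspecified "uncertainty".
    """
    sides = (defaultdict(float), defaultdict(float))
    for (a, b), v in p.items():
        sides[0 if b in b_set else 1][a] += v
    h = 0.0
    for side in sides:
        mass = sum(side.values())
        for v in side.values():
            if v > 0:
                h -= v * math.log2(v / mass)
    return h


def refine(p, a_set):
    """Steps (d)-(f); on the transposed table, steps (g)-(i).

    Ranks each b by P(a_set | b), forms the candidate sets holding the
    j lowest-ranked values (j = 1 .. N-1), and returns the candidate
    whose split leaves the lowest uncertainty in the value of A.
    """
    marginal = defaultdict(float)
    hit = defaultdict(float)
    for (a, b), v in p.items():
        marginal[b] += v
        if a in a_set:
            hit[b] += v
    ranked = sorted(marginal, key=lambda b: (hit[b] / marginal[b], str(b)))
    candidates = [frozenset(ranked[:j]) for j in range(1, len(ranked))]
    return min(candidates, key=lambda s: conditional_entropy(p, s))


def find_decision_set(events, start_set, max_passes=10):
    """Step (c) supplies start_set; step (l) returns the final predictor set."""
    p = joint_probabilities(events)
    p_t = transpose(p)
    sx = frozenset(start_set)
    for _ in range(max_passes):  # repeated passes are an assumption
        sy = refine(p, sx)        # best split of category values
        new_sx = refine(p_t, sy)  # best split of predictor values
        if new_sx == sx:
            break
        sx = new_sx
    return sx


# Toy training text: predictors "a"/"b" precede "p"/"q"; "c"/"d" precede "r"/"s".
events = [(x, y) for x in "ab" for y in "pq"] + [(x, y) for x in "cd" for y in "rs"]
decision_set = find_decision_set(events, {"a"})
```

Because a set and its complement describe the same split, either one may serve as the decision set; the sketch returns whichever holds the lowest-ranked values.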
Abstract
A method and apparatus for finding the best or near-best binary classification of a set of observed events according to a predictor feature X, so as to minimize the uncertainty in the value of a category feature Y. Each feature has three or more possible values. First, the predictor feature value and the category feature value of each event are measured. The events are then split, arbitrarily, into two sets of predictor feature values. From the two sets of predictor feature values, an optimum pair of sets of category feature values is found having the lowest uncertainty in the value of the predictor feature. From the two optimum sets of category feature values, an optimum pair of sets of predictor feature values is found having the lowest uncertainty in the value of the category feature. An event is then classified according to whether its predictor feature value is a member of a set of optimal predictor feature values.
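The abstract does not name a measure of "uncertainty". One common choice, assumed here purely for illustration, is the average conditional entropy of the category feature given which of the two predictor sets the event falls in. Writing SX for one set of predictor feature values and its complement with an overbar, and taking terms with zero probability as zero:

```latex
H\bigl(Y \mid SX, \overline{SX}\bigr)
  = -\sum_{S \in \{SX,\,\overline{SX}\}} \sum_{n=1}^{N}
      P(S, Y_n)\,\log_2 \frac{P(S, Y_n)}{P(S)},
\qquad
P(S, Y_n) = \sum_{X_m \in S} P(X_m, Y_n),
\quad
P(S) = \sum_{n=1}^{N} P(S, Y_n).
```

The uncertainty in the predictor feature given a split of the category values is obtained the same way with the roles of X and Y exchanged.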
2 Claims
2. An automatic speech recognition system comprising:
means for converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;
means for storing a set of M predictor feature signals, each predictor feature signal having a predictor feature value Xm, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;
means for generating a decision set which contains a subset of the M predictor feature signals representing the words;
means for comparing the predictor word signal with the predictor feature signals in the decision set;
means for outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Yn, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and
means for outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;
characterized in that the means for generating the decision set comprises:
means for storing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values Xm, each Xm representing a different predictor word, said category feature having one of N possible values Yn, each Yn representing a different predicted word;
(a) means for measuring the predictor feature value Xm and the category feature value Yn of each event in the set of events;
(b) means for estimating, from the measured predictor feature values and the measured category feature values, the probability P(Xm, Yn) of occurrence of an event having a category feature value Yn and a predictor feature value Xm, for each Yn and each Xm;
(c) means for selecting a starting set SXopt (t) of predictor feature values Xm, where t has an initial value;
(d) means for calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SXopt (t)|Yn) that the predictor feature has a value in the set SXopt (t) when the category feature has a value Yn, for each Yn;
(e) means for defining a number of pairs of sets SYj (t) and S̅Y̅j (t) of category feature values Yn, where j is an integer greater than zero and less than or equal to (N-1), each set SYj (t) containing only those category feature values Yn having the j lowest values of P(SXopt (t)|Yn), each set S̅Y̅j (t) containing only those category feature values Yn having the (N-j) highest values of P(SXopt (t)|Yn);
(f) means for finding a pair of sets SYopt (t) and S̅Y̅opt (t) from among the pairs of sets SYj (t) and S̅Y̅j (t) such that the pair of sets SYopt (t) and S̅Y̅opt (t) have the lowest uncertainty in the value of the predictor feature;
(g) means for calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SYopt (t)|Xm) that the category feature has a value in the set SYopt (t) when the predictor feature has a value Xm, for each Xm;
(h) means for defining a number of pairs of sets SXi (t+1) and S̅X̅i (t+1) of predictor feature values Xm, where i is an integer greater than zero and less than or equal to (M-1), each set SXi (t+1) containing only those predictor feature values Xm having the i lowest values of P(SYopt (t)|Xm), each set S̅X̅i (t+1) containing only those predictor feature values Xm having the (M-i) highest values of P(SYopt (t)|Xm);
(i) means for finding a pair of sets SXopt (t+1) and S̅X̅opt (t+1) from among the pairs of sets SXi (t+1) and S̅X̅i (t+1) such that the pair of sets SXopt (t+1) and S̅X̅opt (t+1) have the lowest uncertainty in the value of the category feature; and
(l) means for outputting the set SXopt (t+1) as the decision set.
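Once a decision set exists, the comparing and outputting elements of the claims reduce to a membership test. The Python sketch below illustrates that step under assumptions the claims do not make: the predicted word for each branch is taken to be the most frequent category value observed on that side of the split in the training events, and the helper name `build_predictor` is hypothetical. It also assumes both sides of the split contain at least one training event.

```python
from collections import Counter


def build_predictor(events, decision_set):
    """Return a function mapping a predictor word to a predicted word.

    Assumption: each branch outputs the most frequent category value seen
    on its side of the split; the claims only require that a first or a
    second category feature signal be output depending on membership.
    """
    inside, outside = Counter(), Counter()
    for x, y in events:
        (inside if x in decision_set else outside)[y] += 1
    first_word = inside.most_common(1)[0][0]
    second_word = outside.most_common(1)[0][0]

    def predict(predictor_word):
        # Compare the predictor word against the decision set.
        return first_word if predictor_word in decision_set else second_word

    return predict


training_events = [
    ("the", "cat"), ("the", "cat"), ("the", "dog"), ("a", "cat"),
    ("ran", "fast"), ("ran", "fast"), ("sat", "down"),
]
predict = build_predictor(training_events, {"the", "a"})
```

Nothing in this sketch guarantees that the two branch words differ, which the claims require; a fuller version would need to enforce that separately.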
Specification