Isolated word recognition using decision tree classifiers and time-indexed feature vectors
Abstract
Machine recognition of isolated word utterances is carried out by applying time-indexed feature vectors to binary decision tree classifiers. A respective classifier is provided for each target word in the vocabulary of words to be recognized. Determinants for the nodes in the classifier tree structure are formed, during a training process, as hyper-planes which perform a "mean split" between the centroids of target word and non-target word classes of feature vectors assigned to the respective nodes. The process of training the machine recognition system is facilitated by storing node-assignment data in association with training data vectors. The assignment of training data vectors to sub-nodes proceeds on a level-by-level basis in the tree structure.
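The "mean split" described above can be written out as a short derivation. This is a sketch under two assumptions not stated in the abstract: that distance is Euclidean, and that the hyper-plane is the perpendicular bisector of the segment joining the two centroids. Here $c_1$ is the target-word centroid, $c_2$ is the non-target centroid, and $x$ is a feature vector assigned to the node (all three symbols are introduced here for illustration).

```latex
\begin{aligned}
\|x - c_1\|^2 &< \|x - c_2\|^2 \\
\iff\; -2\,c_1^\top x + \|c_1\|^2 &< -2\,c_2^\top x + \|c_2\|^2 \\
\iff\; (c_1 - c_2)^\top x &> \tfrac{1}{2}\left(\|c_1\|^2 - \|c_2\|^2\right) \\
\iff\; (c_1 - c_2)^\top \left(x - \tfrac{c_1 + c_2}{2}\right) &> 0
\end{aligned}
```

Under these assumptions, testing which centroid is nearer is the same as testing on which side of a hyper-plane through the midpoint $(c_1 + c_2)/2$ with normal $c_1 - c_2$ the vector falls.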
15 Claims
1. A method of automatically recognizing isolated utterances, comprising the steps of:
receiving an utterance to be recognized;
converting the received utterance into a sequence of digital signal samples;
forming from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance;
augmenting each of the feature vectors with a time index representative of a position of the respective feature vector in the sequence of feature vectors; and
classifying the augmented feature vectors by use of a pattern classifier algorithm.
Dependent claims: 2, 3
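The augmenting step can be illustrated with a minimal Python sketch. The function name `augment_with_time_index` and the `normalize` option are assumptions made for this example; the claim only requires that the index represent the position of each vector in the sequence, and scaling the index to the range 0 to 1 is one possible choice, not something the claim specifies.

```python
from typing import List, Sequence


def augment_with_time_index(
    frames: Sequence[Sequence[float]], normalize: bool = True
) -> List[List[float]]:
    """Append a time index to every feature vector in the sequence.

    `normalize` is an assumption of this sketch: when True, the index is
    scaled to [0, 1] so utterances of different lengths are comparable;
    when False, the raw frame position is appended.
    """
    n = len(frames)
    augmented = []
    for i, frame in enumerate(frames):
        if normalize:
            # A single-frame utterance gets index 0.0 to avoid dividing by zero.
            t = i / (n - 1) if n > 1 else 0.0
        else:
            t = float(i)
        augmented.append(list(frame) + [t])
    return augmented


# Three two-element feature vectors (made-up values for illustration).
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
augmented = augment_with_time_index(frames)
raw_index = augment_with_time_index(frames, normalize=False)
```

Each augmented vector has one more element than the original, so a downstream classifier can distinguish similar spectral content occurring early in the utterance from the same content occurring late.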
4. A method of automatically recognizing isolated utterances, comprising the steps of:
receiving an utterance to be recognized;
converting the received utterance into a sequence of digital signal samples;
forming from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance; and
classifying the feature vectors by use of each of a plurality of decision tree classifier algorithms, each one of said plurality of decision tree classifier algorithms corresponding to a respective one of a plurality of target words.
Dependent claims: 5, 6
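The one-classifier-per-target-word arrangement might be sketched as follows. The claim does not say how the outputs of the individual classifiers are combined, so the rule used here (pick the word whose classifier accepts the largest fraction of frames) is purely an assumption, as are the name `recognize` and the threshold lambdas standing in for trained decision trees.

```python
from typing import Callable, Dict, Sequence, Tuple

Frame = Sequence[float]
FrameClassifier = Callable[[Frame], bool]  # True means "looks like the target word"


def recognize(
    frames: Sequence[Frame], classifiers: Dict[str, FrameClassifier]
) -> Tuple[str, Dict[str, float]]:
    """Run every per-word classifier over all frames and pick the best word.

    Assumed combination rule: score each word by the fraction of frames its
    classifier accepts, then return the highest-scoring word.
    """
    if not frames or not classifiers:
        raise ValueError("need at least one frame and one classifier")
    scores = {}
    for word, is_target in classifiers.items():
        votes = sum(1 for frame in frames if is_target(frame))
        scores[word] = votes / len(frames)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stand-ins for two trained decision tree classifiers.
classifiers = {
    "yes": lambda f: f[0] > 0.5,
    "no": lambda f: f[0] <= 0.5,
}
frames = [[0.9], [0.8], [0.2]]
best, scores = recognize(frames, classifiers)
```

Keeping one binary classifier per word means each tree only has to separate its own word from everything else, and a word can be added to the vocabulary by training one more tree.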
7. A method of training a processing device to perform a decision tree classifier algorithm, comprising the steps of:
supplying a plurality of data vectors to said processing device, each of said data vectors consisting of n elements, n being a positive integer;
inputting to the processing device, with each of said data vectors, a respective binary label indicative of whether or not the data vector is representative of a target class of data vectors to be recognized by the classifier algorithm;
calculating a first centroid vector as the average of the data vectors representative of the target class;
calculating a second centroid vector as the average of the data vectors not representative of the target class;
assigning to a right-side child node all of said data vectors that are closer in n-dimensional space to said first centroid vector than to said second centroid vector;
assigning to a left-side child node all of said data vectors that are not assigned to said right-side child node; and
applying the following steps recursively with respect to each of said nodes:
determining whether said node satisfies a termination criterion;
if said node is not determined to satisfy the termination criterion, calculating a first node centroid vector as the average of the data vectors assigned to said node and representative of the target class, calculating a second node centroid vector as the average of the data vectors assigned to said node and not representative of the target class, assigning to a right-side sub-node all of the data vectors assigned to said node that are closer in n-dimensional space to said first node centroid vector than to said second node centroid vector, and assigning to a left-side sub-node all of the data vectors not assigned to said right-side sub-node.
Dependent claims: 8, 9
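The recursive centroid split recited in this claim can be sketched in plain Python. Several details are assumptions of the sketch rather than part of the claim: the termination criterion (a node holding only one class, a depth limit, or a split that separates nothing), the majority-vote leaf label, squared Euclidean distance, and all names (`Node`, `build`, `classify`).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Node:
    label: Optional[bool] = None            # set only on terminal nodes
    target_centroid: Optional[List[float]] = None
    other_centroid: Optional[List[float]] = None
    right: Optional["Node"] = None          # vectors nearer the target centroid
    left: Optional["Node"] = None           # all remaining vectors


def build(data: List[Tuple[Vector, bool]], depth: int = 0, max_depth: int = 8) -> Node:
    targets = [v for v, y in data if y]
    others = [v for v, y in data if not y]
    # Assumed termination criterion: one class only, or depth limit reached.
    if not targets or not others or depth >= max_depth:
        return Node(label=len(targets) > len(others))
    c1, c2 = centroid(targets), centroid(others)
    right = [(v, y) for v, y in data if sq_dist(v, c1) < sq_dist(v, c2)]
    left = [(v, y) for v, y in data if not sq_dist(v, c1) < sq_dist(v, c2)]
    if not right or not left:
        # The split separated nothing (e.g. identical centroids): stop here.
        return Node(label=len(targets) > len(others))
    return Node(None, c1, c2,
                build(right, depth + 1, max_depth),
                build(left, depth + 1, max_depth))


def classify(node: Node, v: Vector) -> bool:
    """Descend from the root, going right when v is nearer the target centroid."""
    while node.label is None:
        nearer_target = sq_dist(v, node.target_centroid) < sq_dist(v, node.other_centroid)
        node = node.right if nearer_target else node.left
    return node.label


# Made-up two-dimensional training data: (vector, is_target).
training = [([1.0, 1.0], True), ([1.2, 0.9], True),
            ([0.0, 0.0], False), ([0.1, -0.1], False)]
tree = build(training)
```

With this toy data a single split already separates the two classes, so `tree` has one non-terminal node and two leaves; `classify(tree, [0.9, 1.0])` follows the right branch and `classify(tree, [0.0, 0.1])` follows the left.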
10. A method of training a processing device to perform a decision tree classifier algorithm, said training being carried out using a set of training data vectors stored in a memory, said decision tree classifier algorithm being formed in terms of a tree structure comprising a plurality of non-terminal nodes and a plurality of terminal nodes, each of said non-terminal nodes having a plurality of child nodes associated therewith, the method comprising the steps of:
assigning a respective plurality of said training data vectors to each one of said non-terminal nodes;
sub-assigning each of the respective plurality of training data vectors among the child nodes associated with the non-terminal node to which the respective plurality of vectors was assigned; and
in association with each one of the respective plurality of training data vectors, storing in the memory sub-assignment data indicative of the child node to which said each training data vector was sub-assigned.
Dependent claims: 11, 12
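One way to picture the stored sub-assignment data is a per-vector node identifier that is updated one tree level at a time, as the abstract describes. The sketch below is illustrative only: the heap-style numbering (children of node k are 2k+1 and 2k+2), the centroid-based split, the rule that a node holding a single class is terminal, and the names `train_level_by_level` and `node_of` are all assumptions, not details taken from the claim.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_level_by_level(
    vectors: List[Vector], labels: List[bool], levels: int
) -> Tuple[Dict[int, Tuple[List[float], List[float]]], List[int]]:
    # node_of[i] is the stored assignment data for training vector i.
    # Every vector starts at the root, node 0.
    node_of = [0] * len(vectors)
    splits: Dict[int, Tuple[List[float], List[float]]] = {}
    for _ in range(levels):
        # One pass over the data groups vectors by current node and class.
        groups = defaultdict(lambda: ([], []))
        for v, y, node in zip(vectors, labels, node_of):
            groups[node][0 if y else 1].append(v)
        # Only nodes that still hold both classes are split at this level.
        new_splits = {
            node: (centroid(targets), centroid(others))
            for node, (targets, others) in groups.items()
            if targets and others
        }
        if not new_splits:
            break
        # Second pass: overwrite each vector's stored node id with its child.
        for i, v in enumerate(vectors):
            node = node_of[i]
            if node in new_splits:
                c1, c2 = new_splits[node]
                go_right = sq_dist(v, c1) < sq_dist(v, c2)
                node_of[i] = 2 * node + (2 if go_right else 1)
        splits.update(new_splits)
    return splits, node_of


vectors = [[1.0, 1.0], [1.2, 0.9], [0.0, 0.0], [0.1, -0.1]]
labels = [True, True, False, False]
splits, node_of = train_level_by_level(vectors, labels, levels=3)
```

Because the node identifier travels with each training vector, every level needs only sequential passes over the stored data; there is no need to copy vectors into per-node lists as the tree grows.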
13. Apparatus for automatically recognizing isolated utterances, comprising:
means for receiving an utterance to be recognized;
means for converting the received utterance into a sequence of digital signal samples; and
a processor programmed to:
form from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance;
augment each of the feature vectors with a time index representative of a position of the respective feature vector in the sequence of feature vectors; and
classify the augmented feature vectors by use of a pattern classifier algorithm.
Dependent claims: 14, 15
Specification