Isolated word recognition using decision tree classifiers and time-indexed feature vectors

US 5,657,424 A
Filed: 10/31/1995
Issued: 08/12/1997
Est. Priority Date: 10/31/1995
Status: Expired due to Term

Abstract

Machine recognition of isolated word utterances is carried out by applying time-indexed feature vectors to binary decision tree classifiers. A respective classifier is provided for each target word in the vocabulary of words to be recognized. Determinants for the nodes in the classifier tree structure are formed, during a training process, as hyper-planes which perform a "mean split" between the centroids of target word and non-target word classes of feature vectors assigned to the respective nodes. The process of training the machine recognition system is facilitated by storing node-assignment data in association with training data vectors. The assignment of training data vectors to sub-nodes proceeds on a level-by-level basis in the tree structure.
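
The "mean split" described above can be written out as a hyperplane test. The following is a sketch only, assuming Euclidean distance (the abstract does not name the distance measure); the symbols $c_T$ and $c_N$ for the target and non-target centroids, and $x$ for a feature vector at the node, are introduced here for illustration.

```latex
\|x - c_T\|^2 < \|x - c_N\|^2
\;\iff\;
(c_T - c_N)^\top x \;>\; \tfrac{1}{2}\left(\|c_T\|^2 - \|c_N\|^2\right)
\;\iff\;
(c_T - c_N)^\top \left(x - \tfrac{1}{2}(c_T + c_N)\right) > 0
```

Under that assumption, the node determinant is the hyperplane with normal $c_T - c_N$ passing through the midpoint of the two centroids, so each node needs to store only a weight vector and a scalar offset.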

15 Claims

1. A method of automatically recognizing isolated utterances, comprising the steps of:
- receiving an utterance to be recognized;
  
  converting the received utterance into a sequence of digital signal samples;
  
  forming from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance;
  
  augmenting each of the feature vectors with a time index representative of a position of the respective feature vector in the sequence of feature vectors; and
  
  classifying the augmented feature vectors by use of a pattern classifier algorithm.
- - 2. A method according to claim 1, wherein said classifying step includes classifying the augmented feature vectors by use of each of a plurality of decision tree classifier algorithms, each one of said plurality of decision tree classifier algorithms corresponding to a respective one of a plurality of target words.
  - 3. A method according to claim 2, wherein each of said plurality of decision tree classifier algorithms assigns a respective numerical score to each one of said augmented feature vectors, and further comprising the steps of:
    - aggregating, with respect to each of said decision tree classifier algorithms, the numerical scores assigned to the augmented feature vectors by means of the respective decision tree classifier algorithm; and
      
      classifying the received utterance as the target word corresponding to the decision tree classifier algorithm which produced a largest one of the aggregated numerical scores.
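
The steps of claims 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `augment_with_time_index`, `recognize`, and `Scorer` are hypothetical, the time index is assumed to be the raw frame number, aggregation is assumed to be a plain sum, and each per-word classifier is stood in for by an arbitrary scoring function.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical stand-in for one trained per-word classifier: it maps an
# augmented feature vector to a numerical score.
Scorer = Callable[[Vector], float]


def augment_with_time_index(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Append each frame's position in the sequence as one extra element.

    Assumption: the raw frame number serves as the time index.
    """
    return [list(frame) + [float(t)] for t, frame in enumerate(frames)]


def recognize(frames: Sequence[Sequence[float]], scorers: Dict[str, Scorer]) -> str:
    """Return the target word whose classifier gives the largest aggregate score."""
    augmented = augment_with_time_index(frames)
    # Assumption: "aggregating" is modelled as summing the per-vector scores.
    totals = {
        word: sum(score(vector) for vector in augmented)
        for word, score in scorers.items()
    }
    return max(totals, key=totals.get)
```

Calling `recognize(frames, {"yes": yes_scorer, "no": no_scorer})` with two hypothetical scorers returns whichever word accumulated the larger total over all frames of the utterance.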

4. A method of automatically recognizing isolated utterances, comprising the steps of:
- receiving an utterance to be recognized;
  
  converting the received utterance into a sequence of digital signal samples;
  
  forming from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance; and
  
  classifying the feature vectors by use of each of a plurality of decision tree classifier algorithms, each one of said plurality of decision tree classifier algorithms corresponding to a respective one of a plurality of target words.
- - 5. A method according to claim 4, wherein each of said plurality of decision tree classifier algorithms assigns a respective numerical score to each one of said feature vectors, and said classifying step further includes:
    - aggregating, with respect to each of said decision tree classifier algorithms, the numerical scores assigned to the feature vectors by means of the respective decision tree classifier algorithm; and
      
      classifying the received utterance as the target word corresponding to the decision tree classifier algorithm which produced a largest one of the aggregated numerical scores.
  - 6. A method according to claim 4, wherein each of said plurality of decision tree classifier algorithms assigns a respective numerical score to each one of said feature vectors, and said classifying step further includes:
    - aggregating, with respect to each of said decision tree classifier algorithms, the numerical scores assigned to the feature vectors by means of the respective decision tree classifier algorithm;
      
      comparing a largest one of the aggregated numerical scores to a predetermined threshold; and
      
      if said largest one of the aggregated numerical scores exceeds said threshold, classifying the received utterance as the target word corresponding to the decision tree classifier algorithm which produced said largest one of the aggregated numerical scores.
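
The threshold comparison of claim 6 might look like the sketch below. The function name `classify_with_rejection` is hypothetical, and returning `None` when the threshold is not exceeded is an assumption: the claim states only what happens when the largest score exceeds the threshold.

```python
from typing import Dict, Optional


def classify_with_rejection(aggregated: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring word, or None if its score does not exceed the threshold.

    `aggregated` maps each target word to its aggregated numerical score and
    is assumed to be non-empty.
    """
    best_word = max(aggregated, key=aggregated.get)
    if aggregated[best_word] > threshold:
        return best_word
    # Assumption: an utterance that clears no threshold is left unclassified.
    return None
```

A caller could treat a `None` result as an out-of-vocabulary utterance, which is one plausible reason to compare against a threshold instead of always picking the maximum.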

7. A method of training a processing device to perform a decision tree classifier algorithm, comprising the steps of:
- supplying a plurality of data vectors to said processing device, each of said data vectors consisting of n elements, n being a positive integer;
  
  inputting to the processing device, with each of said data vectors, a respective binary label indicative of whether or not the data vector is representative of a target class of data vectors to be recognized by the classifier algorithm;
  
  calculating a first centroid vector as the average of the data vectors representative of the target class;
  
  calculating a second centroid vector as the average of the data vectors not representative of the target class;
  
  assigning to a right-side child node all of said data vectors that are closer in n-dimensional space to said first centroid vector than to said second centroid vector;
  
  assigning to a left-side child node all of said data vectors that are not assigned to said right-side child node; and
  
  applying the following steps recursively with respect to each of said nodes:
  
  determining whether said node satisfies a termination criterion;
  
  if said node is not determined to satisfy the termination criterion, calculating a first node centroid vector as the average of the data vectors assigned to said node and representative of the target class, calculating a second node centroid vector as the average of the data vectors assigned to said node and not representative of the target class, assigning to a right-side sub-node all of the data vectors assigned to said node that are closer in n-dimensional space to said first node centroid vector than to said second node centroid vector, and assigning to a left-side sub-node all of the data vectors not assigned to said right-side sub-node.
- - 8. A method according to claim 7, wherein each node is determined to satisfy a termination criterion if all data vectors assigned to such node are representative of the target class, or if all data vectors assigned to such node are not representative of the target class.
  - 9. A method according to claim 8, wherein each node is determined to satisfy a termination criterion if such node is at a predetermined level of a tree structure defined for said classifier algorithm.
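
The recursive training procedure of claims 7 to 9 can be sketched as follows. This is an illustrative reading, not the patented code: `Node`, `train`, and `max_level` are hypothetical names, squared Euclidean distance is assumed for "closer in n-dimensional space", and the guard against an empty child is an addition of this sketch that the claims do not mention.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Labeled = Tuple[Sequence[float], bool]  # (data vector, is-target binary label)


@dataclass
class Node:
    n_target: int
    n_other: int
    target_centroid: Optional[List[float]] = None
    other_centroid: Optional[List[float]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty collection of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(data: List[Labeled], level: int = 1, max_level: int = 8) -> Node:
    targets = [v for v, is_target in data if is_target]
    others = [v for v, is_target in data if not is_target]
    node = Node(n_target=len(targets), n_other=len(others))
    # Termination: the node is pure (claim 8) or sits at the level limit (claim 9).
    if not targets or not others or level >= max_level:
        return node
    c_t, c_o = centroid(targets), centroid(others)
    # Right child: vectors strictly closer to the target centroid; left: the rest.
    right = [(v, t) for v, t in data if sq_dist(v, c_t) < sq_dist(v, c_o)]
    left = [(v, t) for v, t in data if not (sq_dist(v, c_t) < sq_dist(v, c_o))]
    # Sketch-only guard: coinciding centroids would leave one child empty.
    if not right or not left:
        return node
    node.target_centroid, node.other_centroid = c_t, c_o
    node.left = train(left, level + 1, max_level)
    node.right = train(right, level + 1, max_level)
    return node
```

The per-node counts `n_target` and `n_other` are kept only so the result can be inspected; how a terminal node turns into a numerical score is outside what these claims specify.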

10. A method of training a processing device to perform a decision tree classifier algorithm, said training being carried out using a set of training data vectors stored in a memory, said decision tree classifier algorithm being formed in terms of a tree structure comprising a plurality of non-terminal nodes and a plurality of terminal nodes, each of said non-terminal nodes having a plurality of child nodes associated therewith, the method comprising the steps of:
- assigning a respective plurality of said training data vectors to each one of said non-terminal nodes;
  
  sub-assigning each of the respective plurality of training data vectors among the child nodes associated with the non-terminal node to which the respective plurality of vectors was assigned; and
  
  in association with each one of the respective plurality of training data vectors, storing in the memory sub-assignment data indicative of the child node to which said each training data vector was sub-assigned.
- - 11. A method according to claim 10, wherein said tree structure has a first level consisting of a root node which is not a child node of any other node, a second level consisting of the child nodes of said root node, and a third level consisting of the child nodes of the nodes which constitute said second level, and said assigning and sub-assigning steps are carried out so that, with respect to each one of said levels, each one of the stored data vectors is assigned among the nodes of the respective level before any one of the data vectors is sub-assigned to a child node of a node of the respective level.
  - 12. A method according to claim 10, wherein each non-terminal node has exactly two child nodes.
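
The bookkeeping of claims 10 to 12 can be illustrated with a parallel list that records, for every training vector, the node it currently belongs to. The sketch below makes several assumptions that are not in the claims: the names `assign_level_by_level` and `goes_right` are hypothetical, nodes are numbered heap-style (the children of node k are 2k+1 and 2k+2), the tree is treated as full down to the requested depth, and the split rule is passed in ready-made instead of being computed from the vectors at each node.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def assign_level_by_level(
    vectors: Sequence[Vector],
    goes_right: Callable[[int, Vector], bool],
    levels: int,
) -> List[int]:
    """Return, for each vector, the id of the node it reaches after `levels` splits.

    `goes_right(node_id, vector)` is a hypothetical split rule for a binary
    tree (claim 12): True sends the vector to the right child.
    """
    # node_of[i] is the stored sub-assignment data for vectors[i].
    # Every vector starts at the root, node 0.
    node_of = [0] * len(vectors)
    for _ in range(levels):
        # One full pass per level: every vector is assigned among the nodes
        # of this level before any vector moves a level deeper (claim 11).
        node_of = [
            2 * node + (2 if goes_right(node, vector) else 1)
            for node, vector in zip(node_of, vectors)
        ]
    return node_of
```

Because the node id is stored alongside each vector, a pass over the training set never has to re-walk the tree from the root; it reads the stored id and applies one more split.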

13. Apparatus for automatically recognizing isolated utterances, comprising:
- means for receiving an utterance to be recognized;
  
  means for converting the received utterance into a sequence of digital signal samples; and
  
  a processor programmed to:
  
  form from the sequence of digital signal samples a sequence of feature vectors each of which represents characteristics of the received utterance during a respective temporal portion of the utterance;
  
  augment each of the feature vectors with a time index representative of a position of the respective feature vector in the sequence of feature vectors; and
  
  classify the augmented feature vectors by use of a pattern classifier algorithm.
- - 14. Apparatus according to claim 13, wherein said processor classifies the augmented feature vectors by use of each of a plurality of decision tree classifier algorithms, each one of said plurality of decision tree classifier algorithms corresponding to a respective one of a plurality of target words.
  - 15. Apparatus according to claim 14, wherein said processor uses each of said plurality of decision tree classifier algorithms to assign a respective numerical score to each one of said augmented feature vectors, and said processor is further programmed to:
    - aggregate, with respect to each of said decision tree classifier algorithms, the numerical scores assigned to the augmented feature vectors by means of the respective decision tree classifier algorithm; and
      
      classify the received utterance as the target word corresponding to the decision tree classifier algorithm which produced a largest one of the aggregated numerical scores.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Dictaphone Corporation (Microsoft Corporation)
Inventors: Farrell, Kevin R.; Sorensen, Jeffrey S.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Collins, Alphonso
Application Number: US08/550,920
Time in Patent Office: 651 Days
Field of Search: 395/2.41, 395/2.43, 395/2.45-2.47, 395/2.53, 395/2.54, 395/2.6, 395/2.65
US Class Current: 704/255
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/05   Word boundary detection
G10L 15/10   using distance or distortio...
G10L 2015/0638   Interactive procedures
