Hierarchical labeler in a speech recognition system
Abstract
A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
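The two-level search described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes Euclidean distance as the closeness measure (the abstract does not fix one), and the names `rank`, `hierarchical_label`, `children`, and `top_n`, as well as the toy prototype values, are hypothetical.

```python
import math

def rank(feature, prototypes, ids):
    """Return the given prototype ids ordered closest-first.

    Assumption: closeness is plain Euclidean distance.
    """
    return sorted(ids, key=lambda pid: math.dist(feature, prototypes[pid]))

def hierarchical_label(feature, level1, level2, children, top_n=2):
    """Label one feature vector with a two-level prototype hierarchy.

    level1, level2: dict mapping prototype id -> parameter vector
    children:       dict mapping level-1 id -> list of associated level-2 ids
    top_n:          how many top-ranked level-1 prototypes to expand (hypothetical knob)
    """
    # First ranked list: score every prototype in the small first-level subset.
    first_ranked = rank(feature, level1, list(level1))

    # Collect only the second-level prototypes tied to the best first-level ones.
    candidates = []
    for pid in first_ranked[:top_n]:
        candidates.extend(children[pid])
    candidates = list(dict.fromkeys(candidates))  # drop repeats, keep order

    # Second ranked list: score just those candidates.
    second_ranked = rank(feature, level2, candidates)
    return second_ranked[0], first_ranked, second_ranked

# Toy data (hypothetical): 3 first-level and 6 second-level prototypes.
level1 = {"A": (0, 0), "B": (10, 10), "C": (20, 0)}
level2 = {"a1": (-1, 0), "a2": (1, 1), "b1": (9, 9),
          "b2": (11, 12), "c1": (19, 1), "c2": (21, -1)}
children = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}

label, first_ranked, second_ranked = hierarchical_label(
    (8, 8), level1, level2, children)
print(label, first_ranked, second_ranked)
```

With `top_n=2`, only the four second-level prototypes under `B` and `A` are scored for this frame; the two under `C` are never compared.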
23 Claims
1. A method for assigning a label to a segment of speech to be recognized, comprising the steps of:
- providing a hierarchical fast ranking tree comprising a plurality of levels of subsets of prototypes, each prototype in a higher level subset being associated with one or more prototypes in a lower level subset;
- inputting a feature vector signal representing the segment of speech to be recognized;
- comparing the features of the vector signal with the features of the prototypes in a first level to find a first ranked list of the closest prototypes to the feature vector signal at that level;
- comparing the features of the feature vector signal to the prototypes in a second level subset associated with the highest ranking prototypes in the first ranked list of prototypes, to find a second ranked list of the closest prototypes to the feature vector signal in the second level;
- assigning the label associated with the highest ranking prototype in the lowest level subset to the feature vector signal; and
- predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A speech coding apparatus comprising:
- means for measuring the value of at least one feature of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
- first level subset means for storing a first plurality of prototype vector signals, each prototype vector signal having at least one parameter value and a unique identification value;
- second level subset means for storing a second plurality of prototype vector signals, each prototype vector signal having at least one parameter value and a unique identification value, and each second level subset prototype vector being associated with one of the prototype vector signals in the first level subset means;
- means for comparing the closeness of the feature value of the first feature vector signal to the parameter values of the prototype vector signals in the first level subset means to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset means;
- means for comparing the closeness of the feature value of the first feature vector signal to the parameter values of the prototype vector signals in the second level subset means associated with the prototypes in the first level subset means that most closely match the feature value of the first vector signal;
- means for outputting at least the identification value of at least the prototype vector signal in the second level subset means having the best prototype match score as a coded utterance representation signal of the first feature vector signal; and
- means for predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A speech coding method comprising:
- measuring the value of at least one feature of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
- storing a first plurality of prototype vector signals as a first level subset of prototype vectors, each prototype vector signal having at least one parameter vector and a unique identification value;
- storing a second plurality of prototype vector signals, greater than the first plurality, as a second level subset of prototype vectors;
- comparing the closeness of the feature vector of the first feature vector signal to the parameter vectors of the prototype vector signals in the first level subset to obtain a ranked list of prototypes most closely matching the first feature vector signal;
- comparing the closeness of the feature vector of the parameter vectors of the prototype vector signals in the second level subset that are associated with the prototype vectors in the first level subset that most closely match the first feature vector signal to obtain a ranked list of prototypes in the second level subset most closely matching the first feature vector signal;
- outputting at least the identification value of at least the prototype vector signal in the second level subset, that is associated with a prototype vector in the first level subset, having the best prototype match score as a coded utterance representation signal of the first feature vector signal; and
- predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
Dependent claims: 21, 22, 23
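Independent claims 1, 12, and 20 each end with a predictive labeling step: a later feature vector reuses the label already found if it lies within a threshold distance of the earlier vector. The sketch below illustrates that idea under stated assumptions. Euclidean distance is assumed, the comparison is made against the most recent vector that received a full search (one reading of "the first feature vector signal"), and `label_sequence`, `nearest`, the flat stand-in search, and all values are hypothetical.

```python
import math

def label_sequence(features, full_search, threshold):
    """Label a series of feature vectors, skipping the search when possible.

    full_search: callable returning a label for one feature vector
                 (stand-in for the hierarchical tree search).
    threshold:   reuse the previous label if the distance is below this value.
    Returns (labels, number_of_full_searches).
    """
    labels = []
    anchor = None        # last vector that went through a full search
    anchor_label = None
    searches = 0
    for f in features:
        if anchor is not None and math.dist(f, anchor) < threshold:
            # Predictive labeling: close enough, reuse the stored label.
            label = anchor_label
        else:
            label = full_search(f)
            searches += 1
            anchor, anchor_label = f, label
        labels.append(label)
    return labels, searches

# Hypothetical stand-in for the tree search: flat nearest-prototype lookup.
prototypes = {"p0": (0.0, 0.0), "p1": (5.0, 5.0)}

def nearest(f):
    return min(prototypes, key=lambda pid: math.dist(f, prototypes[pid]))

frames = [(0.1, 0.0), (0.2, 0.1), (4.9, 5.2), (5.0, 5.1)]
labels, searches = label_sequence(frames, nearest, threshold=0.5)
print(labels, searches)
```

Comparing against the last fully searched vector, rather than the immediately preceding frame, keeps a slow drift across many frames from carrying a stale label indefinitely; that choice is an assumption of this sketch, not something the claims spell out.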
Specification