Hierarchical labeler in a speech recognition system

US 6,023,673 A
Filed: 06/04/1997
Issued: 02/08/2000
Est. Priority Date: 06/04/1997
Status: Expired due to Fees

Abstract

A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
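
The two-level search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the squared Euclidean distance stands in for the prototype match score, and the names `hierarchical_label`, `children`, and `top_n` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance, used here as a stand-in match score (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_label(feature, level1, level2, children, top_n=2):
    """Label a feature vector with a two-level prototype hierarchy.

    level1, level2: dict mapping prototype id -> parameter vector
    children: dict mapping each level-1 id -> list of associated level-2 ids
    top_n: how many of the highest-ranking level-1 prototypes to expand (assumed)
    """
    # First ranked list: every level-1 prototype, closest first.
    ranked1 = sorted(level1, key=lambda pid: sq_dist(feature, level1[pid]))
    # Only level-2 prototypes tied to the top-ranked level-1 prototypes are scored.
    candidates = [c for pid in ranked1[:top_n] for c in children[pid]]
    # Second ranked list: the surviving level-2 prototypes, closest first.
    ranked2 = sorted(candidates, key=lambda pid: sq_dist(feature, level2[pid]))
    return ranked2[0], ranked1, ranked2


# Toy data: three coarse prototypes, each associated with two finer ones.
level1 = {"A": [0.0, 0.0], "B": [10.0, 10.0], "C": [-10.0, 5.0]}
level2 = {
    "a1": [0.5, 0.5], "a2": [-1.0, 0.0],
    "b1": [9.0, 11.0], "b2": [11.0, 9.0],
    "c1": [-9.0, 5.0], "c2": [-11.0, 6.0],
}
children = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}

label, ranked1, ranked2 = hierarchical_label([0.4, 0.6], level1, level2, children)
print(label)  # a1
```

With `top_n=2`, only four of the six level-2 prototypes are scored, which is where the reduction in computation comes from in this toy setup.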

23 Claims

1. A method for assigning a label to a segment of speech to be recognized, comprising the steps of:
- providing a hierarchical fast ranking tree comprising a plurality of levels of subsets of prototypes, each prototype in a higher level subset being associated with one or more prototypes in a lower level subset;
  
  inputting a feature vector signal representing the segment of speech to be recognized;
  
  comparing the features of the vector signal with the features of the prototypes in a first level to find a first ranked list of the closest prototypes to the feature vector signal at that level;
  
  comparing the features of the feature vector signal to the prototypes in a second level subset associated with the highest ranking prototypes in the first ranked list of prototypes, to find a second ranked list of the closest prototypes to the feature vector signal in the second level;
  
  assigning the label associated with the highest ranking prototype in the lowest level subset to the feature vector signal; and
  
  predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method as claimed in claim 1, further comprising comparing the features of the feature vector signal to the prototypes in a third level subset associated with the highest ranking prototypes in the second ranked list of prototypes, to find a third ranked list of the closest prototypes to the feature vector signal in the third level.
  - 3. The method as claimed in claim 2, wherein prototypes are arranged in the plurality of levels by:
    - organizing all prototypes in an existing set of a large number of prototypes into a binary search tree terminating in a plurality of leaves by splitting all prototypes into at least two sets of prototypes based upon a Kullback-Leibler distance measure;
      
      continuing to split the set of prototypes until every leaf of the tree represents one prototype; and
      
      assigning the prototypes in the lowest level of the tree to the third level subset, the prototypes in a higher level of the tree to the second level subset, and the prototypes in a next higher level of the tree to the first level subset.
  - 4. The method as claimed in claim 1, wherein the second feature vector signal is approximately adjacent to the first feature vector signal.
  - 5. The method as claimed in claim 1, wherein a leaf refers to a group of closest prototypes and a rank refers to a list of ordered closest leaves.
  - 6. The method as claimed in claim 5, wherein each level of the hierarchical fast ranking tree has at least one rank associated therewith.
  - 7. The method as claimed in claim 5, wherein ranks calculated for higher levels of the hierarchical fast ranking tree are used to estimate ranks for the lower levels of the hierarchical fast ranking tree.
  - 8. The method as claimed in claim 1, wherein each prototype includes a plurality of elements, each element being represented by a Gaussian density distribution including a mean value and a variance value.
  - 9. The method as claimed in claim 8, further comprising the step of respectively splitting the prototypes into bands wherein each band contains the mean and the variance values of at least two elements of the particular prototype having substantially similar Gaussian density distributions with respect to each other.
  - 10. The method as claimed in claim 9, wherein each prototype has approximately 39 elements whereby the elements may be grouped into approximately 20 bands.
  - 11. The method as claimed in claim 10, wherein each band may be represented by an index.
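
Claims 3 and 8 together suggest building a binary tree over Gaussian prototypes using a Kullback-Leibler distance. The sketch below is one possible reading under stated assumptions: prototypes are diagonal Gaussians given as `(means, variances)`, the distance is the symmetrized KL divergence, and the split rule (seed with the two prototypes farthest apart, then assign each prototype to the nearer seed) is a hypothetical heuristic that the claims do not specify.

```python
import math


def kl_diag(p, q):
    """KL(p || q) for two diagonal Gaussians, each given as (means, variances)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(mp, vp, mq, vq)
    )


def sym_kl(p, q):
    """Symmetrized KL divergence (an assumed choice of 'distance measure')."""
    return kl_diag(p, q) + kl_diag(q, p)


def build_tree(ids, protos):
    """Split recursively until every leaf holds exactly one prototype id."""
    if len(ids) == 1:
        return ids[0]
    # Hypothetical split rule: seed with the pair of prototypes farthest apart.
    s1, s2 = max(
        ((a, b) for i, a in enumerate(ids) for b in ids[i + 1:]),
        key=lambda pair: sym_kl(protos[pair[0]], protos[pair[1]]),
    )
    left = [i for i in ids
            if sym_kl(protos[i], protos[s1]) <= sym_kl(protos[i], protos[s2])]
    right = [i for i in ids if i not in left]
    if not right:  # all prototypes identical: force a split so recursion ends
        left, right = ids[:1], ids[1:]
    return (build_tree(left, protos), build_tree(right, protos))


# Four one-dimensional prototypes with unit variance, forming two clusters.
protos = {
    "p0": ([0.0], [1.0]), "p1": ([0.2], [1.0]),
    "p2": ([5.0], [1.0]), "p3": ([5.3], [1.0]),
}
tree = build_tree(list(protos), protos)
print(tree)  # (('p0', 'p1'), ('p2', 'p3'))
```

The sketch stops at tree construction; how prototypes for the interior (higher) levels are derived is not stated in the claims and is left out here.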

12. A speech coding apparatus comprising:
- means for measuring the value of at least one feature of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  first level subset means for storing a first plurality of prototype vector signals, each prototype vector signal having at least one parameter value and a unique identification value;
  
  second level subset means for storing a second plurality of prototype vector signals, each prototype vector signal having at least one parameter value and a unique identification value, and each second level subset prototype vector being associated with one of the prototype vector signals in the first level subset means;
  
  means for comparing the closeness of the feature value of the first feature vector signal to the parameter values of the prototype vector signals in the first level subset means to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset means;
  
  means for comparing the closeness of the feature value of the first feature vector signal to the parameter values of the prototype vector signals in the second level subset means associated with the prototypes in the first level subset means that most closely match the feature value of the first vector signal;
  
  means for outputting at least the identification value of at least the prototype vector signal in the second level subset means having the best prototype match score as a coded utterance representation signal of the first feature vector signal; and
  
  means for predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19)
  - 13. The speech coding apparatus as claimed in claim 12, wherein:
    - the measuring means measures the values of at least two features of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values; and
      
      a scalar function of a feature vector signal comprises the value of only a single feature of the feature vector signal.
  - 14. The speech coding apparatus as claimed in claim 13, characterized in that the measuring means comprises a microphone.
  - 15. The speech coding apparatus as claimed in claim 14, wherein the measuring means comprises a spectrum analyzer for measuring the amplitudes of the utterance in two or more frequency bands during each of a series of successive time intervals.
  - 16. The speech coding apparatus as claimed in claim 12, wherein each prototype includes a plurality of elements, each element being represented by a Gaussian density distribution including a mean value and a variance value.
  - 17. The speech coding apparatus as claimed in claim 16, further comprising the step of respectively splitting the prototypes into bands wherein each band contains the mean and the variance values of one or more elements of the particular prototype having substantially similar Gaussian density distributions with respect to each other.
  - 18. The speech coding apparatus as claimed in claim 17, wherein each prototype has approximately 39 elements whereby the elements may be grouped into approximately 20 bands.
  - 19. The speech coding apparatus as claimed in claim 18, wherein each band may be represented by an index.
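
Claim 16 describes each prototype element as a Gaussian density with a mean and a variance. One common way to turn that into a match score is the log-likelihood of the feature vector under a diagonal Gaussian; the claims do not state that this is the score used, so the function below is an assumption for illustration only.

```python
import math


def log_likelihood(feature, means, variances):
    """Log-density of `feature` under a diagonal Gaussian prototype.

    Higher values mean a closer match. Using this as the 'prototype match
    score' is an assumption, not something the claims specify.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(feature, means, variances)
    )


feature = [1.0, 2.0]
near = log_likelihood(feature, [1.0, 2.0], [1.0, 1.0])  # prototype centred on the feature
far = log_likelihood(feature, [4.0, 2.0], [1.0, 1.0])   # prototype three units away
print(near > far)  # True
```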

20. A speech coding method comprising:
- measuring the value of at least one feature of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
  
  storing a first plurality of prototype vector signals as a first level subset of prototype vectors, each prototype vector signal having at least one parameter vector and a unique identification value;
  
  storing a second plurality of prototype vector signals, greater than the first plurality, as a second level subset of prototype vectors;
  
  comparing the closeness of the feature vector of the first feature vector signal to the parameter vectors of the prototype vector signals in the first level subset to obtain a ranked list of prototypes most closely matching the first feature vector signal;
  
  comparing the closeness of the feature vector of the parameter vectors of the prototype vector signals in the second level subset that are associated with the prototype vectors in the first level subset that most closely match the first feature vector signal to obtain a ranked list of prototypes in the second level subset most closely matching the first feature vector signal;
  
  outputting at least the identification value of at least the prototype vector signal in the second level subset, that is associated with a prototype vector in the first level subset, having the best prototype match score as a coded utterance representation signal of the first feature vector signal; and
  
  predictive labeling wherein the highest ranking prototype in the lowest level subset is assigned to a second feature vector signal which represents another segment of speech to be recognized if a distance between the second feature vector signal and the first feature vector signal is at least less than a predetermined threshold.
- Dependent Claims (21, 22, 23)
  - 21. A speech coding method as claimed in claim 20, wherein the second level subset includes a number of prototypes greater than the number in the first level subset.
  - 22. A speech coding method as claimed in claim 21, wherein:
    - the step of measuring comprises measuring the values of at least two features of an utterance during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values; and
      
      a scalar function of a feature vector signal comprises the value of only a single feature of the feature vector signal.
  - 23. A speech coding method as claimed in claim 22, wherein the step of measuring comprises measuring the amplitudes of the utterance in two or more frequency bands during each of a series of successive time intervals.
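
The predictive labeling step shared by the independent claims can be sketched as a loop that reuses the previous label whenever a new frame is close enough to the frame that produced it. This is a rough illustration under assumptions: squared Euclidean distance is used as the distance, and the names `label_sequence`, `full_labeler`, and `threshold` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_sequence(features, full_labeler, threshold):
    """Label frames in order, skipping the full search when frames are close.

    full_labeler: callable that runs the complete (hierarchical) search
    threshold: predetermined distance threshold, in squared-distance units here
    """
    labels = []
    ref_feature = None  # the "first" feature vector: last frame fully searched
    ref_label = None
    full_searches = 0
    for f in features:
        if ref_feature is not None and sq_dist(f, ref_feature) < threshold:
            # Predictive labeling: reuse the label of the reference frame.
            labels.append(ref_label)
        else:
            ref_label = full_labeler(f)
            ref_feature = f
            full_searches += 1
            labels.append(ref_label)
    return labels, full_searches


# Toy full search: nearest of two one-dimensional prototypes.
prototypes = {"x": [0.0], "y": [5.0]}

def nearest(f):
    return min(prototypes, key=lambda pid: sq_dist(f, prototypes[pid]))

frames = [[0.0], [0.1], [0.15], [5.0], [5.05]]
labels, full_searches = label_sequence(frames, nearest, threshold=0.25)
print(labels, full_searches)  # ['x', 'x', 'x', 'y', 'y'] 2
```

Comparing each frame against the last fully searched frame, rather than the immediately preceding one, is a design choice made here so that a slow drift across many frames cannot carry a stale label indefinitely.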

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Nahamoo, David; Picheny, Michael Alan; Bakis, Raimo; Sedivy, Jan
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Sax, Robert Louis

Application Number

US08/869,061
Time in Patent Office

979 Days
Field of Search

704/231, 704/243, 704/254, 704/236
US Class Current

704/231
CPC Class Codes

G10L 15/083 Recognition networks G10L15...
