Speaker-independent label coding apparatus

US 5,182,773 A
Filed: 03/22/1991
Issued: 01/26/1993
Est. Priority Date: 03/22/1991
Status: Expired due to Term

Abstract

The present invention relates to speech recognition, and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound whose group of prototypes, or at least one of whose prototypes, has a combined value most closely matching the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
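
The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance as the measure of a "match" and uses a hypothetical toy inventory (`PROTOTYPES`, the class identifiers `"AA"` and `"IY"`, and the function name are all invented for illustration); the patent itself does not prescribe this distance.

```python
import math

# Hypothetical toy inventory: each prototype is (class identifier, centre).
# Every class is represented by two prototypes, matching the "at least two" wording.
PROTOTYPES = [
    ("AA", (1.0, 1.0)),
    ("AA", (1.5, 0.5)),
    ("IY", (4.0, 4.0)),
    ("IY", (5.0, 3.5)),
]


def label_feature_vector(feature, prototypes=PROTOTYPES):
    """Return the identifier of the class owning the prototype closest to `feature`."""
    best_class, best_dist = None, math.inf
    for class_id, centre in prototypes:
        dist = math.dist(feature, centre)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_class, best_dist = class_id, dist
    return best_class


print(label_feature_vector((1.4, 0.7)))
print(label_feature_vector((4.6, 3.8)))
```

The output label is the class identifier, not the index of the winning prototype, so several prototypes can cover differently shaped regions of the same sound class.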

51 Citations

76 Claims

1. A speech coding apparatus comprising:
- means for storing a plurality of classes each having an identifier represented by at least two of a plurality of prototypes, each of the plurality of prototypes having at least one prototype value;
  
  transducer means for extracting from an utterance a feature vector signal having at least one feature value;
  
  means for establishing a match between the feature vector signal and at least one of the classes by selecting from the plurality of prototypes at least one prototype having a prototype value that best matches the feature value of the feature vector signal; and
  
  means for coding the feature vector signal with the identifier of the class represented by the selected at least one prototype vector.
- - 2. Speech coding apparatus of claim 1, wherein the prototype value of the at least one prototype is computed from at least means, variances and a priori probabilities of a set of acoustic feature vectors associated with the prototype.
  - 3. Speech coding apparatus of claim 1, wherein the prototype value of the at least one prototype is computed by associating location of the feature value of the one feature vector signal on a probability distribution function of the prototype.
  - 4. Speech coding apparatus of claim 1, wherein each class of the plurality of classes is represented by a plurality of prototypes whose respective prototype values are considered as a whole against the feature value of the feature vector signal to determine whether the feature vector signal corresponds to the class.
  - 5. Speech coding apparatus of claim 1, further comprising:
    - means for storing a plurality of training classes;
      
      means for measuring and transforming training utterances into a series of training feature vectors each having a feature value; and
      
      means for correlating each of the series of training feature vectors with one of the training classes to generate the plurality of stored classes.
  - 6. Speech coding apparatus of claim 5, further comprising:
    - means for measuring and extracting from utterances over successive predetermined time periods corresponding successive sets of feature vectors, each feature vector of the successive sets of feature vectors having a dimensionality of at least one feature value;
      
      means for merging the feature vectors in each of the successive sets of feature vectors to form a plurality of consolidated feature vectors whose respective dimensionalities being the sum of the dimensionalities of the corresponding merged feature vectors, the consolidated feature vectors being more adaptable for discrimination between the stored training classes; and
      
      means for spatially reorienting the consolidated feature vectors to reduce their dimensionality to thereby effect easier manipulation thereof.
  - 7. Speech coding apparatus of claim 6, wherein each of the training classes is divided into training subclasses, further comprising:
    - means for configuring the training subclasses as respective training distribution functions having corresponding means, variances and a priori probabilities; and
      
      means for storing the training distribution functions, each of the training distribution functions representing a training prototype.
  - 8. Speech coding apparatus of claim 7, wherein each of the stored classes has at least one subcomponent;
    - and wherein the correlating means correlates the series of feature vectors with the at least one subcomponent to generate a plurality of stored component classes.
  - 9. Speech coding apparatus of claim 8, wherein the configuring means further configures the plurality of component classes as respective distribution functions each having corresponding means, variances and a priori probabilities;
    - further comprising:
      
      means for storing the distribution functions representing the component classes, each of the distribution functions of the component classes representing a prototype.
  - 10. Speech coding apparatus of claim 1, wherein the coding means comprises:
    - a quantizing means for outputting a label corresponding to the coded feature vector signal.
  - 11. Speech coding apparatus of claim 1, wherein the establishing means comprises:
    - means for grouping a plurality of speech feature vectors into a predetermined number of prototypes each having respective means, variances and a priori probabilities; and
      
      means for dividing each of the predetermined number of prototypes into at least two sub-prototypes to better differentiate the feature vector signal from other feature vector signals.
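
Claims 2 and 3 describe a prototype value computed from means, variances and a priori probabilities, and evaluated at the location of the feature value on the prototype's probability distribution function. One common reading, sketched below as an assumption, is a Gaussian with independent dimensions (diagonal covariance) weighted by the prior and evaluated in the log domain. The function names and the example prototypes are hypothetical.

```python
import math


def log_prototype_score(feature, means, variances, prior):
    """Return log(prior * N(feature; means, diag(variances))) for one prototype."""
    log_density = 0.0
    for x, mu, var in zip(feature, means, variances):
        log_density += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return math.log(prior) + log_density


# Hypothetical prototypes: (class identifier, means, variances, a priori probability).
PROTOTYPES = [
    ("AA", (0.0, 0.0), (1.0, 1.0), 0.3),
    ("AA", (1.0, 0.0), (0.5, 0.5), 0.2),
    ("IY", (4.0, 4.0), (1.0, 2.0), 0.5),
]


def best_matching_class(feature, prototypes=PROTOTYPES):
    """Code the feature vector with the class of the highest-scoring prototype."""
    scored = [(log_prototype_score(feature, m, v, p), cid) for cid, m, v, p in prototypes]
    return max(scored)[1]


print(best_matching_class((0.1, -0.1)))
print(best_matching_class((3.8, 4.2)))
```

Working with logarithms avoids numerical underflow when the feature vector has many dimensions; the ranking of prototypes is unchanged.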

12. A speech coding apparatus comprising:
- means for storing a plurality of prototypes representative of a plurality of classes, each class having an identifier represented by at least two of the plurality of prototypes, each of the plurality of prototypes having at least one prototype value;
  
  transducer means for extracting from an utterance a feature vector signal having at least one feature value;
  
  means for establishing a match between the feature vector signal and at least one class by comparing the feature value of the feature vector signal against the respective prototype values of the prototypes;
  
  means for coding the feature vector signal with the identifier of the class represented by any of the prototypes having a prototype value most closely matching the feature value of the feature vector signal.
- - 13. Speech coding apparatus of claim 12, wherein each class is represented by a number of prototypes of the plurality of prototypes, the respective prototype values of the prototypes of each class being considered as a whole against the feature value of the feature vector signal to determine which class of the plurality of classes the feature vector signal best corresponds to.
  - 14. Speech coding apparatus of claim 12, wherein the prototype value of each prototype is computed from at least means, variances and a priori probabilities of a set of acoustic feature vectors associated with the prototype.
  - 15. Speech coding apparatus of claim 12, wherein each prototype has a score value computed by associating location of the feature value of the one feature vector signal on a probability distribution function of the prototype.
  - 16. Speech coding apparatus of claim 12, further comprising:
    - means for storing a plurality of training classes;
      
      means for measuring and transforming training utterances into a series of training feature vectors each having a feature value; and
      
      means for correlating each of the series of training feature vectors with one of the training classes to generate the plurality of stored classes.
  - 17. Speech coding apparatus of claim 16, further comprising:
    - means for measuring and extracting from utterances over successive predetermined time periods corresponding successive sets of feature vectors, each feature vector of the successive sets of feature vectors each having a dimensionality and at least one feature value;
      
      means for merging the feature vectors in each of the successive sets of feature vectors to form a plurality of consolidated feature vectors whose respective dimensionalities being the sum of the dimensionalities of the corresponding merged feature vectors, the consolidated feature vectors being more adaptable for discrimination between the stored training classes; and
      
      means for spatially reorienting the consolidated feature vectors to reduce their dimensionality to thereby afford easier manipulation thereof.
  - 18. Speech coding apparatus of claim 17, wherein each of the training classes is divided into training subclasses, further comprising:
    - means for configuring the training subclasses as respective training distribution functions having corresponding means, variances and a priori probabilities; and
      
      means for storing the training distribution functions, each of the training distribution functions representing a training prototype.
  - 19. Speech coding apparatus of claim 18, wherein each of the stored classes has at least one subcomponent;
    - and wherein the correlating means correlates the series of feature vectors with the at least one subcomponent to generate a plurality of stored component classes.
  - 20. Speech coding apparatus of claim 19, wherein the configuring means further configures the plurality of component classes as respective distribution functions each having corresponding means, variances and a priori probabilities;
    - further comprising:
      
      means for storing the distribution functions representing the component classes, each of the distribution functions of the component classes representing a prototype.
  - 21. Speech coding apparatus of claim 12, wherein the coding means comprises:
    - a quantizing means for outputting a label corresponding to the coded feature vector signal.
  - 22. Speech coding apparatus of claim 12, wherein the establishing means comprises:
    - means for grouping a plurality of speech feature vectors into a predetermined number of prototypes each having respective means, variances and a priori probabilities; and
      
      means for dividing each of the predetermined number of prototypes into at least two sub-prototypes to better differentiate the feature vector signal from other feature vector signals.
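
Claims 6 and 17 merge feature vectors from successive time periods into consolidated vectors whose dimensionality is the sum of the merged dimensionalities, then reorient them to reduce that dimensionality. The sketch below shows only the mechanics. The window size, the edge padding and the 2 x 6 projection matrix are illustrative assumptions; in practice the projection would be estimated from training data, which this example does not attempt.

```python
def splice(frames, context=1):
    """Concatenate each frame with `context` neighbours on either side.

    Edge frames are padded by repeating the first or last frame (an assumption).
    """
    spliced = []
    last = len(frames) - 1
    for t in range(len(frames)):
        merged = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)
            merged.extend(frames[idx])
        spliced.append(merged)
    return spliced


def project(vector, matrix):
    """Multiply `vector` by `matrix`, which has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spliced = splice(frames, context=1)  # each consolidated vector has 3 * 2 = 6 values

# Hypothetical projection from 6 dimensions down to 2.
matrix = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],   # first value of the centre frame
    [-0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # slope of the first value across the window
]
reduced = [project(v, matrix) for v in spliced]
print(reduced)
```

Splicing lets the consolidated vector carry information about how the signal changes over time, which a single frame cannot express.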

23. A method of coding speech comprising the steps of:
- (a) storing in a memory means a plurality of classes each having an identifier represented by at least two of a plurality of prototypes, each of the plurality of prototypes having at least one prototype value;
  
  (b) using transducer means to extract from an utterance a feature vector signal having at least one feature value;
  
  (c) establishing a correspondence between the feature vector signal and at least one class of the plurality of classes by selecting from among a plurality of prototypes at least one prototype whose prototype value most closely matches the feature value of the feature vector signal; and
  
  (d) coding the feature vector signal with the identifier of the class represented by the selected at least one prototype.
- - 24. The method of claim 23, wherein prior to step (a), the method further comprising the steps of:
    - establishing an inventory of training classes;
      
      extracting training feature vectors from a string of training text; and
      
      correlating each of the feature vectors with one of the training classes.
  - 25. The method of claim 24, further comprising the steps of:
    - measuring and extracting from utterances over successive predetermined periods of time corresponding successive sets of feature vectors, each feature vector of the successive sets of feature vectors having a dimensionality of at least one feature value;
      
      merging the feature vectors in each of the successive sets of feature vectors to form a plurality of consolidated feature vectors whose respective dimensionalities being the sum of the dimensionalities of the corresponding merged feature vectors, the consolidated feature vectors being more adaptable for discrimination between the stored training classes; and
      
      spatially reorienting the consolidated feature vectors to reduce their dimensionalities to thereby effect easier manipulation thereof.
  - 26. The method of claim 25, further comprising the steps of:
    - establishing the number of prototypes required to provide adequate representation of a class; and
      
      wherein for each of the training classes, the method further comprising the steps of:
      
      selecting a number of training prototypes;
      
      calculating respective new training prototypes by averaging the respective values of feature vectors situated proximate to each of the training prototypes until the average distance between the feature vectors remains substantially constant; and
      
      successively replacing the two closest new training prototypes with another new training prototype whose value is the average of the values of the replaced training prototypes until a predetermined number of another training prototypes remains.
  - 27. The method of claim 26, further comprising the steps of:
    - using a distribution analysis on the predetermined number of training prototypes to calculate a corresponding set of new training prototypes each having estimated means, variances and a priori probabilities; and
      
      dividing each new training prototype into corresponding additional training prototypes.
  - 28. The method of claim 24, wherein the correlating step comprises utilizing a Viterbi alignment technique.
  - 29. The method of claim 23, wherein step (c) further comprises the steps of:
    - establishing the number of prototypes required to provide adequate representation for a class; and
      
      wherein for each of the classes, the method further comprising the steps of:
      
      selecting a number of prototypes;
      
      calculating respective new prototypes by averaging the respective values of feature vectors situated proximate to each of the prototypes until the average distance between the feature vectors remains substantially constant; and
      
      successively replacing the two closest new prototypes with another new prototype whose value is the average of the values of the replaced prototypes until a predetermined number of another prototypes remains.
  - 30. The method of claim 29, further comprising the steps of:
    - using a distribution analysis on the predetermined number of the another prototypes to calculate a corresponding set of prototypes each having estimated means, variances and a priori probabilities;
      
      dividing each prototype having the estimated means, variances and a priori probabilities into additional prototypes to provide a greater number of prototypes for comparison with the feature vector signal.
  - 31. The method of claim 23, wherein the prototype value of the at least one prototype is computed from means, variances and a priori probabilities of a set of acoustic feature vectors associated with the prototype.
  - 32. The method of claim 23, wherein the prototype value of the at least one prototype is computed by associating the location of the feature value of the one feature vector signal on a probability distribution function of the prototype.
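
The prototype-building steps of claims 26 and 29 (re-averaging prototypes over nearby feature vectors until the average distance settles, then repeatedly replacing the two closest prototypes by their average) resemble K-means clustering followed by pairwise merging. The sketch below is one possible reading under that assumption. The function names, the Euclidean distance, the tolerance `tol` and the toy data are not taken from the patent.

```python
import math


def mean(vectors):
    """Arithmetic average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def refine(vectors, centres, tol=1e-6, max_iter=100):
    """Re-average each centre over its nearest vectors until the average distance stops changing."""
    previous = math.inf
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        total = 0.0
        for v in vectors:
            dists = [math.dist(v, c) for c in centres]
            k = dists.index(min(dists))
            groups[k].append(v)
            total += dists[k]
        # A centre that attracted no vectors is left where it was.
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
        average = total / len(vectors)
        if abs(previous - average) < tol:
            break
        previous = average
    return centres


def merge_closest(centres, target):
    """Replace the two closest centres by their average until `target` centres remain."""
    centres = list(centres)
    while len(centres) > target:
        pairs = [(math.dist(centres[i], centres[j]), i, j)
                 for i in range(len(centres)) for j in range(i + 1, len(centres))]
        _, i, j = min(pairs)
        merged = mean([centres[i], centres[j]])
        centres = [c for k, c in enumerate(centres) if k not in (i, j)] + [merged]
    return centres


vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
refined = refine(vectors, centres=vectors[:3])  # seed with the first three vectors
final = merge_closest(refined, target=2)
print(sorted(final))
```

Starting with more centres than needed and merging downward is a design choice that makes the result less sensitive to the initial seeds than running the averaging step with the final count directly.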

33. A method of coding speech comprising the steps of:
- (a) storing in a memory means a plurality of prototype vectors representative of a plurality of classes, each class having an identifier represented by at least one of the plurality of prototype vectors, each of the plurality of prototype vectors having at least one prototype value;
  
  (b) using transducer means to extract from an utterance a feature vector signal having a feature value;
  
  (c) establishing a correspondence between the feature vector signal and at least one class by comparing the feature value of the feature vector signal against the respective prototype values of the prototype vectors;
  
  (d) coding the feature vector signal with the identifier of the class represented by any of the prototype vectors having a prototype value that most closely matches the feature value of the feature vector signal.
- - 34. The method of claim 33, wherein each class is represented by a number of prototype vectors of the plurality of prototype vectors, and wherein the method further comprising the step of:
    - considering the respective prototype values of the prototype vectors of each class as a whole against the feature value of the feature vector signal to determine which class of the plurality of classes the feature vector signal best corresponds to.
  - 35. The method of claim 33, wherein prior to step (a), the method further comprising the steps of:
    - establishing an inventory of training classes;
      
      extracting training feature vectors from a string of training text; and
      
      correlating each of the feature vectors with one of the training classes.
  - 36. The method of claim 35, further comprising the steps of:
    - measuring and extracting from utterances over successive predetermined periods of time corresponding successive sets of feature vectors, each feature vector of the successive sets of feature vectors having a dimensionality of at least one feature value;
      
      merging the feature vectors in each of the successive sets of feature vectors to form a plurality of consolidated feature vectors whose respective dimensionalities being the sum of the dimensionalities of the corresponding merged feature vectors, the consolidated feature vectors being more adaptable for discrimination between the stored training classes; and
      
      spatially reorienting the consolidated feature vectors to reduce their dimensionalities to thereby effect easier manipulation thereof.
  - 37. The method of claim 36, further comprising the steps of:
    - establishing the number of prototype vectors required to provide adequate representation of a class; and
      
      wherein for each of the training classes, the method further comprising the steps of:
      
      selecting a number of training prototype vectors;
      
      calculating respective new training prototype vectors by averaging the respective values of feature vectors situated proximate to each of the training prototype vectors until the average distance between the feature vectors remains substantially constant; and
      
      successively replacing the two closest new training prototype vectors with another new training prototype vector whose value is the average of the values of the replaced training prototype vectors until a predetermined number of another training prototype vectors remains.
  - 38. The method of claim 37, further comprising the steps of:
    - using a distribution analysis on the predetermined number of training prototype vectors to calculate a corresponding set of new training prototype vectors each having estimated means, variances and a priori probabilities; and
      
      dividing each new training prototype vector into corresponding additional training prototype vectors.
  - 39. The method of claim 33, wherein step (c) further comprises the steps of:
    - establishing the number of prototype vectors required to provide adequate representation for a class; and
      
      wherein for each of the classes, the method further comprising the steps of:
      
      selecting a number of prototype vectors;
      
      calculating respective new prototype vectors by averaging the respective values of feature vectors situated proximate to each of the prototype vectors until the average distance between the feature vectors remains substantially constant; and
      
      successively replacing the two closest new prototype vectors with another new prototype vector whose value is the average of the values of the replaced prototype vectors until a predetermined number of another prototype vectors remains.
  - 40. The method of claim 39, further comprising the steps of:
    - using a distribution analysis on the predetermined number of the another prototype vectors to calculate a corresponding set of prototype vectors each having estimated means, variances and a priori probabilities;
      
      dividing each prototype vector having the estimated means, variances and a priori probabilities into additional prototype vectors to provide a greater number of prototype vectors for comparison with the feature vector signal.
  - 41. The method of claim 33, wherein the correlating step comprises utilizing a Viterbi alignment technique.
  - 42. The method of claim 33, wherein the prototype value of the at least one prototype vector is computed from means, variances and a priori probabilities of a set of acoustic feature vectors associated with the prototype.
  - 43. The method of claim 33, wherein the prototype value of the at least one prototype vector is computed by associating location of the feature value of the one feature vector signal on a probability distribution function of the prototype vector.
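
Claim 34 considers the prototype vectors of a class "as a whole" rather than picking the single best prototype. The claim does not say how the values are combined; the sketch below assumes they are summed per class, using one-dimensional Gaussian prototypes weighted by their priors. The names and numbers are hypothetical and chosen so that the two decision rules disagree.

```python
import math
from collections import defaultdict


def weighted_density(x, mean, variance, prior):
    """Return prior * N(x; mean, variance) for a one-dimensional prototype."""
    return prior * math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


# Hypothetical prototype vectors: (class identifier, mean, variance, a priori probability).
PROTOTYPES = [
    ("A", 0.0, 1.0, 0.4),
    ("B", 0.4, 1.0, 0.3),
    ("B", -0.4, 1.0, 0.3),
]


def label_by_best_prototype(x):
    """Pick the class of the single highest-scoring prototype."""
    return max(PROTOTYPES, key=lambda p: weighted_density(x, *p[1:]))[0]


def label_by_class_total(x):
    """Sum the scores of all prototypes in each class, then pick the best class."""
    totals = defaultdict(float)
    for class_id, mean, variance, prior in PROTOTYPES:
        totals[class_id] += weighted_density(x, mean, variance, prior)
    return max(totals, key=totals.get)


print(label_by_best_prototype(0.0))
print(label_by_class_total(0.0))
```

At `x = 0.0` the single best prototype belongs to class `"A"`, but the two `"B"` prototypes together outweigh it, so the two rules return different labels.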

44. A speech coding apparatus comprising:
- means for storing two or more prototype vector signals, each prototype vector signal representing a prototype vector having an identifier and at least two partitions, each partition having at least one partition value;
  
  transducer means for measuring value of at least one feature of an utterance during a time interval to produce a feature vector signal representing the value of the at least one feature of the utterance;
  
  means for calculating a match score for each partition, each partition match score representing the value of a match between the partition value of the partition and the feature value of the feature vector signal;
  
  means for calculating a prototype match score for each prototype vector, each prototype match score representing a function of the partition match scores for all partitions in the prototype vector; and
  
  means for coding the feature vector signal with the identifier of the prototype vector signal having a best prototype match score.
- - 45. An apparatus as claimed in claim 44, characterized in that:
    - each partition match score is proportional to the joint probability of occurrence of the feature value of the feature vector signal and the partition value of the partition; and
      
      the prototype match score represents the sum of the partition match scores for all partitions in the prototype vector.
  - 46. An apparatus as claimed in claim 45, further comprising means for generating prototype vector signals, said prototype vector signal generating means comprising:
    - means for measuring the value of at least one feature of a training utterance during each of a series of successive first time intervals to produce a series of training feature vector signals, each training feature vector signal corresponding to a first time interval, each training feature vector signal representing the value of at least one feature of the training utterance during a second time interval containing the corresponding first time interval, each second time interval being greater than or equal to the corresponding first time interval;
      
      means for providing a network of elemental models corresponding to the training utterance;
      
      means for correlating the training feature vector signals in the series of training feature vector signals to the elemental models in the network of elemental models corresponding to the training utterance so that each training feature vector signal in the series of training feature vector signals corresponds to one elemental model in the network of elemental models corresponding to the training utterance;
      
      means for selecting a fundamental set of all training feature vector signals which correspond to all occurrences of a first elemental model in the network of elemental models corresponding to the training utterance;
      
      means for selecting at least first and second different subsets of the fundamental set of training feature vector signals to form a first label set of training feature vector signals;
      
      means for calculating centroid of the feature values of the training feature vector signals of each of the first and second subsets of the fundamental set; and
      
      means for storing a first prototype vector signal corresponding to the first label set of training feature vector signals, said first prototype vector signal representing a first prototype vector having at least first and second partitions, each partition having at least one partition value, the first partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the first subset of the fundamental set, the second partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the second subset of the fundamental set.
  - 47. An apparatus as claimed in claim 46, characterized in that the centroid is the arithmetic average.
  - 48. An apparatus as claimed in claim 47, characterized in that the network of elemental models is a series of elemental models.
  - 49. An apparatus as claimed in claim 48, characterized in that:
    - the fundamental set of training feature vector signals is divided into at least first, second and third subsets of training feature vector signals;
      
      the calculating means further calculates the centroid of the feature values of the training feature vector signals in the third subset; and
      
      the apparatus further comprises means for storing a second prototype vector signal, said second prototype vector signal representing the value of the centroid of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 50. An apparatus as claimed in claim 49, characterized in that:
    - the feature values of the training feature vector signals in each subset of the fundamental set have a feature value variance and an a priori probability;
      
      the apparatus further comprises means for calculating the variance and a priori probability of the feature values of the training feature vector signals in each subset of the fundamental set;
      
      the first partition of the first prototype vector has a further partition value equal to the value of the variance and a priori probability of the feature values of the training feature vector signals in the first subset of the fundamental set;
      
      the second partition of the first prototype vector has a further partition value equal to the value of the variance and a priori probability of the feature values of the training feature vector signals in the second subset of the fundamental set; and
      
      the second prototype signal represents the value of the variance and a priori probability of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 51. An apparatus as claimed in claim 50, characterized in that:
    - the apparatus further comprises means for estimating conditional probability of occurrence of each subset of the fundamental set of training feature vector signals given the occurrence of the first label set;
      
      the apparatus further comprises means for estimating the probability of occurrence of the first label set of training feature vector signals;
      
      the first prototype vector further represents the estimated probability of occurrence of the first label set of training feature vector signals;
      
      the first partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the first subset of the fundamental set of training feature vector signals given the occurrence of the first label set; and
      
      the second partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the second subset of the fundamental set of training feature vector signals given the occurrence of the first label set.
  - 52. An apparatus as claimed in claim 51, characterized in that:
    - each second time interval is equal to at least two first time intervals; and
      
      each feature vector signal comprises at least two feature values of the utterance at two different times.
  - 53. An apparatus as claimed in claim 52, characterized in that each feature vector signal represents values of m features, where m is an integer greater than or equal to two;
    - each partition has n partition values, where n is less than m; and
      
      the apparatus further comprises means for transforming the m values of each feature vector signal to n values prior to calculating the centroids, and variances and a priori probability of the subsets.
  - 54. An apparatus as claimed in claim 53, characterized in that:
    - the elemental models are elemental probabilistic models;
      
      the correlating means comprises means for aligning the feature vector signals and the elemental probabilistic models.
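
Claims 44 to 51 describe prototype vectors made of partitions, where each partition stores the centroid, variance and conditional probability of one subset of training feature vectors, and the prototype match score is the sum of the partition match scores. The sketch below illustrates that structure for one-dimensional features. The Gaussian form of the partition score, the dictionary layout and all names are assumptions made for the example, and each subset must contain values that are not all identical so that the variance is nonzero.

```python
import math


def build_partition(subset, label_set_size):
    """Centroid (arithmetic average), variance and conditional probability of one subset."""
    n = len(subset)
    centroid = sum(subset) / n
    variance = sum((x - centroid) ** 2 for x in subset) / n
    return {"centroid": centroid, "variance": variance, "weight": n / label_set_size}


def build_prototype(identifier, subsets, label_prior):
    """Build a prototype vector with one partition per training subset."""
    size = sum(len(s) for s in subsets)
    return {"id": identifier, "prior": label_prior,
            "partitions": [build_partition(s, size) for s in subsets]}


def partition_score(x, part, label_prior):
    """Joint-probability style score: P(label) * P(subset | label) * N(x; centroid, variance)."""
    density = (math.exp(-0.5 * (x - part["centroid"]) ** 2 / part["variance"])
               / math.sqrt(2.0 * math.pi * part["variance"]))
    return label_prior * part["weight"] * density


def prototype_score(x, proto):
    """Sum of the partition match scores over all partitions of the prototype."""
    return sum(partition_score(x, p, proto["prior"]) for p in proto["partitions"])


def code(x, prototypes):
    """Return the identifier of the prototype vector with the best match score."""
    return max(prototypes, key=lambda p: prototype_score(x, p))["id"]


p1 = build_prototype("L1", [[0.0, 2.0], [9.0, 11.0]], label_prior=0.5)
p2 = build_prototype("L2", [[4.0, 6.0], [4.5, 5.5]], label_prior=0.5)
print(code(1.2, [p1, p2]))
print(code(5.1, [p1, p2]))
```

Because the score is a sum over partitions, a prototype such as `p1` can match feature values near either of two well-separated centroids, which a single centroid could not represent.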

55. A speech coding method comprising the steps of:
- storing two or more prototype vector signals, each prototype vector signal representing a prototype vector having an identifier and at least two partitions, each partition having at least one partition value;
  
  using transducer means to measure a value of at least one feature of an utterance during a time interval to produce a feature vector signal representing the value of the at least one feature of the utterance;
  
  calculating a match score for each partition, each partition match score representing the value of a match between the partition value of the partition and the feature value of the feature vector signal;
  
  calculating a prototype match score for each prototype vector, each prototype match score representing a function of the partition match scores for all partitions in the prototype vector; and
  
  coding the feature vector signal with the identifier of the prototype vector signal having the best prototype match score.
- - 56. A method as claimed in claim 55, characterized in that:
    - each partition match score is proportional to the joint probability of occurrence of the feature value of the feature vector signal and the partition value of the partition; and
      
      the prototype match score represents the sum of the partition match scores for all partitions in the prototype vector.
  - 57. A method as claimed in claim 56, further comprising a method of generating prototype vector signals, said prototype vector signal generating method comprising:
    - measuring the value of at least one feature of a training utterance during each of a series of successive first time intervals to produce a series of training feature vector signals, each training feature vector signal corresponding to a first time interval, each training feature vector signal representing the value of at least one feature of the training utterance during a second time interval containing the corresponding first time interval, each second time interval being greater than or equal to the corresponding first time interval;
      
      providing a network of elemental models corresponding to the training utterance;
      
      correlating the training feature vector signals in the series of training feature vector signals to the elemental models in the network of elemental models corresponding to the training utterance so that each training feature vector signal in the series of training feature vector signals corresponds to one elemental model in the network of elemental models corresponding to the training utterance;
      
      selecting a fundamental set of all training feature vector signals which correspond to all occurrences of a first elemental model in the network of elemental models corresponding to the training utterance;
      
      selecting at least first and second different subsets of the fundamental set of training feature vector signals to form a first label set of training feature vector signals;
      
      calculating the centroid of the feature values of the training feature vector signals of each of the first and second subsets of the fundamental set; and
      
      storing a first prototype vector signal corresponding to the first label set of training feature vector signals, said first prototype vector signal representing a first prototype vector having at least first and second partitions, each partition having at least one partition value, the first partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the first subset of the fundamental set, the second partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the second subset of the fundamental set.
  - 58. A method as claimed in claim 57, characterized in that the centroid is the arithmetic average.
  - 59. A method as claimed in claim 58, characterized in that the network of elemental models is a series of elemental models.
  - 60. A method as claimed in claim 59, characterized in that:
    - the fundamental set of training feature vector signals is divided into at least first, second and third subsets of training feature vector signals;
      
      the calculating step further calculates the centroid of the feature values of the training feature vector signals in the third subset; and
      
      the method further comprises the step of storing a second prototype vector signal, said second prototype vector signal representing the value of the centroid of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 61. A method as claimed in claim 60, characterized in that:
    - the feature values of the training feature vector signals in each subset of the fundamental set have a feature value variance and a priori probability;
      
      the method further comprises the step of calculating the variance and a priori probability of the feature values of the training feature vector signals in each subset of the fundamental set;
      
      the first prototype signal represents the values of the variance and a priori probability of the feature values of the training feature vector signals in the first and second subsets of the fundamental set; and
      
      the second prototype signal represents the value of the variance and a priori probability of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 62. A method as claimed in claim 61, characterized in that:
    - the method further comprises the step of estimating the conditional probability of occurrence of each subset of the fundamental set of training feature vector signals given the occurrence of the first label set;
      
      the method further comprises the step of estimating the probability of occurrence of the first label set of training feature vector signals;
      
      the first prototype vector further represents the estimated probability of occurrence of the first label set of training feature vector signals;
      
      the first partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the first subset of the fundamental set of training feature vector signals given the occurrence of the first label set; and
      
      the second partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the second subset of the fundamental set of training feature vector signals given the occurrence of the first label set.
  - 63. A method as claimed in claim 62, characterized in that:
    - each second time interval is equal to at least two first time intervals; and
      
      each feature vector signal comprises at least two feature values of the utterance at two different times.
  - 64. A method as claimed in claim 63, characterized in that:
    - each feature vector signal represents values of m features, where m is an integer greater than or equal to two;
      
      each partition has n partition values, where n is less than m; and
      
      the method further comprises the step of transforming the m values of each feature vector signal to n values prior to calculating the centroids, variances and a priori probabilities of the subsets.
  - 65. A method as claimed in claim 64, characterized in that:
    - the elemental models are elemental probabilistic models;
      
      the correlating step comprises the step of aligning the feature vector signals and the elemental probabilistic models.
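
The coding steps of claims 55 and 56 can be sketched as follows: score every partition against the incoming feature vector, sum the partition scores within each prototype, and output the identifier of the best-scoring prototype. This is an illustrative sketch under stated assumptions, not the patented implementation. The diagonal Gaussian density, the class names `Partition` and `Prototype`, and the example identifiers and values are all assumptions; the claims only require a score proportional to a joint probability.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Partition:
    mean: List[float]      # centroid of the training subset (partition value)
    variance: List[float]  # per-feature variance of the subset
    weight: float          # a priori probability of the partition


@dataclass
class Prototype:
    identifier: str
    partitions: List[Partition]


def partition_score(p: Partition, x: Sequence[float]) -> float:
    """Joint probability: prior times an assumed diagonal Gaussian likelihood."""
    log_like = 0.0
    for xi, mu, var in zip(x, p.mean, p.variance):
        log_like += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
    return p.weight * math.exp(log_like)


def prototype_score(proto: Prototype, x: Sequence[float]) -> float:
    """Sum of the partition match scores for all partitions in the prototype."""
    return sum(partition_score(p, x) for p in proto.partitions)


def code_feature_vector(prototypes: Sequence[Prototype], x: Sequence[float]) -> str:
    """Return the identifier of the prototype with the best (largest) score."""
    return max(prototypes, key=lambda proto: prototype_score(proto, x)).identifier


# Hypothetical prototypes: two classes, each represented by two partitions.
prototypes = [
    Prototype("AA", [Partition([0.0, 0.0], [1.0, 1.0], 0.3),
                     Partition([4.0, 4.0], [1.0, 1.0], 0.2)]),
    Prototype("EE", [Partition([2.0, -2.0], [1.0, 1.0], 0.25),
                     Partition([-3.0, 3.0], [1.0, 1.0], 0.25)]),
]

label = code_feature_vector(prototypes, [3.8, 4.1])
print(label)  # AA: the vector lies close to the second "AA" partition
```

Summing over partitions lets one label cover several separate regions of the feature space, which matches the multi-partition prototypes recited in claim 55.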

66. An article for configuring a machine to perform a method of speech coding comprising the steps of:
- storing two or more prototype vector signals, each prototype vector signal representing a prototype vector having an identifier and at least two partitions, each partition having at least one partition value;
  
  using transducer means to measure a value of at least one feature of an utterance during a time interval to produce a feature vector signal representing the value of the at least one feature of the utterance;
  
  calculating a match score for each partition, each partition match score representing the value of a match between the partition value of the partition and the feature value of the feature vector signal;
  
  calculating a prototype match score for each prototype vector, each prototype match score representing a function of the partition match scores for all partitions in the prototype vector; and
  
  coding the feature vector signal with the identifier of the prototype vector signal having a best prototype match score.
- - 67. An article as claimed in claim 66, characterized in that:
    - each partition match score is proportional to the joint probability of occurrence of the feature value of the feature vector signal and the partition value of the partition; and
      
      the prototype match score represents the sum of the partition match scores for all partitions in the prototype vector.
  - 68. An article as claimed in claim 67, further comprising a method of generating prototype vector signals, said prototype vector signal generating method comprising:
    - measuring the value of at least one feature of a training utterance during each of a series of successive first time intervals to produce a series of training feature vector signals, each training feature vector signal corresponding to a first time interval, each training feature vector signal representing the value of at least one feature of the training utterance during a second time interval containing the corresponding first time interval, each second time interval being greater than or equal to the corresponding first time interval;
      
      providing a network of elemental models corresponding to the training utterance;
      
      correlating the training feature vector signals in the series of training feature vector signals to the elemental models in the network of elemental models corresponding to the training utterance so that each training feature vector signal in the series of training feature vector signals corresponds to one elemental model in the network of elemental models corresponding to the training utterance;
      
      selecting a fundamental set of all training feature vector signals which correspond to all occurrences of a first elemental model in the network of elemental models corresponding to the training utterance;
      
      selecting at least first and second different subsets of the fundamental set of training feature vector signals to form a first label set of training feature vector signals;
      
      calculating the centroid of the feature values of the training feature vector signals of each of the first and second subsets of the fundamental set; and
      
      storing a first prototype vector signal corresponding to the first label set of training feature vector signals, said first prototype vector signal representing a first prototype vector having at least first and second partitions, each partition having at least one partition value, the first partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the first subset of the fundamental set, the second partition having a partition value equal to the value of the centroid of the feature values of the training feature vector signals in the second subset of the fundamental set.
  - 69. An article as claimed in claim 68, characterized in that the centroid is the arithmetic average.
  - 70. An article as claimed in claim 69, characterized in that the network of elemental models is a series of elemental models.
  - 71. An article as claimed in claim 70, characterized in that:
    - the fundamental set of training feature vector signals is divided into at least first, second and third subsets of training feature vector signals;
      
      the calculating step further calculates the centroid of the feature values of the training feature vector signals in the third subset; and
      
      the method further comprises the step of storing a second prototype vector signal, said second prototype vector signal representing the value of the centroid of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 72. An article as claimed in claim 71, characterized in that:
    - the feature values of the training feature vector signals in each subset of the fundamental set have a feature value variance and a priori probability;
      
      the method further comprises the step of calculating the variance and a priori probability of the feature values of the training feature vector signals in each subset of the fundamental set;
      
      the first prototype signal represents the values of the variance and a priori probability of the feature values of the training feature vector signals in the first and second subsets of the fundamental set; and
      
      the second prototype signal represents the value of the variance and a priori probability of the feature values of the training feature vector signals in the third subset of the fundamental set.
  - 73. An article as claimed in claim 72, characterized in that:
    - the method further comprises the step of estimating the conditional probability of occurrence of each subset of the fundamental set of training feature vector signals given the occurrence of the first label set;
      
      the method further comprises the step of estimating the probability of occurrence of the first label set of training feature vector signals;
      
      the first prototype vector further represents the estimated probability of occurrence of the first label set of training feature vector signals;
      
      the first partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the first subset of the fundamental set of training feature vector signals given the occurrence of the first label set; and
      
      the second partition of the first prototype vector has a further partition value equal to the estimated conditional probability of occurrence of the second subset of the fundamental set of training feature vector signals given the occurrence of the first label set.
  - 74. An article as claimed in claim 73, characterized in that:
    - each second time interval is equal to at least two first time intervals; and
      
      each feature vector signal comprises at least two feature values of the utterance at two different times.
  - 75. An article as claimed in claim 74, characterized in that:
    - each feature vector signal represents values of m features, where m is an integer greater than or equal to two;
      
      each partition has n partition values, where n is less than m; and
      
      the method further comprises the step of transforming the m values of each feature vector signal to n values prior to calculating the centroids, variances and a priori probabilities of the subsets.
  - 76. An article as claimed in claim 75, characterized in that:
    - the elemental models are elemental probabilistic models;
      
      the correlating step comprises the step of aligning the feature vector signals and the elemental probabilistic models.
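
The prototype-generation steps recited in claims 68 to 73 amount to computing, for each subset of the training vectors aligned to one elemental model, a centroid, a variance and a priori probability, then grouping subsets into label sets. A minimal sketch follows, not the patented implementation. It assumes the subsets have already been selected (the claims do not say how), that probabilities are estimated as relative frequencies, and that the function names, dictionary keys and identifiers `L1` and `L2` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Arithmetic average of each feature over the subset."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def variance(vectors: Sequence[Vector], mean: Sequence[float]) -> List[float]:
    """Per-feature population variance of the subset around its centroid."""
    n = len(vectors)
    return [sum((x - m) ** 2 for x in column) / n
            for column, m in zip(zip(*vectors), mean)]


def build_prototype(identifier: str,
                    subsets: Sequence[Sequence[Vector]],
                    total_count: int) -> Dict:
    """Build one prototype whose partitions correspond to the given subsets."""
    label_count = sum(len(s) for s in subsets)
    partitions = []
    for subset in subsets:
        mean = centroid(subset)
        partitions.append({
            "centroid": mean,
            "variance": variance(subset, mean),
            # a priori probability of the subset, as a relative frequency
            "prior": len(subset) / total_count,
            # conditional probability of the subset given the label set
            "conditional": len(subset) / label_count,
        })
    return {
        "identifier": identifier,
        "probability": label_count / total_count,  # probability of the label set
        "partitions": partitions,
    }


# Hypothetical fundamental set, already divided into three subsets.
subset_a = [[1.0, 2.0], [3.0, 4.0]]
subset_b = [[10.0, 10.0], [12.0, 14.0], [14.0, 12.0]]
subset_c = [[-5.0, 0.0]]
total = len(subset_a) + len(subset_b) + len(subset_c)

# First label set: subsets a and b. Second prototype: subset c alone.
first = build_prototype("L1", [subset_a, subset_b], total)
second = build_prototype("L2", [subset_c], total)
```

A subset with a single vector has zero variance, so a practical system would need a variance floor before using these values in a density. That floor is outside what the claims recite.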

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
de Souza, Peter V., Nahamoo, David, Picheny, Michael A., Bahl, Lalit R.
Primary Examiner(s)
Shaw, Dale M.
Assistant Examiner(s)
Tung, Kee M.

Application Number

US07/673,810
Time in Patent Office

676 Days
Field of Search

381/41, 381/43, 381/29-35
US Class Current

704/222
CPC Class Codes

G10L 19/038 Vector quantisation, e.g. T...

H03M 7/3082 Vector coding for televisio...

51 Citations

76 Claims
