Fast algorithm for deriving acoustic prototypes for automatic speech recognition

US 5,276,766 A
Filed: 07/16/1991
Issued: 01/04/1994
Est. Priority Date: 07/16/1991
Status: Expired due to Fees

Abstract

An apparatus for generating a set of acoustic prototype signals for encoding speech includes a memory for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. An acoustic measure is provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. An acoustic matcher is provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises a cluster processor for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the cluster. Finally, the apparatus includes a memory for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
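
The clustering step described in the abstract can be illustrated with a minimal Python sketch. It assumes the path through the training script model has already been estimated, so each feature vector arrives tagged with its word-segment model, its location in that model, and its elementary model. The function name `cluster_means`, the tuple layout, and the sample identifiers are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def cluster_means(aligned_frames):
    """Average feature vectors that share one alignment key.

    aligned_frames: iterable of (word_segment, location, elementary_model, vector).
    The tuple layout is an assumption made for this sketch.
    Returns {(word_segment, location, elementary_model): mean_vector}.
    """
    groups = defaultdict(list)
    for word_segment, location, elementary_model, vector in aligned_frames:
        # One cluster per elementary model, per location, per word-segment model.
        groups[(word_segment, location, elementary_model)].append(vector)

    means = {}
    for key, vectors in groups.items():
        dim = len(vectors[0])
        means[key] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return means


# Hypothetical aligned frames: two for one cluster, one for another.
frames = [
    ("W1", 0, "A", [1.0, 2.0]),
    ("W1", 0, "A", [3.0, 4.0]),
    ("W2", 1, "A", [10.0, 0.0]),
]
means = cluster_means(frames)
```

Note that the same elementary model "A" yields two separate clusters here because it occurs in two different word-segment models. Keeping such clusters apart is what later allows one prototype to carry more than one partition value.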

22 Citations


14 Claims

1. An apparatus for generating a set of acoustic prototype signals for encoding speech, said apparatus comprising:
- means for storing a model of a training script, said training script model comprising a series of word-segment models, each word-segment model being selected from a finite set of word-segment models, each word-segment model comprising a series of elementary models, each elementary model having a location in each word-segment model, each elementary model being selected from a finite set of elementary models;
  
  means for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals spanned by the utterance of the training script to produce a series of feature vector signals, each feature vector signal having a feature value representing the value of the at least one feature of the utterance during a corresponding time interval;
  
  means for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would produce that feature vector signal;
  
  means for clustering the feature vector signals into a plurality of clusters to form a plurality of cluster signals, each feature vector signal in a cluster corresponding to a single elementary model in a single location in a single word-segment model, each cluster signal having a cluster value equal to an average of the feature values of all of the feature vector signals in the cluster;
  
  means for storing a plurality of prototype vector signals, each prototype vector signal corresponding to an elementary model, each prototype vector signal having an identifier and comprising at least two partition values, at least one partition value being equal to a combination of the cluster values of one or more cluster signals corresponding to the elementary model, at least one other partition value being equal to a combination of the cluster values of one or more other cluster signals corresponding to the elementary model.
- Dependent claims (2, 3, 4, 5)
  - 2. An apparatus as claimed in claim 1, characterized in that the estimating means comprises means for estimating the most likely path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would most likely produce that feature vector signal.
  - 3. An apparatus as claimed in claim 2, characterized in that at least two different word-segment models contain at least a first elementary model.
  - 4. An apparatus as claimed in claim 3, characterized in that at least one word-segment model contains a first elementary model in at least two different locations in the word-segment model.
  - 5. An apparatus as claimed in claim 4, characterized in that each cluster signal has a cluster value equal to the average of the feature values of all of the feature vectors in the cluster, and equal to the variance of the feature values of all of the feature vectors in the cluster.
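
The "most likely path" of claim 2 can be sketched as a Viterbi-style dynamic-programming alignment. This is a simplified illustration, not the patented procedure: it assumes a strictly left-to-right sequence of elementary models, assumes each model already has a reference mean vector, ignores transition probabilities, and uses negative squared distance as a stand-in for a log-likelihood. The name `force_align` and its parameters are hypothetical.

```python
def force_align(frames, state_means):
    """Assign each frame to one state along the best monotonic path.

    frames: list of feature vectors, in time order.
    state_means: one reference vector per elementary model, in script order
                 (an assumption of this sketch).
    Returns a list with the state index chosen for each frame.
    """
    def score(x, m):
        # Stand-in for a log-likelihood: negative squared Euclidean distance.
        return -sum((a - b) ** 2 for a, b in zip(x, m))

    T, S = len(frames), len(state_means)
    if T < S:
        raise ValueError("need at least one frame per state")

    NEG = float("-inf")
    best = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    best[0][0] = score(frames[0], state_means[0])  # path must start in state 0

    for t in range(1, T):
        for s in range(S):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else NEG
            prev, origin = (stay, s) if stay >= advance else (advance, s - 1)
            if prev > NEG:  # skip cells that no path can reach
                best[t][s] = prev + score(frames[t], state_means[s])
                back[t][s] = origin

    # Trace back from the final state at the final frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


alignment = force_align(
    [[0.0], [0.1], [5.0], [5.2], [9.9]],
    [[0.0], [5.0], [10.0]],
)
```

Each entry of `alignment` names the elementary model that is estimated to have produced the corresponding feature vector, which is the information the clustering means needs.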

6. A method of generating a set of acoustic prototype signals for encoding speech, said method comprising the steps of:
- storing a model of a training script, said training script model comprising a series of word-segment models, each word-segment model being selected from a finite set of word-segment models, each word-segment model comprising a series of elementary models, each elementary model having a location in each word-segment model, each elementary model being selected from a finite set of elementary models;
  
  measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals spanned by the utterance of the training script to produce a series of feature vector signals, each feature vector signal having a feature value representing the value of the at least one feature of the utterance during a corresponding time interval;
  
  estimating at least one path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would produce that feature vector signal;
  
  clustering the feature vector signals into a plurality of clusters to form a plurality of cluster signals, each feature vector signal in a cluster corresponding to a single elementary model in a single location in a single word-segment model, each cluster signal having a cluster value equal to an average of the feature values of all of the feature vector signals in the cluster;
  
  storing a plurality of prototype vector signals, each prototype vector signal corresponding to an elementary model, each prototype vector signal having an identifier and comprising at least two partition values, at least one partition value being equal to a combination of the cluster values of one or more cluster signals corresponding to the elementary model, at least one other partition value being equal to a combination of the cluster values of one or more other cluster signals corresponding to the elementary model.
- Dependent claims (7, 8, 9, 10)
  - 7. A method as claimed in claim 6, characterized in that the step of estimating comprises estimating the most likely path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would most likely produce that feature vector signal.
  - 8. A method as claimed in claim 7, characterized in that at least two different word-segment models contain at least a first elementary model.
  - 9. A method as claimed in claim 8, characterized in that at least one word-segment model contains a first elementary model in at least two different locations in the word-segment model.
  - 10. A method as claimed in claim 9, characterized in that each cluster signal has a cluster value equal to the average of the feature values of all of the feature vectors in the cluster.
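
The final storing step of claim 6 can be sketched as follows. The claim requires each prototype to hold at least two partition values, each a combination of the cluster values of one or more clusters for the same elementary model, but it does not say which clusters go together or how they are combined. In this sketch the grouping is supplied by the caller and the combination is a count-weighted average; both choices are assumptions, as are the name `build_prototype` and the dictionary layout.

```python
def build_prototype(identifier, clusters, groups):
    """Combine cluster values into the partitions of one prototype.

    identifier: label for the prototype (hypothetical naming).
    clusters: {cluster_key: (mean_vector, frame_count)} for one elementary model.
    groups: list of lists of cluster keys, one inner list per partition.
            How clusters are grouped is left to the caller in this sketch.
    """
    if len(groups) < 2:
        raise ValueError("a prototype needs at least two partitions")

    partitions = []
    for keys in groups:
        total = sum(clusters[k][1] for k in keys)
        dim = len(clusters[keys[0]][0])
        # Count-weighted average of the cluster means in this group (assumed rule).
        value = [
            sum(clusters[k][0][d] * clusters[k][1] for k in keys) / total
            for d in range(dim)
        ]
        partitions.append(value)
    return {"id": identifier, "partitions": partitions}


# Hypothetical clusters for elementary model "A".
clusters_a = {
    ("W1", 0, "A"): ([2.0, 3.0], 2),
    ("W2", 1, "A"): ([10.0, 0.0], 1),
    ("W3", 2, "A"): ([4.0, 6.0], 2),
}
prototype_a = build_prototype(
    "A",
    clusters_a,
    [[("W1", 0, "A"), ("W3", 2, "A")], [("W2", 1, "A")]],
)
```

Weighting by frame count makes a partition value equal to the mean of all underlying feature vectors in its group, which keeps it consistent with the per-cluster averages.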

11. A speech recognition apparatus comprising:
- means for measuring the value of at least one feature of an utterance of a word to be recognized during each of a series of time intervals spanned by the utterance of the word to be recognized to produce a series of feature vector signals, each feature vector signal having a feature value representing the value of at least one feature of the utterance during a corresponding time interval;
  
  means for storing a set of a plurality of prototype vector signals, each prototype vector signal having an identifier and a prototype value;
  
  means for comparing the value of each feature vector signal to the prototype value of each prototype vector signal to identify the best matched prototype vector signal associated with each feature vector signal to produce a series of associated prototype vector identifier signals;
  
  means for storing a plurality of acoustic word models;
  
  means for comparing the series of associated prototype vector identifier signals with each of the acoustic word models to estimate the one or more words which most likely correspond to the series of associated prototype vector identifier signals; and
  
  a display for displaying at least one of the one or more words which most likely correspond to the series of associated prototype vector identifier signals;
  
  characterized in that the apparatus further comprises means for generating the set of prototype vector signals, said means for generating comprising:
  
  means for storing a model of a training script, said training script model comprising a series of word-segment models, each word-segment model being selected from a finite set of word-segment models, each word-segment model comprising a series of elementary models, each elementary model having a location in each word-segment model, each elementary model being selected from a finite set of elementary models;
  
  means for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals spanned by the utterance of the training script to produce a series of feature vector signals, each feature vector signal having a feature value representing the value of the at least one feature of the utterance during a corresponding time interval;
  
  means for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would produce that feature vector signal;
  
  means for clustering the feature vector signals into a plurality of clusters to form a plurality of cluster signals, each feature vector signal in a cluster corresponding to a single elementary model in a single location in a single word-segment model, each cluster signal having a cluster value equal to an average of the feature values of all of the feature vector signals in the cluster;
  
  means for storing a plurality of prototype vector signals, each prototype vector signal corresponding to an elementary model, each prototype vector signal having an identifier and comprising at least two partition values, at least one partition value being equal to a combination of the cluster values of one or more cluster signals corresponding to the elementary model, at least one other partition value being equal to a combination of the cluster values of one or more other cluster signals corresponding to the elementary model.
- Dependent claims (12, 13, 14)
  - 12. An apparatus as claimed in claim 11, characterized in that the estimating means comprises means for estimating the most likely path through the training script model which would produce the entire series of measured feature vector signals so as to estimate, for each feature vector signal, the corresponding elementary model in the training script model which would most likely produce that feature vector signal.
  - 13. An apparatus as claimed in claim 11, characterized in that the means for comparing the value of each feature vector signal to the value of each prototype vector signal comprises:
    - means for comparing the value of each feature vector signal to the value of each partition of a prototype vector signal to produce a partition match score for each partition; and
      
      selecting the best partition match score for a prototype vector signal as a prototype match score for that prototype vector signal.
  - 14. An apparatus as claimed in claim 11, characterized in that the means for comparing the value of each feature vector signal to the value of each prototype vector signal comprises:
    - means for comparing the value of each feature vector signal to the value of each partition of a prototype vector signal to produce a partition match score for each partition; and
      
      combining the partition match scores for a prototype vector signal to produce a prototype match score for that prototype vector signal.
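
The two scoring variants of claims 13 and 14 can be contrasted in a short sketch. It assumes prototypes are stored as dictionaries with an `id` and a list of `partitions`, uses negative squared distance as the partition match score, and uses log-sum-exp as one possible way of "combining" scores; the claims do not prescribe any of these, and the function names are hypothetical.

```python
import math


def partition_scores(vector, prototype):
    """Score a feature vector against every partition (higher is better)."""
    return [
        -sum((a - b) ** 2 for a, b in zip(vector, partition))
        for partition in prototype["partitions"]
    ]


def prototype_score(vector, prototype, mode="best"):
    """Reduce partition scores to a single prototype match score."""
    scores = partition_scores(vector, prototype)
    if mode == "best":        # claim 13: take the best partition score
        return max(scores)
    if mode == "combined":    # claim 14: combine them (log-sum-exp is an assumption)
        top = max(scores)
        return top + math.log(sum(math.exp(s - top) for s in scores))
    raise ValueError("mode must be 'best' or 'combined'")


def label(vectors, prototypes, mode="best"):
    """Return the identifier of the best-matched prototype for each vector."""
    return [
        max(prototypes, key=lambda p: prototype_score(v, p, mode))["id"]
        for v in vectors
    ]


# Hypothetical one-dimensional prototypes with two partitions each.
prototypes = [
    {"id": "A", "partitions": [[0.0], [10.0]]},
    {"id": "B", "partitions": [[5.0], [6.0]]},
]
labels_best = label([[0.5], [5.4], [9.0]], prototypes, mode="best")
labels_combined = label([[0.5], [5.4], [9.0]], prototypes, mode="combined")
```

Prototype "A" wins for inputs near either 0 or 10 because its two partitions cover both regions, which is the practical benefit of storing several partition values under one identifier.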

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: DeSouza, Peter V.; Bellegarda, Jerome R.; Nahamoo, David; Picheny, Michael A.; Bahl, Lalit R.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/730,714
Time in Patent Office: 903 Days
Field of Search: 381/41-45, 395/2.65
US Class Current: 704/256.4
CPC Class Codes: G10L 15/063 Training
