Method and apparatus for speech recognition adapted to an individual speaker
Abstract
A method and apparatus for automatic recognition of speech adapts to a particular speaker by using adaptation data to develop a transformation through which speaker independent models are transformed into speaker adapted models. The speaker adapted models are then used for speaker recognition and achieve better recognition accuracy than non-adapted models. In a further embodiment, the transformation-based adaptation technique is combined with a known Bayesian adaptation technique.
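As a rough illustration of the transformation-based adaptation summarized above, the sketch below assumes diagonal-covariance Gaussians and a per-dimension affine transform (mean to a*mean + b, variance to a^2*variance). The function names `estimate_affine` and `adapt` are hypothetical, and the transform is fitted by ordinary least squares on per-Gaussian sample means, a simple stand-in for the maximum likelihood estimation the claims refer to. The point it shows is that a shared transform also adapts Gaussians for which the speaker supplied no data.

```python
import numpy as np

def estimate_affine(si_means, adapt_frames, assignments):
    """Fit per-dimension scale a and offset b so that a*mu + b matches the
    adaptation-data mean of each Gaussian that was observed.
    Least-squares stand-in (assumption), not a maximum likelihood estimate."""
    seen = np.unique(assignments)
    obs_means = np.stack([adapt_frames[assignments == g].mean(axis=0) for g in seen])
    mu = si_means[seen]
    dim = mu.shape[1]
    a = np.empty(dim)
    b = np.empty(dim)
    for d in range(dim):
        # polyfit with deg=1 returns [slope, intercept]
        a[d], b[d] = np.polyfit(mu[:, d], obs_means[:, d], deg=1)
    return a, b

def adapt(si_means, si_vars, a, b):
    """Apply the shared transform to every Gaussian, seen or unseen."""
    return a * si_means + b, (a ** 2) * si_vars

# Three speaker independent Gaussians; the speaker only produced data for 0 and 1.
si_means = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])
si_vars = np.ones((3, 2))
frames = np.array([[0.9, -1.0], [1.1, -1.0], [3.0, 0.1], [3.0, -0.1]])
assignments = np.array([0, 0, 1, 1])

a, b = estimate_affine(si_means, frames, assignments)
sa_means, sa_vars = adapt(si_means, si_vars, a, b)
```

Gaussian 2 never appears in `assignments`, yet `sa_means[2]` and `sa_vars[2]` are still moved by the transform estimated from the other two.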
30 Claims
1. A speaker adaptive speech recognition system comprising:
means for receiving a speech signal from a speaker;
feature extraction means for converting said speech signal into a data set of feature vectors;
means for storing a plurality of speaker independent models, said models initially having undetermined parameters;
a training engine for determining the parameters of said speaker independent models from a set of training data;
an adaptation engine capable of receiving speech data from a particular speaker and using said data from a particular speaker to determine the parameters of a plurality of transformations for transforming the parameters of said speaker independent models independently of a set of trained speaker dependent models said parameters once determined used to adapt a plurality of speaker independent models such that at least one speaker independent model may be adapted even where there is no speaker dependent data available for said at least one model using maximum likelihood techniques and generating a set of speaker adapted models; and
a recognition engine capable of using said speaker independent models and said speaker adapted models to recognize words from a set of unidentified feature vectors.
Dependent claims: 2, 3, 4, 5, 6.

7. A speaker adaptive speech recognition system comprising:
a set of models representing selected subunits of speech, each model having associated with it a plurality of states and each state having associated with it a probability function, the probability functions having parameters determined from training data said training data derived from a plurality of speakers from a training population;
means for collecting a set of speaker adaptation data from a particular speaker, said set of adaptation data comprising words of the speaker's choice spoken by the speaker, said adaptation data not necessarily comprising all the states in the speaker independent models;
means for determining an adaptation transformation using said speaker adaptation data and said models by evaluating how well adaptation features aligned to recognized states in said adaptation data are described by the corresponding states of said models and determining a transformation to improve the description by said models said adaptation transformation applicable to groups of models;
means for applying said transformation to said complete set of speaker independent models to create a complete set of speaker adapted models said transformation applicable to adapt a plurality of speaker independent models such that at least one speaker independent model may be adapted even where there is no speaker dependent data available for said at least one model; and
means for using said speaker adapted models to recognize subsequent speech data from said speaker.
Dependent claims: 8, 9, 10.

11. A speaker adaptive speech recognition system comprising:
a set of models representing selected subunits of speech, each model having associated with it a plurality of states and each state having associated with it a probability function, the probability functions having parameters determined from training data said training data derived from a plurality of speakers from a training population;
means for collecting a set of speaker adaptation data from a particular speaker, said set of adaptation data comprising words of the speaker's choice spoken by the speaker, said adaptation data not necessarily comprising all the states in the speaker independent models;
means for determining an adaptation transformation using said speaker adaptation data and said models by evaluating how well adaptation features aligned to recognized states in said adaptation data are described by the corresponding states of said models and determining a transformation to improve the description by said models;
means for applying said transformation to said complete set of speaker independent models to create a complete set of speaker adapted models; and
means for using said speaker adapted models to recognize subsequent speech data from said speaker wherein the speaker independent probability functions are Gaussian mixtures having the form ##EQU14## and wherein the speaker adapted probability functions have the form ##EQU15##
Dependent claims: 12.

13. A speaker adaptive speech recognition system comprising:
a set of models representing selected subunits of speech, each model having associated with it a plurality of states and each state having associated with it a probability function, the probability functions having parameters determined from training data;
a plurality of codebooks, each codebook containing a set of simple probability functions; each codebook associated with a plurality of said states, the probability function of each one of said states being a weighted sum of the simple probability functions stored in its associated codebook;
means for collecting a set of speaker adaptation data from a particular speaker, said set of speaker adaptation data comprising words of the speaker's choice spoken by the speaker;
means for determining a transformation using said speaker adaptation data and said models, said transformation capable of transforming said models into a set of speaker adapted models;
means for using said transformation on said set of models to create a set of speaker adapted models by applying a transformation derived from one state to the codebook associated with that state, thereby transforming the model for all other states associated with that codebook; and
means for applying said speaker adapted models to subsequent speech data from said speaker.
Dependent claims: 14, 15, 16.

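The codebook sharing recited in claim 13 can be sketched as follows. This is only an illustration under stated assumptions: diagonal Gaussians, a transformation reduced to a pure mean shift, and a hypothetical estimator `estimate_shift` that takes the offset between the frames aligned to one state and that state's mixture mean. It shows that transforming the shared codebook changes the density of every state tied to it, including a state with no adaptation data.

```python
import numpy as np

def diag_gauss(x, mean, var):
    """Density of a diagonal-covariance Gaussian at x."""
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))

def state_density(x, weights, cb_means, cb_vars):
    """State output density: weighted sum over the shared codebook."""
    return sum(w * diag_gauss(x, m, v)
               for w, m, v in zip(weights, cb_means, cb_vars))

def estimate_shift(frames, weights, cb_means):
    """Hypothetical estimator (assumption): offset between the frames aligned
    to one state and that state's mixture mean."""
    return frames.mean(axis=0) - weights @ cb_means

# One codebook (3 Gaussians, 2 dimensions) shared by states A and B,
# which differ only in their mixture weights.
cb_means = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
cb_vars = np.ones((3, 2))
weights_a = np.array([0.6, 0.3, 0.1])
weights_b = np.array([0.1, 0.2, 0.7])

# Adaptation frames were aligned to state A only.
frames_a = (weights_a @ cb_means) + np.array([[0.4, -0.6], [0.6, -0.4]])
shift = estimate_shift(frames_a, weights_a, cb_means)

# Transforming the codebook adapts state B as well, with no data from B.
adapted_means = cb_means + shift
```

Because both states read from the same `adapted_means`, no per-state transform needs to be stored; only the mixture weights stay state-specific.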
17. A speaker adaptive speech recognition system comprising:
a set of models representing selected subunits of speech, each model having associated with it a plurality of states and each state having associated with it a probability function, the probability functions having parameters determined from training data;
a plurality of codebooks, each codebook containing a set of simple probability functions; each codebook associated with a plurality of said states, the probability function of each one of said states being a weighted sum of the simple probability functions stored in its associated codebook;
means for collecting a set of speaker adaptation data from a particular speaker, said set of speaker adaptation data comprising words of the speaker's choice spoken by the speaker;
means for determining a transformation using said speaker adaptation data and said models, said transformation capable of transforming said models into a set of speaker adapted models;
means for using said transformation on said set of models to create a set of speaker adapted models by applying a transformation derived from one state to the codebook associated with that state, thereby transforming the model for all other states associated with that codebook; and
means for applying said speaker adapted models to subsequent speech data from said speaker wherein the speaker adapted probability functions have the form ##EQU20##

18. In a speech recognition system for responding to signals representative of digital speech, a method for developing models adapted to an individual speaker comprising the steps of:
selecting a multi-state model with state probability functions, said state probability functions being of a general form with initially undetermined parameters;
creating an individual instance of a model for each subunit of speech to be processed;
using training data from a plurality of speakers to determine acoustic features of states of said models and to estimate probability density functions for said models;
clustering states based on their acoustic similarity;
creating a plurality of cluster codebooks, one codebook for each cluster; said cluster codebooks consisting of probability density functions that are shared by each cluster's states;
reestimating the probability densities of each cluster codebook and the parameters of the probability equations in each cluster;
acquiring a set of speaker dependent training data from an individual speaker, said set consisting of words spoken at random by said individual speaker;
recognizing said set of training data as being probably generated by particular states;
using the results of said recognizing to determine the parameters of a transformation associated with a particular codebook;
adapting the models in a particular codebook with a transformation having parameters estimated for that codebook to said particular speaker.
Dependent claims: 19, 20.

21. A speaker adaptive speech recognizer, comprising:
a computer;
storage means;
a set of models for subunits of speech stored in the storage means;
a feature extractor in the computer for extracting feature data capable of being processed by said computer from a speech signal;
training means in the computer for training the models using features from identified samples of speech data and for producing a master codebook of probability density functions for use by the models;
clustering means in the computer for identifying clusters of states that share subsets of the probability density functions in the codebooks;
splitting and pruning means in the computer for producing cluster codebooks by splitting the master codebook into subsets of probability densities shared by clustered states;
re-estimating means for retraining the models for the states in the clusters and for recalculating the probability densities in each cluster codebook;
recognizing means for matching features from unidentified speech data to the models to produce a most likely path through the models where the path defines the most likely subunits and words in the speech data; and
speaker adaptive means for adapting each cluster codebook to an individual speaker using a small amount of speaker adaptive data from said individual speaker to determine a transformation for said codebooks and applying said transformation to the parameters of models in said codebooks.

22. A speaker adaptive speech recognition system comprising:
means for receiving a speech signal from a speaker;
feature extraction means for converting said speech signal into a data set of feature vectors;
means for storing a plurality of speaker independent models said models initially having undetermined parameters;
training engine for determining the parameters of said speaker independent models from a set of training data;
an adaptation engine capable of receiving speech data from a particular speaker for transforming the parameters of said speaker independent models independently of a set of trained speaker dependent models using maximum likelihood techniques and generating a set of speaker adapted models; and
a recognition engine capable of using said speaker independent models and said speaker adapted models to recognize words from a set of unidentified feature vectors;
further comprising:
means for storing a plurality of speaker dependent models trained by the training engine for an individual speaker;
means for combining parameters of said speaker adapted models with parameters of said speaker dependent models to generate improved speaker adapted models.
Dependent claims: 23.

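Claim 22 does not say how the two parameter sets are combined. One plausible reading, given the abstract's mention of a Bayesian adaptation technique, is a count-weighted interpolation of means in the style of a MAP update. The sketch below is an assumption made for illustration only; `combine_means` and the prior weight `tau` are hypothetical names, not terms from the patent.

```python
import numpy as np

def combine_means(adapted_mean, sd_mean, sd_count, tau=10.0):
    """Interpolate a speaker adapted mean with a speaker dependent mean.

    sd_count is the amount of speaker dependent data behind sd_mean;
    tau (assumed) controls how much that data must outweigh the adapted
    mean, which plays the role of the prior.
    """
    adapted_mean = np.asarray(adapted_mean, dtype=float)
    sd_mean = np.asarray(sd_mean, dtype=float)
    sd_count = float(sd_count)
    return (tau * adapted_mean + sd_count * sd_mean) / (tau + sd_count)

adapted = [1.0, 2.0]
dependent = [3.0, 0.0]

no_data = combine_means(adapted, dependent, sd_count=0)       # falls back to adapted mean
balanced = combine_means(adapted, dependent, sd_count=10)     # midpoint when count == tau
plenty = combine_means(adapted, dependent, sd_count=10_000)   # dominated by speaker data
```

With little speaker dependent data the result stays near the transformation-adapted model; as more data accumulates it moves toward the speaker dependent estimate.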
24. A method of adapting initial acoustic observation models to new acoustical conditions comprising:
selecting a transformation to be applied to the parameters of said models;
using condition specific data in order to determine parameters for said transformation; and
applying the transformation using the transformation parameters to the parameters of the acoustic models in order to obtain adapted models, said transformation able to transform models for which no condition specific data is available.
Dependent claims: 25, 26, 27, 28, 30.

29. In a recognition system for responding to signals representative of observed physical data, a method for associating input data with a particular adapted model comprising:
deriving a set of adapted models $P_{sa}(X_t \mid s)$ from an original set of trained models $P_{sa}(w_i \mid s)$ by applying transformations $T_1$ to means $\mu$ and $T_2$ to the covariance $\Sigma$ having form ##EQU22## and
using said adapted models to recognize said signals representative of observed physical data.
Specification