Maximum likelihood method for finding an adapted speaker model in eigenvoice space
Abstract
A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
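The eigenspace construction summarized in the abstract can be illustrated with a minimal sketch. It assumes NumPy, random synthetic supervectors in place of real speaker-dependent model parameters, and mean-centering before the decomposition; the function names `build_eigenvoices`, `project`, and `reconstruct` are hypothetical and do not come from the patent.

```python
import numpy as np

def build_eigenvoices(supervectors, k):
    """PCA on a (T, D) matrix of supervectors, one row per training speaker.

    Returns (mean, eigenvoices) where eigenvoices has shape (k, D).
    Mean-centering is an assumption of this sketch.
    """
    X = np.asarray(supervectors, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(supervector, mean, eigenvoices):
    """Coordinates of a supervector in the eigenvoice space."""
    return eigenvoices @ (supervector - mean)

def reconstruct(weights, mean, eigenvoices):
    """Supervector rebuilt from eigenvoice coordinates."""
    return mean + weights @ eigenvoices

# Synthetic stand-in for real training data: T speakers, D parameters each.
rng = np.random.default_rng(0)
T, D, k = 20, 50, 5
supervectors = rng.normal(size=(T, D))

mean, eigenvoices = build_eigenvoices(supervectors, k)
weights = project(supervectors[0], mean, eigenvoices)
approx = reconstruct(weights, mean, eigenvoices)
```

Keeping only the first `k` rows is the dimensionality reduction the abstract refers to as data compression: each speaker is then described by `k` coefficients instead of `D` model parameters.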
4 Claims
1. A method for performing speaker adaptation comprising the steps of:
constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers, expressing said set of models as supervectors of a first predetermined dimension, and performing principal component analysis upon said supervectors to generate a set of principal component vectors of a second predetermined dimension substantially lower than said first predetermined dimension that define said eigenspace;
generating an adapted model, using input speech from a new speaker to generate a maximum likelihood vector and to train said adapted model, while using said set of principal component vectors and said maximum likelihood vector to constrain said adapted model such that said adapted model lies within said eigenspace.
2. The method of claim 1 wherein said step of generating a maximum likelihood vector comprises:
defining an auxiliary function representing the probability of generating an observed datum for a predefined set of models, in which said input speech supplies said observed datum; and
maximizing said auxiliary function to find said maximum likelihood vector.
3. The method of claim 1 wherein said adapted model is constrained by multiplying said maximum likelihood vector with said principal component vectors.
4. The method of claim 2 wherein said maximizing step is performed by:
representing said maximum likelihood vector as a set of eigenvalue variables;
taking a first derivative of said auxiliary function with respect to said eigenvalue variables; and
solving for the corresponding values of said eigenvalue variables when said first derivative is equated to zero.
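The maximization recited in the claims can be sketched numerically. The sketch below assumes Gaussian output densities, so that the auxiliary function is quadratic in the eigenvalue variables and setting its first derivative to zero gives a linear system A w = b. It also assumes the occupation probabilities `gamma` are already available (in practice they would come from aligning the new speaker's speech against a model) and that the adapted means are a plain weighted sum of the eigenvoice means with no offset term. The function name `mled_weights`, the array layouts, and the synthetic data are all hypothetical and not taken from the patent.

```python
import numpy as np

def mled_weights(obs, gamma, eigen_means, inv_covs):
    """Solve for eigenvoice weights by zeroing the derivative of a
    Gaussian auxiliary function (a sketch, not the patent's exact procedure).

    obs:         (T, d)    observed feature vectors from the new speaker
    gamma:       (T, M)    occupation probability of Gaussian m at frame t
    eigen_means: (K, M, d) mean of Gaussian m as given by eigenvoice k
    inv_covs:    (M, d, d) inverse covariance of each Gaussian
    """
    occ = gamma.sum(axis=0)   # (M,) total occupancy per Gaussian
    acc = gamma.T @ obs       # (M, d) occupancy-weighted observation sums
    # C_m^{-1} mu_m(k) for every eigenvoice k and Gaussian m.
    ci_mu = np.einsum('mde,kme->kmd', inv_covs, eigen_means)
    # A[j, k] = sum_m occ_m * mu_m(j)^T C_m^{-1} mu_m(k)
    A = np.einsum('m,jmd,kmd->jk', occ, eigen_means, ci_mu)
    # b[j] = sum_m mu_m(j)^T C_m^{-1} (sum_t gamma_m(t) o_t)
    b = np.einsum('jmd,md->j', ci_mu, acc)
    return np.linalg.solve(A, b)

# Synthetic check: noise-free frames generated from known weights.
rng = np.random.default_rng(0)
K, M, d, T = 3, 4, 2, 200
eigen_means = rng.normal(size=(K, M, d))
inv_covs = np.stack([np.eye(d) * (m + 1.0) for m in range(M)])
w_true = np.array([0.5, -1.0, 2.0])
true_means = np.einsum('k,kmd->md', w_true, eigen_means)

assign = np.arange(T) % M     # hard assignment of frames to Gaussians
gamma = np.eye(M)[assign]     # one-hot occupation probabilities
obs = true_means[assign]

w_hat = mled_weights(obs, gamma, eigen_means, inv_covs)
# Adapted means: the weight vector multiplied with the eigenvoice means,
# which keeps the adapted model inside the eigenspace.
adapted_means = np.einsum('k,kmd->md', w_hat, eigen_means)
```

Because the system has only K unknowns, a small amount of adaptation speech can be enough to determine the weights, provided the accumulated matrix A is well conditioned.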
Specification