Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques
Abstract
A set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate about the location of a new speaker in eigenspace.
297 Citations
32 Claims
1. A method for performing speaker adaptation or normalization comprising the steps of:
constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for performing speaker adaptation or normalization comprising the steps of:
constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
generating an adapted model, using input speech from a new speaker to find a maximum likelihood vector in eigenspace defining said adapted model such that said adapted model lies within said eigenspace.
Dependent claims: 11, 12, 13
defining a probability function representing the probability of generating an observed datum for a predefined set of models, in which said input speech supplies said observed datum; and
maximizing said probability function to find said maximum likelihood vector.
12. The method of claim 10 wherein said adapted model is derived from the maximum likelihood vector by multiplying maximum likelihood vector coefficients by said basis vectors.
13. The method of claim 12 wherein said maximizing step is performed by:
representing said maximum likelihood vector as a set of eigenvalue variables;
taking a first derivative of said probability function with respect to said eigenvalue variables; and
solving for the corresponding values of said eigenvalue variables when said first derivative is equated to zero.
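The maximization recited above (differentiate the probability function with respect to the eigenspace coefficients, equate to zero, solve) can be sketched as follows. This is a minimal illustration under stated assumptions: Gaussian output distributions with diagonal covariances, a supervector made only of Gaussian means, and an adapted supervector formed as the coefficients multiplied by the basis vectors. Under those assumptions the zero-derivative condition is a linear system. The function name `mled_weights` and the statistics `occupancy`, `obs_sum`, and `precision` are hypothetical names introduced for the example.

```python
import numpy as np

def mled_weights(eigenvoices, occupancy, obs_sum, precision):
    """Solve the zero-derivative equations for the eigenspace coefficients.

    eigenvoices: (K, D) basis vectors.
    occupancy:   (D,) soft frame count of the Gaussian owning each dimension.
    obs_sum:     (D,) occupancy-weighted sum of observations per dimension.
    precision:   (D,) inverse variances (diagonal covariances assumed).
    """
    # a[j, k] = sum_d occupancy[d] * precision[d] * e_j[d] * e_k[d]
    a = (eigenvoices * (occupancy * precision)) @ eigenvoices.T
    # b[j] = sum_d precision[d] * e_j[d] * obs_sum[d]
    b = eigenvoices @ (precision * obs_sum)
    return np.linalg.solve(a, b)

# Synthetic check (assumption): observations sit exactly on the means of a
# model whose coefficients are w_true, so the solver should recover w_true.
rng = np.random.default_rng(1)
num_eigenvoices, dim = 3, 10
eigenvoices = rng.normal(size=(num_eigenvoices, dim))
w_true = np.array([0.5, -1.0, 2.0])
occupancy = rng.uniform(1.0, 5.0, size=dim)
precision = rng.uniform(0.5, 2.0, size=dim)
obs_sum = occupancy * (w_true @ eigenvoices)

w_hat = mled_weights(eigenvoices, occupancy, obs_sum, precision)

# Adapted supervector: coefficients multiplied by the basis vectors.
adapted_supervector = w_hat @ eigenvoices
```

Because the system has only as many unknowns as there are basis vectors, it stays well determined even when the new speaker supplies little adaptation speech, provided every basis direction is touched by some observed data.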
14. A method for performing speaker adaptation or normalization comprising the steps of:
representing a plurality of training speakers as a set of speaker models, said models defining a plurality of parameters;
enhancing said speaker models by adjusting at least some of said parameters of said models to define a set of enhanced speaker models;
constructing an eigenspace to represent said plurality of training speakers by performing dimensionality reduction upon said set of enhanced models to generate a set of basis vectors that define said eigenspace;
generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace.
Dependent claims: 15, 16, 17, 18
19. A method for performing speaker adaptation or normalization comprising the steps of:
constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace; and
enhancing said adapted model by extracting model parameters from said adapted model and adjusting at least some of said parameters based on said input speech from said new speaker.
Dependent claims: 20, 21, 22, 23, 24, 25, 26
27. A method for performing speaker adaptation or normalization comprising the steps of:
representing a plurality of training speakers as first sets of transformation matrices together with a model to which the transformation matrices are applied;
constructing an eigenspace to represent said plurality of training speakers by performing dimensionality reduction upon said first sets of transformation matrices to generate a set of basis vectors that define said eigenspace;
generating a second set of transformation matrices using input speech from a new speaker while using said set of basis vectors to constrain said second set of transformation matrices such that said second set lies within said eigenspace.
Dependent claims: 28, 29, 30
31. A method for performing speaker adaptation or normalization comprising the steps of:
constructing an eigenspace to represent a plurality of training speakers by providing a set of first models for said training speakers and performing dimensionality reduction upon said set of first models to generate a set of basis vectors that define said eigenspace;
generating an adapted model using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace, wherein said first models define a first probability distribution and said input speech defines observation data and wherein said adapted model is generated such that the product of said observation data and said first probability distribution is maximized.
Dependent claims: 32
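Maximizing the product of an observation term and a prior distribution is the usual shape of a maximum a posteriori estimate. The sketch below shows one way this could look, assuming a Gaussian prior over the eigenspace coefficients and a log-likelihood that is quadratic in those coefficients. Both assumptions, along with the function name `map_weights` and the toy numbers, are illustrative and not taken from the claim.

```python
import numpy as np

def map_weights(a, b, prior_mean, prior_var):
    """Maximize likelihood times an assumed Gaussian prior over the weights.

    a, b: terms of a quadratic log-likelihood, chosen so that the maximum
          likelihood weights alone would solve a @ w = b.
    prior_mean, prior_var: mean and per-coefficient variance of the prior.
    """
    prior_precision = 1.0 / np.asarray(prior_var, dtype=float)
    # Zero gradient of the log posterior: (a + P) w = b + P m
    lhs = a + np.diag(prior_precision)
    rhs = b + prior_precision * prior_mean
    return np.linalg.solve(lhs, rhs)

# Toy numbers (assumption): two coefficients with a diagonal likelihood term.
a = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 4.0])

ml_estimate = np.linalg.solve(a, b)
map_estimate = map_weights(a, b, prior_mean=np.zeros(2),
                           prior_var=np.array([0.5, 0.25]))
```

In this toy case the prior pulls the estimate from the maximum likelihood point toward the prior mean. A broader prior (larger `prior_var`) leaves the estimate closer to the maximum likelihood solution.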
Specification