Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques

US 6,343,267 B1
Filed: 09/04/1998
Issued: 01/29/2002
Est. Priority Date: 04/30/1998
Status: Expired due to Term

Abstract

A set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate about the location of a new speaker in eigenspace.
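
As a rough illustration of the supervector step described in the abstract, the sketch below concatenates per-unit mean vectors in a fixed order, one supervector per speaker. The speaker names, the speech-unit labels (`aa`, `iy`), the toy values, and the helper `build_supervector` are all hypothetical; the patent does not prescribe this data layout.

```python
import numpy as np

def build_supervector(speaker_model, unit_order):
    """Concatenate per-unit parameter vectors in a fixed order into one supervector."""
    return np.concatenate(
        [np.asarray(speaker_model[u], dtype=float).ravel() for u in unit_order]
    )

# Hypothetical toy data: 3 speakers, 2 speech units, 2-dimensional mean vectors.
speakers = {
    "spk_a": {"aa": [1.0, 2.0], "iy": [3.0, 4.0]},
    "spk_b": {"iy": [7.0, 8.0], "aa": [5.0, 6.0]},  # stored in a different order on purpose
    "spk_c": {"aa": [9.0, 10.0], "iy": [11.0, 12.0]},
}

# The "predefined order": here, assumed to be alphabetical by unit label.
unit_order = sorted({u for model in speakers.values() for u in model})

# One row per speaker; every row uses the same parameter ordering.
supervectors = np.stack([build_supervector(m, unit_order) for m in speakers.values()])
```

Using one shared ordering matters because dimensionality reduction compares the same parameter across speakers position by position.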

297 Citations

32 Claims

1. A method for performing speaker adaptation or normalization comprising the steps of:
- constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
  
  generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1 wherein said dimensionality reduction is performed by concatenating a plurality of model parameters extracted from said set of models and by performing a linear transformation upon said model parameters.
  - 3. The method of claim 1 wherein said dimensionality reduction is performed by a transformation process selected from the group consisting of:
    - principal component analysis, linear discriminant analysis, factor analysis, independent component analysis and singular value decomposition.
  - 4. The method of claim 1 wherein said models for said training speakers define a plurality of model parameters and said step of constructing an eigenspace comprises concatenating said model parameters for said plurality of training speakers to construct a set of supervectors and performing a linear dimensionality reduction transformation upon said supervectors to thereby generate said basis vectors.
  - 5. The method of claim 4 wherein said models for each of said training speakers correspond to a set of different speech units and wherein each supervector is defined as a concatenation of model parameters corresponding to said speech units sorted in a predetermined order.
  - 6. The method of claim 4 wherein said model parameters are cepstral coefficients.
  - 7. The method of claim 1 wherein said step of performing dimensionality reduction generates a set of basis vectors equal in number to the number of training speakers.
  - 8. The method of claim 1 wherein said step of performing dimensionality reduction generates an ordered list of basis vectors and wherein said step of constructing an eigenspace includes discarding a predetermined portion of said ordered list to reduce the order of said eigenspace.
  - 9. The method of claim 1 wherein said step of constraining said speaker dependent model is performed by projecting said input speech into said eigenspace.
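
Claims 3, 8 and 9 can be pictured with a small sketch, assuming principal component analysis (computed via SVD) as the dimensionality reduction and a plain orthogonal projection as the constraint. The function names, the random toy data, and the choice to project an already-estimated supervector (rather than raw speech) are assumptions made for illustration only.

```python
import numpy as np

def build_eigenspace(supervectors, keep):
    """Return (mean, basis); basis rows are the `keep` leading principal directions."""
    X = np.asarray(supervectors, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal, ordered by singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Discarding the tail of the ordered list reduces the order of the eigenspace.
    return mean, vt[:keep]

def project(supervector, mean, basis):
    """Orthogonally project a supervector into the eigenspace."""
    coords = basis @ (np.asarray(supervector, dtype=float) - mean)
    return coords, mean + basis.T @ coords

rng = np.random.default_rng(0)
train = rng.normal(size=(10, 6))           # 10 hypothetical training supervectors
mean, basis = build_eigenspace(train, keep=3)

new_speaker = rng.normal(size=6)           # hypothetical supervector for a new speaker
coords, adapted = project(new_speaker, mean, basis)
```

Here `adapted` lies in the span of the retained basis vectors (offset by the mean), so the new speaker is described by only three coordinates instead of six free parameters.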

10. A method for performing speaker adaptation or normalization comprising the steps of:
- constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
  
  generating an adapted model, using input speech from a new speaker to find a maximum likelihood vector in eigenspace defining said adapted model such that said adapted model lies within said eigenspace.
- Dependent claims (11, 12, 13)
  - 11. The method of claim 10 wherein said step of generating a maximum likelihood vector comprises:
  - 12. The method of claim 10 wherein said adapted model is derived from the maximum likelihood vector by multiplying maximum likelihood vector coefficients by said basis vectors.
  - 13. The method of claim 12 wherein said maximizing step is performed by:
    - representing said maximum likelihood vector as a set of eigenvalue variables;
      
      taking a first derivative of said probability function with respect to said eigenvalue variables; and
      
      solving for the corresponding values of said eigenvalue variables when said first derivative is equated to zero.
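
The derivative-equals-zero step in claim 13 can be sketched as a linear solve, assuming Gaussian output densities with diagonal covariances and adapted means of the form mu_s = sum_k w_k e_k(s). Under those assumptions the log-likelihood is quadratic in the coefficients w (the claim's "eigenvalue variables"), so setting its gradient to zero gives A w = b. The function `mled`, the array layout, and the sufficient statistics passed in are hypothetical choices, not taken from the patent text.

```python
import numpy as np

def mled(basis, occupancy, first_order, precision):
    """Solve for eigenspace coefficients that zero the log-likelihood gradient.

    basis:       (K, S, D) eigenvoice means, K vectors, S states, D features
    occupancy:   (S,)      sum over frames of state posteriors
    first_order: (S, D)    posterior-weighted sum of observation vectors
    precision:   (S, D)    inverse diagonal covariances
    """
    weighted = basis * precision                      # broadcast over K
    # A[j, k] = sum_s n_s * e_j(s)^T C_s^{-1} e_k(s)
    A = np.einsum("jsd,s,ksd->jk", weighted, occupancy, basis)
    # b[j] = sum_s e_j(s)^T C_s^{-1} (sum_t gamma_s(t) o_t)
    b = np.einsum("jsd,sd->j", weighted, first_order)
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
basis = rng.normal(size=(3, 4, 2))
occupancy = rng.uniform(1.0, 5.0, size=4)
precision = rng.uniform(0.5, 2.0, size=(4, 2))

# Synthetic check: observations that sit exactly on a known point in eigenspace.
w_true = np.array([0.5, -1.0, 2.0])
true_means = np.einsum("k,ksd->sd", w_true, basis)
first_order = occupancy[:, None] * true_means

w_hat = mled(basis, occupancy, first_order, precision)
adapted_means = np.einsum("k,ksd->sd", w_hat, basis)  # coefficients times basis vectors
```

The last line mirrors claim 12: the adapted model parameters are obtained by multiplying the estimated coefficients by the basis vectors.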

14. A method for performing speaker adaptation or normalization comprising the steps of:
- representing a plurality of training speakers as a set of speaker models, said models defining a plurality of parameters;
  
  enhancing said speaker models by adjusting at least some of said parameters of said models to define a set of enhanced speaker models;
  
  constructing an eigenspace to represent said plurality of training speakers by performing dimensionality reduction upon said set of enhanced models to generate a set of basis vectors that define said eigenspace;
  
  generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace.
- Dependent claims (15, 16, 17, 18)
  - 15. The method of claim 14 wherein said enhancing step is performed using maximum a posteriori estimation.
  - 16. The method of claim 14 wherein said enhancing step is performed using a transformation-based estimation process.
  - 17. The method of claim 14 wherein said enhancing step is performed using maximum likelihood linear regression estimation.
  - 18. The method of claim 14 wherein said step of generating said adapted model comprises using input speech from said new speaker to generate a maximum likelihood vector and to train said adapted model, while using said set of basis vectors and said maximum likelihood vector to constrain said adapted model such that said adapted model lies within said eigenspace.
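
For the enhancing step of claim 15, one common form of maximum a posteriori estimation interpolates each Gaussian mean between a prior mean and the data, weighted by how much data the state received. The sketch below assumes that form; the relevance factor `tau`, the function name, and the toy statistics are assumptions, and the patent does not fix these details.

```python
import numpy as np

def map_update_means(prior_means, occupancy, first_order, tau=10.0):
    """MAP re-estimate of Gaussian means.

    prior_means: (S, D) means before enhancement (e.g. speaker-independent)
    occupancy:   (S,)   posterior-weighted frame counts per state
    first_order: (S, D) posterior-weighted sums of observation vectors
    tau:         hypothetical relevance factor controlling trust in the prior
    """
    n = np.asarray(occupancy, dtype=float)[:, None]
    return (tau * np.asarray(prior_means, dtype=float) + first_order) / (tau + n)

prior = np.array([[0.0, 0.0], [1.0, 1.0]])
occupancy = np.array([0.0, 10.0])                 # state 0 saw no data
first_order = np.array([[0.0, 0.0], [30.0, 30.0]])  # state 1 frames average to 3.0

enhanced = map_update_means(prior, occupancy, first_order, tau=10.0)
```

A state with no data keeps its prior mean, while a state with data moves toward the sample mean; the enhanced means would then feed the supervector construction.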

19. A method for performing speaker adaptation or normalization comprising the steps of:
- constructing an eigenspace to represent a plurality of training speakers by providing a set of models for said training speakers and performing dimensionality reduction upon said set of models to generate a set of basis vectors that define said eigenspace;
  
  generating an adapted model, using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace; and
  
  enhancing said adapted model by extracting model parameters from said adapted model and adjusting at least some of said parameters based on said input speech from said new speaker.
- Dependent claims (20, 21, 22, 23, 24, 25, 26)
  - 20. The method of claim 19 wherein said enhancing step is performed using maximum a posteriori estimation.
  - 21. The method of claim 19 wherein said enhancing step is performed using a transformation-based estimation process.
  - 22. The method of claim 19 wherein said enhancing step is performed using maximum likelihood linear regression estimation.
  - 23. The method of claim 19 wherein said step of generating said adapted model comprises using input speech from said new speaker to generate a maximum likelihood vector and to train said adapted model, while using said set of basis vectors and said maximum likelihood vector to constrain said adapted model such that said adapted model lies within said eigenspace.
  - 24. The method of claim 23 wherein said enhancing step is performed using maximum a posteriori estimation.
  - 25. The method of claim 23 wherein said enhancing step is performed using a transformation-based estimation process.
  - 26. The method of claim 23 wherein said enhancing step is performed using maximum likelihood linear regression estimation.
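
The transformation-based enhancement of claims 21 and 22 can be approximated by a simplified, MLLR-style sketch: fit one global affine transform that moves the adapted means toward the new speaker's data. The code assumes identity covariances, in which case the maximum likelihood fit reduces to occupancy-weighted least squares; full MLLR uses the model covariances and often several regression classes. All names and data here are hypothetical.

```python
import numpy as np

def extend(means):
    """Prepend a constant 1 to each mean so the transform includes a bias term."""
    return np.hstack([np.ones((means.shape[0], 1)), means])

def estimate_global_transform(means, occupancy, frame_means):
    """Weighted least-squares fit of W with shape (D, D+1): W @ [1, mu_s] ~ frame_means[s]."""
    sw = np.sqrt(np.asarray(occupancy, dtype=float))[:, None]
    W_t, *_ = np.linalg.lstsq(sw * extend(means), sw * frame_means, rcond=None)
    return W_t.T

def apply_transform(W, means):
    """Apply the affine transform to every mean vector."""
    return extend(means) @ W.T

rng = np.random.default_rng(3)
adapted_means = rng.normal(size=(6, 2))        # means of the eigenspace-adapted model
occupancy = rng.uniform(1.0, 10.0, size=6)

# Synthetic check: data generated by a known transform of the adapted means.
W_true = np.array([[0.1, 1.2, 0.0],
                   [-0.3, 0.1, 0.9]])
frame_means = apply_transform(W_true, adapted_means)

W_est = estimate_global_transform(adapted_means, occupancy, frame_means)
enhanced_means = apply_transform(W_est, adapted_means)
```

Because a single transform is shared by all states, it can be estimated from little data and still shift states that were never observed.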

27. A method for performing speaker adaptation or normalization comprising the steps of:
- representing a plurality of training speakers as first sets of transformation matrices together with a model to which the transformation matrices are applied;
  
  constructing an eigenspace to represent said plurality of training speakers by performing dimensionality reduction upon said first sets of transformation matrices to generate a set of basis vectors that define said eigenspace;
  
  generating a second set of transformation matrices using input speech from a new speaker while using said set of basis vectors to constrain said second set of transformation matrices such that said second set lies within said eigenspace.
- Dependent claims (28, 29, 30)
  - 28. The method of claim 27 wherein said first sets of transformation matrices are generated by maximum likelihood linear regression.
  - 29. The method of claim 27 further comprising vectorizing each of said first sets of transformation matrices to define a set of supervectors and performing dimensionality reduction upon said supervectors to define said eigenspace.
  - 30. The method of claim 27 further comprising generating said second set of transformation matrices using input speech from a new speaker to generate a maximum likelihood vector using said maximum likelihood vector to determine a location within said eigenspace.
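
Claim 29's vectorizing step might look like the following sketch, which flattens each speaker's set of transformation matrices into a supervector, builds an eigenspace with PCA, and maps a constrained point back to matrices. The matrix shapes, the number of matrices per speaker, and the use of orthogonal projection in place of the maximum likelihood estimate of claim 30 are assumptions made to keep the example short.

```python
import numpy as np

def matrices_to_supervector(matrices):
    """Flatten a set of transformation matrices, in a fixed order, into one vector."""
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in matrices])

def supervector_to_matrices(vec, count, shape):
    """Inverse of matrices_to_supervector for `count` matrices of the given shape."""
    return np.asarray(vec).reshape((count,) + tuple(shape))

rng = np.random.default_rng(2)
count, shape = 2, (2, 3)                       # hypothetical: 2 matrices of size 2x3 per speaker

# First sets of transformation matrices, one set per training speaker (8 speakers).
train_sets = rng.normal(size=(8, count) + shape)
X = np.stack([matrices_to_supervector(s) for s in train_sets])

# Eigenspace over the vectorized transforms (PCA via SVD, keeping 3 directions).
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
basis = vt[:3]

# Second set of matrices for a new speaker, constrained to the eigenspace.
new_set = rng.normal(size=(count,) + shape)
v = matrices_to_supervector(new_set)
coords = basis @ (v - mean)
constrained_set = supervector_to_matrices(mean + basis.T @ coords, count, shape)
```

The constrained matrices would then be applied to the shared model in the same way as the training speakers' matrices.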

31. A method for performing speaker adaptation or normalization comprising the steps of:
- constructing an eigenspace to represent a plurality of training speakers by providing a set of first models for said training speakers and performing dimensionality reduction upon said set of first models to generate a set of basis vectors that define said eigenspace;
  
  generating an adapted model using input speech from a new speaker to train said adapted model, while using said set of basis vectors to constrain said adapted model such that said adapted model lies within said eigenspace, wherein said first models define a first probability distribution and said input speech defines observation data and wherein said adapted model is generated such that the product of said observation data and said first probability distribution is maximized.
- Dependent claims (32)
  - 32. The method of claim 31 further comprising applying a confidence factor to said first probability distribution and said second probability distribution to reflect how confidence in information provided by said distributions varies over time.
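
One way to read claims 31 and 32 is as a maximum a posteriori estimate in eigenspace: the likelihood of the observation data is multiplied by a prior over speaker locations, and the product is maximized. The sketch below assumes a Gaussian prior with diagonal variance on the eigenspace coefficients and a likelihood that is quadratic in them, summarized by `A` and `b`. The scalar `prior_confidence` is a hypothetical stand-in for the confidence factor; the patent does not specify this form.

```python
import numpy as np

def map_in_eigenspace(A, b, prior_mean, prior_var, prior_confidence=1.0):
    """Maximize likelihood times prior over eigenspace coefficients w.

    Log-likelihood (up to a constant): -0.5 * w^T A w + b^T w
    Log-prior (up to a constant):      -0.5 * c * (w - m)^T diag(1/v) (w - m)
    Setting the gradient of the sum to zero gives (A + P) w = b + P m.
    """
    P = prior_confidence * np.diag(1.0 / np.asarray(prior_var, dtype=float))
    return np.linalg.solve(A + P, b + P @ np.asarray(prior_mean, dtype=float))

# Toy likelihood statistics whose maximum likelihood solution is w = [1, 1].
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 4.0])
prior_mean = np.zeros(2)
prior_var = np.ones(2)

w_ml = map_in_eigenspace(A, b, prior_mean, prior_var, prior_confidence=0.0)
w_map = map_in_eigenspace(A, b, prior_mean, prior_var, prior_confidence=1.0)
w_prior_dominated = map_in_eigenspace(A, b, prior_mean, prior_var, prior_confidence=1e9)
```

With zero confidence in the prior the estimate equals the maximum likelihood point; as the confidence grows, the estimate is pulled toward the prior mean, which is useful when the new speaker has supplied very little speech.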

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Nguyen, Patrick; Junqua, Jean-Claude; Kuhn, Roland
Primary Examiner(s): Tsang, Fan
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/148,753
Time in Patent Office: 1,243 Days
Field of Search: 704/231, 704/258, 704/256, 704/236, 704/240
US Class Current: 704/222
CPC Class Codes:
G06F 18/2135 based on approximation crit...
G10L 15/07 to the speaker
