Transformation and combination of hidden Markov models for speaker selection training

US 20060074657A1
Filed: 10/01/2004
Published: 04/06/2006
Est. Priority Date: 10/01/2004
Status: Active Grant



Abstract

The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First, a subset of cohort speakers is selected for a test speaker. Then, the cohort models are transformed to be closer to the test speaker. Finally, the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
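
The three stages in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: each speaker is reduced to a single mean vector, Euclidean distance stands in for model-based similarity, a fixed interpolation stands in for the model transformation, and the function names, the `step` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def select_cohort(speaker_means, adaptation_data, k):
    """Stage 1: pick the k training speakers closest to the test speaker.

    Assumption: closeness is the Euclidean distance between a speaker's mean
    vector and the mean of the test speaker's adaptation data.
    """
    target = adaptation_data.mean(axis=0)
    distances = np.linalg.norm(speaker_means - target, axis=1)
    return np.argsort(distances)[:k]

def transform_cohort(cohort_means, adaptation_data, step=0.5):
    """Stage 2: move each cohort mean part of the way toward the test speaker.

    Assumption: a fixed interpolation stands in for a real model transform.
    """
    target = adaptation_data.mean(axis=0)
    return cohort_means + step * (target - cohort_means)

def combine(transformed_means, weights, bias):
    """Stage 3: weighted sum of the transformed means plus a bias term."""
    return weights @ transformed_means + bias

# Toy data: four training speakers and three adaptation frames (hypothetical).
speaker_means = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.5, 0.5]])
adaptation_data = np.array([[1.0, 0.8], [1.2, 1.0], [1.1, 0.9]])

cohort = select_cohort(speaker_means, adaptation_data, k=2)
transformed = transform_cohort(speaker_means[cohort], adaptation_data)
weights = np.full(len(cohort), 1.0 / len(cohort))  # uniform weights for the sketch
adapted_mean = combine(transformed, weights, bias=np.zeros(2))
```

In this toy setting the weights are uniform and the bias is zero. The abstract states that both can instead be learned from the adaptation data.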


19 Claims

1. A method of transforming and combining a plurality of models representing training speakers into a model for a test speaker, comprising:
- selecting a set of cohort speakers from the training speakers;
  
  transforming a plurality of models representing the cohort speakers based on speech data from the test speaker; and
  
  combining the plurality of transformed models to form the model for the test speaker.
- - 2. The method of claim 1 wherein selecting a set of cohort speakers comprises:
    - generating a model for each training speaker; and
      
      determining a similarity between the models for the training speakers and data from the test speaker.
  - 3. The method of claim 2 wherein selecting a set of cohort speakers comprises:
    - selecting as cohort speakers, training speakers that have models having a desired similarity to the data from the test speaker.
  - 4. The method of claim 3 wherein generating a model for each training speaker comprises:
    - generating a Gaussian Mixture Model for each of the plurality of training speakers.
  - 5. The method of claim 4 wherein generating a Gaussian Mixture Model comprises:
    - calculating a probability mixture component for each Gaussian Mixture Model; and
      
      performing Expectation-Maximization re-estimation.
  - 6. The method of claim 5 where the Gaussian Mixture model is calculated according to the following equation:
    - $b(O) = \sum_{k=1}^{M} c_{k} G(O, \mu_{k}, U_{k})$; where $b(O)$ is an output probability of observation sequence $O$, $c_{k}$ is a weight for the k-th mixture component, and $G$ is a Gaussian function with mean vector $\mu_{k}$ and covariance matrix $U_{k}$.
  - 7. The method of claim 4 wherein the probability mixture component is calculated according to the following equation:
    - $p(k \mid o(t), \Lambda_{n}) = \frac{c_{k} G(o(t), \mu_{k}, \sigma_{k}^{2})}{\sum_{i=1}^{M} c_{i} G(o(t), \mu_{i}, \sigma_{i}^{2})}$; where $b(O)$ is an output probability of observation sequence $O$, $c_{k}$ is a weight for the k-th mixture component, and $G$ is a Gaussian function with mean vector $\mu_{k}$ and covariance matrix $U_{k}$.
  - 8. The method of claim 4 wherein determining a similarity component comprises:
    - providing adaptation data for the test speaker to each of the Gaussian Mixture Models for the training speakers; and
      
      calculating a probability likelihood for each GMM given the adaptation data for the test speaker.
  - 9. The method of claim 1 wherein transforming the plurality of models comprises:
    - receiving adaptation data for the test speaker;
      
      receiving model data for the set of cohort speakers; and
      
      adapting the model data for the set of cohort speakers based on the adaptation data for the test speaker.
  - 10. The method of claim 9 wherein adapting the model data of the cohort speakers is performed using a Maximum Likelihood Linear Regression (MLLR).
  - 11. The method of claim 1 wherein combining the models comprises:
    - determining a weight vector for each of the transformed models; and
      
      combining the models based on the weight vectors.
  - 12. The method of claim 11 wherein determining the weight vector is determined using a regression tree.
  - 13. The method of claim 11 wherein determining the weight vector is determined according to the following equation:
    - $\overline{\lambda}_{r} = \frac{\sum_{m=1}^{M} \sum_{t=1}^{T} \gamma_{m}(t, r)}{\sum_{r=1}^{R} \sum_{m=1}^{M} \sum_{t=1}^{T} \gamma_{m}(t, r)}, \quad r = 1, \dots, R$; where $\overline{\lambda}_{r}$ is the weight, and $\gamma_{m}(t, r)$ is a posterior probability of Gaussian $m$ in cohort model $r$ at time $t$.
  - 14. The method of claim 13 wherein the weight vector $\lambda$ is determined according to the following equation:
    - $\underset{\lambda}{\arg\max} \left\{ \sum_{m=1}^{M} \sum_{t=1}^{T} \log p(o(t) \mid \lambda) + \log p(\lambda) \right\}$; wherein $p$ represents a probability of an argument, $M$ is a number of mixture components and $o(t)$ is the observation vector at time $t$.
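
The formulas in claims 6, 7, 8 and 13 can be made concrete with a short NumPy sketch. It is an illustration under stated assumptions, not the claimed method itself: covariances are taken to be diagonal, the posterior $\gamma_m(t, r)$ is assumed to be normalised jointly over all Gaussians of all cohort models with equal prior weight per cohort model, and all function names and toy models are hypothetical.

```python
import numpy as np

def gaussian_diag(o, mean, var):
    """G(o, mu, sigma^2) for a diagonal covariance (assumed here for brevity)."""
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((o - mean) ** 2 / var))

def mixture_density(o, weights, means, variances):
    """b(o) = sum_k c_k G(o, mu_k, sigma_k^2), as in claim 6."""
    return sum(c * gaussian_diag(o, m, v)
               for c, m, v in zip(weights, means, variances))

def component_posteriors(o, weights, means, variances):
    """p(k | o) = c_k G_k(o) / sum_i c_i G_i(o), as in claim 7."""
    joint = np.array([c * gaussian_diag(o, m, v)
                      for c, m, v in zip(weights, means, variances)])
    return joint / joint.sum()

def log_likelihood(frames, weights, means, variances):
    """Log-likelihood of the adaptation frames under one speaker's GMM (claim 8)."""
    return sum(np.log(mixture_density(o, weights, means, variances))
               for o in frames)

def combination_weights(frames, cohort_models):
    """Weight per cohort model from summed posteriors, as in claim 13.

    Assumption: gamma_m(t, r) is normalised jointly over all Gaussians of all
    cohort models, with equal prior weight on each cohort model.
    """
    occupancy = np.zeros(len(cohort_models))
    for o in frames:
        per_model = np.array([mixture_density(o, *model) for model in cohort_models])
        occupancy += per_model / per_model.sum()  # sum over m of gamma_m(t, r)
    return occupancy / occupancy.sum()

# Two hypothetical 1-D cohort models: (mixture weights, means, variances).
model_a = (np.array([0.5, 0.5]), np.array([[0.0], [1.0]]), np.array([[1.0], [1.0]]))
model_b = (np.array([0.5, 0.5]), np.array([[4.0], [5.0]]), np.array([[1.0], [1.0]]))
frames = np.array([[0.2], [0.9], [0.5]])

lam = combination_weights(frames, [model_a, model_b])
```

Because the toy frames lie near the means of `model_a`, that model receives almost all of the combination weight, and its log-likelihood on the frames is higher than that of `model_b`.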

15. A system for transforming and combining a plurality of models representing training speakers into a model for a test speaker, comprising:
- an adaptation component configured to receive a speech input that is generated from a plurality of training speakers;
  
  a Gaussian Mixture Model (GMM) generating component configured to generate Gaussian Mixture Models for each of the training speakers;
  
  a speaker selection component configured to select a cohort of speakers based on a relative probability of each of the training speakers of matching the test speaker;
  
  a model transformation component configured to transform the models for the cohort speakers to more closely match the test speaker; and
  
  a model combination component configured to combine the transformed models of the cohort speakers to more closely match that of the test speaker.
- - 16. The system of claim 15 wherein the GMM generating component is configured to use the speech input to generate a GMM for each of the training speakers.
  - 17. The system of claim 15 wherein the speaker selection component is configured to receive GMMs for the training speakers, and adaptation data for the test speaker;
    - and wherein the speaker selection component is further configured to select the cohort speakers from those training speakers whose probability exceeds a threshold value.
  - 18. The system of claim 15 wherein the model transformation component is further configured to transform the models for the cohort speakers using Maximum Likelihood Linear Regression (MLLR).
  - 19. The system of claim 15 wherein the model combination component is configured to determine a weighting component for each transformed model in the cohort of speakers, and to combine the models based upon the determined weighting component.
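
Claims 15 and 17 describe selecting cohort speakers by relative probability against a threshold. The claims do not say how the relative probability is computed, so the sketch below assumes one plausible reading: per-speaker log-likelihoods of the adaptation data are normalised with a softmax, which corresponds to an equal prior on each training speaker. The function names, scores, and threshold value are hypothetical.

```python
import numpy as np

def relative_probabilities(log_likelihoods):
    """Normalise per-speaker log-likelihoods into probabilities that sum to 1."""
    scores = np.asarray(log_likelihoods, dtype=float)
    shifted = scores - scores.max()  # subtract the maximum for numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum()

def select_cohort_by_threshold(log_likelihoods, threshold):
    """Return indices of training speakers whose relative probability exceeds the threshold."""
    probs = relative_probabilities(log_likelihoods)
    return np.flatnonzero(probs > threshold)

# Hypothetical log-likelihoods of the test speaker's data under four speaker GMMs.
scores = [-120.0, -118.5, -140.0, -119.0]
cohort = select_cohort_by_threshold(scores, threshold=0.1)
```

With these scores the third speaker is far less likely than the others and falls below the threshold, so the cohort consists of the remaining three speakers.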


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Huang, Chao
Granted Patent: US 7,574,359 B2
US Class Current: 704/246
CPC Class Codes: G10L 15/07 to the speaker
