Method and apparatus for training a text independent speaker recognition system using speech data with text labels

US 20060111905A1
Filed: 11/22/2004
Published: 05/25/2006
Est. Priority Date: 11/22/2004
Status: Active Grant

First Claim

1. An apparatus for providing a Text Independent (TI) speaker recognition mode in one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and a Text Constrained (TC) HMM speaker recognition system, comprising:

a Gaussian Mixture Model (GMM) generator for creating a GMM by pooling Gaussians from a plurality of HMM states; and

a Gaussian weight normalizer for normalizing Gaussian weights with respect to the plurality of HMM states.

Abstract

There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
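
The pooling and normalization described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Gaussian` class and `pool_states_to_gmm` function are hypothetical names, diagonal covariances are an assumption, and dividing by the number of states is only one of the normalizations named in the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    weight: float       # mixture weight within its HMM state
    mean: List[float]
    var: List[float]    # diagonal covariance (an assumption of this sketch)


def pool_states_to_gmm(states: List[List[Gaussian]]) -> List[Gaussian]:
    """Pool the Gaussians of all HMM states into a single GMM.

    Each weight is divided by the number of states, so if every state's
    weights sum to 1, the pooled weights also sum to 1.
    """
    n_states = len(states)
    if n_states == 0:
        raise ValueError("need at least one HMM state")
    return [
        Gaussian(g.weight / n_states, list(g.mean), list(g.var))
        for state in states
        for g in state
    ]


# Toy example: two states holding two and one 1-D Gaussians respectively.
hmm_states = [
    [Gaussian(0.25, [0.0], [1.0]), Gaussian(0.75, [1.0], [1.0])],
    [Gaussian(1.0, [5.0], [2.0])],
]
gmm = pool_states_to_gmm(hmm_states)
total_weight = sum(g.weight for g in gmm)
```

In this toy case the pooled GMM has three components, and `total_weight` stays at 1 because each state's weights summed to 1 before pooling.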

43 Citations

27 Claims

1. An apparatus for providing a Text Independent (TI) speaker recognition mode in one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and a Text Constrained (TC) HMM speaker recognition system, comprising:
- a Gaussian Mixture Model (GMM) generator for creating a GMM by pooling Gaussians from a plurality of HMM states; and
  
  a Gaussian weight normalizer for normalizing Gaussian weights with respect to the plurality of HMM states.
- Dependent claims (2, 3, 4):
  - 2. The apparatus of claim 1, wherein said GMM generator, in creating the GMM, at least one of derives a GMM Universal Background Model (UBM) from an HMM UBM and derives a speaker specific GMM from a speaker specific HMM.
  - 3. The apparatus of claim 1, wherein said Gaussian weight normalizer one of normalizes the Gaussian weights based on durations of the plurality of HMM states and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the plurality of HMM states.
  - 4. The apparatus of claim 1, wherein the one of the TD HMM speaker recognition system and the TC HMM speaker recognition system is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.
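
Claim 3 names a duration-based normalization without giving a formula. One plausible reading, sketched below as an assumption only, scales each state's weights by that state's share of the total duration. The function names are hypothetical, and estimating a duration from a self-loop probability as 1 / (1 - a_ii) is the standard expectation for a geometric state duration, not something the claims specify.

```python
from typing import List, Sequence


def expected_duration(self_loop_prob: float) -> float:
    """Expected number of frames spent in a state: 1 / (1 - a_ii)."""
    if not 0.0 <= self_loop_prob < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    return 1.0 / (1.0 - self_loop_prob)


def duration_normalized_weights(
    state_weights: Sequence[Sequence[float]],
    state_durations: Sequence[float],
) -> List[float]:
    """Return pooled GMM weights, scaling each state's weights by its
    share of the total duration (an assumed reading of claim 3)."""
    if len(state_weights) != len(state_durations):
        raise ValueError("one duration per state is required")
    total = sum(state_durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    pooled: List[float] = []
    for weights, duration in zip(state_weights, state_durations):
        share = duration / total
        pooled.extend(w * share for w in weights)
    return pooled


# Two states: the first is expected to last 4 frames, the second 2 frames.
weights = [[0.5, 0.5], [1.0]]
durations = [expected_duration(0.75), expected_duration(0.5)]
pooled = duration_normalized_weights(weights, durations)
```

Compared with dividing by the number of states, this variant gives longer-lived states proportionally more mass in the pooled GMM.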

5. An apparatus for providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in a Text Independent (TI) Gaussian Mixture Model (GMM) speaker recognition system, comprising:
- an HMM generator for creating an HMM by assigning states to Gaussians from a GMM; and
  
  a probability and weight calculator for calculating state transition probabilities and Gaussian weights with respect to a plurality of HMM states.
- Dependent claims (6, 7, 8):
  - 6. The apparatus of claim 5, wherein said HMM generator, in creating the HMM, at least one of derives a HMM Universal Background Model (UBM) from a GMM UBM and derives a speaker specific HMM from a speaker specific GMM.
  - 7. The apparatus of claim 5, wherein said probability and weight calculator calculates the state transition probabilities and the Gaussian weights one of based on durations of the plurality of HMM states and based on only an overall number of the plurality of HMM states.
  - 8. The apparatus of claim 5, wherein the one of the Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and the Text Constrained (TC) HMM speaker recognition mode is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.
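
The reverse direction in claims 5 to 8 (building an HMM from a GMM) might look like the sketch below. It is illustrative only: the assignment `state_of` of Gaussians to states, the left-to-right topology, and the rule "advance probability = 1 / duration" are all assumptions introduced here, and `gmm_to_hmm` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def gmm_to_hmm(
    gmm_weights: Sequence[float],
    state_of: Sequence[int],
    durations: Sequence[float],
) -> Tuple[List[Dict[int, float]], List[Tuple[float, float]]]:
    """Distribute GMM components across HMM states.

    Returns per-state weights (component index -> weight, summing to 1
    within each state) and per-state (self-loop, advance) transition
    probabilities for an assumed left-to-right topology.
    """
    n_states = len(durations)
    per_state: List[Dict[int, float]] = [dict() for _ in range(n_states)]
    for k, (w, s) in enumerate(zip(gmm_weights, state_of)):
        per_state[s][k] = w
    # Renormalize the weights inside each state.
    for s, comps in enumerate(per_state):
        mass = sum(comps.values())
        if mass <= 0:
            raise ValueError(f"state {s} received no Gaussian mass")
        for k in comps:
            comps[k] /= mass
    # Derive transitions from durations: leaving a state after d frames
    # on average corresponds to an advance probability of 1 / d.
    transitions: List[Tuple[float, float]] = []
    for d in durations:
        if d < 1:
            raise ValueError("duration must be at least one frame")
        advance = 1.0 / d
        transitions.append((1.0 - advance, advance))
    return per_state, transitions


state_weights, transitions = gmm_to_hmm(
    gmm_weights=[0.125, 0.375, 0.5],
    state_of=[0, 0, 1],
    durations=[4.0, 2.0],
)
```

Passing equal durations for every state would correspond loosely to the alternative in claim 7 that relies only on the overall number of states.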

9. An apparatus for providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in another one of a TD HMM speaker recognition system and a TC HMM speaker recognition system, comprising:
- an HMM generator for creating an HMM with one of a smaller number of states and a larger number of states by one of pooling Gaussians from a plurality of HMM states into a single HMM state and splitting the Gaussians from the plurality of HMM states into different HMM states, respectively; and
  
  a Gaussian weight normalizer for normalizing Gaussian weights with respect to the HMM states.
- Dependent claims (10, 11, 12, 13):
  - 10. The apparatus of claim 9, wherein said HMM generator, in creating the HMM, at least one of derives an HMM Universal Background Model (UBM) from another HMM UBM and derives a speaker specific HMM from another speaker specific HMM.
  - 11. The apparatus of claim 9, wherein said Gaussian weight normalizer one of normalizes the Gaussian weights based on HMM state durations and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the HMM states.
  - 12. The apparatus of claim 9, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes one of a single HMM per phoneme and a single HMM per phonetic unit, and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per word.
  - 13. The apparatus of claim 9, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes a single HMM per word and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per phrase.
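
For the state-reduction case in claim 9 (pooling several HMM states into a single HMM state), a rough sketch under stated assumptions is given below. The `groups` argument and the function name `merge_states` are hypothetical, only the weights are tracked, and the count-based normalization is one of the two options named in claim 11. Splitting states is not shown.

```python
from typing import List, Sequence


def merge_states(
    state_weights: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Merge HMM states into fewer states.

    groups[j] lists the original state indices pooled into new state j.
    Weights are divided by the group size, so each new state sums to 1
    whenever every original state did.
    """
    merged: List[List[float]] = []
    for group in groups:
        if not group:
            raise ValueError("each merged state needs at least one source state")
        merged.append(
            [w / len(group) for s in group for w in state_weights[s]]
        )
    return merged


# Three states reduced to two: states 0 and 1 are pooled, state 2 is kept.
old = [[0.5, 0.5], [1.0], [0.25, 0.75]]
new = merge_states(old, groups=[[0, 1], [2]])
```

With a single group containing every state, this reduces to the full pooling into one GMM described in claim 1.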

14. A method for providing a Text Independent (TI) speaker recognition mode in one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and a Text Constrained (TC) HMM speaker recognition system, comprising the steps of:
- creating a Gaussian Mixture Model (GMM) by pooling Gaussians from a plurality of HMM states; and
  
  normalizing Gaussian weights with respect to the plurality of HMM states.
- Dependent claims (15, 16, 17):
  - 15. The method of claim 14, wherein said creating step comprises at least one of the steps of deriving a GMM Universal Background Model (UBM) from an HMM UBM and deriving a speaker specific GMM from a speaker specific HMM.
  - 16. The method of claim 14, wherein said normalizing step one of normalizes the Gaussian weights based on durations of the plurality of HMM states and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the plurality of HMM states.
  - 17. The method of claim 14, wherein the one of the TD HMM speaker recognition system and the TC HMM speaker recognition system is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.

18. A method for providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in a Text Independent (TI) Gaussian Mixture Model (GMM) speaker recognition system, comprising the steps of:
- creating an HMM by assigning states to Gaussians from a GMM; and
  
  calculating state transition probabilities and Gaussian weights with respect to a plurality of HMM states.
- Dependent claims (19, 20, 21):
  - 19. The method of claim 18, wherein said creating step comprises at least one of the steps of deriving a HMM Universal Background Model (UBM) from a GMM UBM and deriving a speaker specific HMM from a speaker specific GMM.
  - 20. The method of claim 18, wherein said calculating step calculates the state transition probabilities and the Gaussian weights one of based on durations of the plurality of HMM states and based on only an overall number of the plurality of HMM states.
  - 21. The method of claim 18, wherein the one of the Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and the Text Constrained (TC) HMM speaker recognition mode is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.

22. A method for providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in another one of a TD HMM speaker recognition system and a TC HMM speaker recognition system, comprising the steps of:
- creating an HMM with one of a smaller number of states and a larger number of states by one of pooling Gaussians from a plurality of HMM states into a single HMM state and splitting the Gaussians from the plurality of HMM states into different HMM states, respectively; and
  
  normalizing Gaussian weights with respect to the HMM states.
- Dependent claims (23, 24, 25, 26, 27):
  - 23. The method of claim 22, wherein said creating step comprises at least one of the steps of deriving an HMM Universal Background Model (UBM) from another HMM UBM and deriving a speaker specific HMM from another speaker specific HMM.
  - 24. The method of claim 22, wherein said normalizing step one of normalizes the Gaussian weights based on HMM state durations and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the HMM states.
  - 25. The method of claim 22, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes one of a single HMM per phoneme and a single HMM per phonetic unit, and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per word.
  - 26. The method of claim 22, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes a single HMM per word and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per phrase.
  - 27. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for providing one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode in the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system as recited in claim 22.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Navratil, Jiri; Pelecanos, Jason W.; Ramaswamy, Ganesh N.; Zilca, Ran D.; Nealand, James H.
Granted Patent: US 7,447,633 B2
US Class Current: 704/256.700
CPC Class Codes:
G10L 15/063 Training
G10L 15/144 Training of HMMs
