Method and apparatus for training a text independent speaker recognition system using speech data with text labels

US 7,813,927 B2
Filed: 06/04/2008
Issued: 10/12/2010
Est. Priority Date: 11/22/2004
Status: Active Grant

2 Assignments

0 Petitions

Abstract

There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.

13 Claims

1. A Hidden Markov Model (HMM) speaker recognition system, comprising:
   - at least one data storage unit; and
   - at least one processor programmed to:
     - create a Gaussian Mixture Model (GMM) by pooling Gaussians from a plurality of HMM states; and
     - normalize Gaussian weights with respect to the plurality of HMM states to provide a Text Independent (TI) speaker recognition mode in the HMM speaker recognition system;

   wherein the HMM speaker recognition system is selected from a group consisting of a Text Dependent (TD) HMM speaker recognition system and a Text Constrained (TC) speaker recognition system.

2. The system of claim 1, wherein said at least one processor is further programmed to:
   - in creating the GMM, derive a GMM Universal Background Model (UBM) from an HMM UBM and/or derive a speaker specific GMM from a speaker specific HMM.

3. The system of claim 1, wherein said at least one processor is further programmed to:
   - normalize the Gaussian weights based on durations of the plurality of HMM states and/or normalize the Gaussian weights by dividing each of the Gaussian weights by an overall number of the plurality of HMM states.

4. The system of claim 1, wherein the HMM speaker recognition system is based on a single HMM per word, a single HMM per phonetic unit, and/or a single HMM per sub-phonetic unit.
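The pooling and weight normalization of claims 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each HMM state is assumed to hold a diagonal-covariance Gaussian mixture whose weights sum to 1, and the names `Gaussian`, `pool_states_to_gmm`, and `gmm_log_likelihood` are hypothetical. Omitting `durations` corresponds to dividing every weight by the number of states; passing per-state durations corresponds to the duration-based option of claim 3. The pooled GMM scores each frame without any state alignment, which is what makes the resulting mode text independent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Gaussian:
    mean: Tuple[float, ...]
    var: Tuple[float, ...]  # diagonal covariance (assumption)


# Emission density of one HMM state: (mixture weight, Gaussian) pairs summing to 1.
Mixture = List[Tuple[float, Gaussian]]


def pool_states_to_gmm(states: Sequence[Mixture],
                       durations: Optional[Sequence[float]] = None) -> Mixture:
    """Pool the Gaussians of several HMM states into one GMM (hypothetical helper).

    Without durations, every weight is divided by the number of states.
    With durations, each state's weights are scaled by its share of the
    total duration. Either way the pooled weights again sum to 1.
    """
    n = len(states)
    if n == 0:
        raise ValueError("need at least one HMM state")
    if durations is None:
        priors = [1.0 / n] * n
    else:
        if len(durations) != n:
            raise ValueError("one duration per state is required")
        total = float(sum(durations))
        priors = [d / total for d in durations]
    return [(w * p, g) for p, state in zip(priors, states) for w, g in state]


def _log_gauss(g: Gaussian, x: Sequence[float]) -> float:
    # Log density of a diagonal-covariance Gaussian at frame x.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, g.mean, g.var))


def gmm_log_likelihood(gmm: Mixture, x: Sequence[float]) -> float:
    # log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability.
    terms = [math.log(w) + _log_gauss(g, x) for w, g in gmm if w > 0.0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    s1 = [(0.5, Gaussian((0.0,), (1.0,))), (0.5, Gaussian((1.0,), (1.0,)))]
    s2 = [(1.0, Gaussian((5.0,), (1.0,)))]
    print([w for w, _ in pool_states_to_gmm([s1, s2])])                    # [0.25, 0.25, 0.5]
    print([w for w, _ in pool_states_to_gmm([s1, s2], durations=[3, 1])])  # [0.375, 0.375, 0.25]
```

Applying the same pooling to a background HMM and to a speaker-adapted HMM would yield the GMM UBM and speaker GMM pair mentioned in claim 2.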

5. A Text Independent (TI) Gaussian Mixture Model (GMM) speaker recognition system, comprising:
   - at least one data storage unit; and
   - at least one processor programmed to:
     - create a Hidden Markov Model (HMM) by assigning states to Gaussians from a GMM; and
     - calculate state transition probabilities and Gaussian weights with respect to a plurality of HMM states to provide a Text Dependent (TD) HMM speaker recognition mode and/or a Text Constrained (TC) HMM speaker recognition mode in the TI GMM speaker recognition system.

6. The system of claim 5, wherein said at least one processor is further programmed to:
   - in creating the HMM, derive an HMM Universal Background Model (UBM) from a GMM UBM and/or derive a speaker specific HMM from a speaker specific GMM.

7. The system of claim 5, wherein said at least one processor is further programmed to:
   - calculate the state transition probabilities and the Gaussian weights based on durations of the plurality of HMM states and/or based on only an overall number of the plurality of HMM states.

8. The system of claim 5, wherein the Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and/or the Text Constrained (TC) HMM speaker recognition mode is based on a single HMM per word, a single HMM per phonetic unit, and/or a single HMM per sub-phonetic unit.
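Claims 5 and 7 describe the reverse direction, turning a GMM into an HMM. The Python sketch below is one possible reading under assumptions that the claims do not state: a left-to-right topology, an externally supplied (hypothetical) `assignment` mapping each Gaussian to a state, and Gaussian weights renormalized within each state. Transition probabilities come from expected state durations via the geometric-duration relation a = 1 - 1/d (a self-loop probability a gives a mean stay of 1/(1 - a) frames). When only the number of states is known, the hypothetical `uniform_durations` helper splits an assumed total length evenly across states.

```python
from typing import List, Sequence, Tuple


def uniform_durations(total_frames: float, n_states: int) -> List[float]:
    # Assumption: with only the number of states known, each state gets an equal share.
    return [total_frames / n_states] * n_states


def gmm_to_left_right_hmm(weights: Sequence[float],
                          assignment: Sequence[int],
                          durations: Sequence[float]):
    """Turn a GMM into a left-to-right HMM (hypothetical helper).

    weights[k]    -- GMM weight of Gaussian k
    assignment[k] -- index of the HMM state that Gaussian k is assigned to
    durations[s]  -- expected duration of state s, in frames (>= 1)

    Returns (state_mixtures, transitions): state_mixtures[s] lists
    (gaussian_index, weight) pairs renormalized to sum to 1 within the state;
    transitions[s] is (self_loop_prob, advance_prob), where advancing from
    the last state means leaving the model.
    """
    if len(weights) != len(assignment):
        raise ValueError("one state assignment per Gaussian is required")
    n_states = len(durations)
    groups: List[List[Tuple[int, float]]] = [[] for _ in range(n_states)]
    for k, (w, s) in enumerate(zip(weights, assignment)):
        if not 0 <= s < n_states:
            raise ValueError(f"Gaussian {k} assigned to unknown state {s}")
        groups[s].append((k, w))

    state_mixtures = []
    for s, comps in enumerate(groups):
        mass = sum(w for _, w in comps)
        if mass <= 0.0:
            raise ValueError(f"state {s} received no Gaussian weight")
        state_mixtures.append([(k, w / mass) for k, w in comps])

    transitions = []
    for d in durations:
        if d < 1.0:
            raise ValueError("expected state duration must be at least one frame")
        advance = 1.0 / d
        transitions.append((1.0 - advance, advance))
    return state_mixtures, transitions
```

For example, weights `[0.25, 0.25, 0.5]` assigned to states `[0, 0, 1]` with `uniform_durations(8, 2)` give per-state mixtures `[(0, 0.5), (1, 0.5)]` and `[(2, 1.0)]`, and a self-loop probability of 0.75 in both states.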

9. A first Hidden Markov Model (HMM) speaker recognition system, comprising:
   - at least one data storage unit; and
   - at least one processor programmed to:
     - create an HMM with a smaller number of states or a larger number of states by pooling Gaussians from a plurality of HMM states into a single HMM state or splitting the Gaussians from the plurality of HMM states into different HMM states, respectively; and
     - normalize Gaussian weights with respect to the HMM states to provide a TD HMM speaker recognition mode and/or a TC HMM speaker recognition mode in a second HMM speaker recognition system;

   wherein the first HMM speaker recognition system is selected from a group consisting of a Text Dependent (TD) HMM speaker recognition system and a Text Constrained (TC) speaker recognition system.

10. The system of claim 9, wherein said at least one processor is further programmed to:
   - in creating the HMM, derive a first HMM Universal Background Model (UBM) from a second HMM UBM and/or derive a first speaker specific HMM from a second speaker specific HMM.

11. The system of claim 9, wherein said at least one processor is further programmed to:
   - normalize the Gaussian weights based on HMM state durations and/or normalize the Gaussian weights by dividing each of the Gaussian weights by an overall number of the HMM states.

12. The system of claim 9, wherein the second HMM speaker recognition system includes a single HMM per phoneme and/or a single HMM per phonetic unit, and the TD HMM speaker recognition mode and/or the TC HMM speaker recognition mode includes a single HMM per word.

13. The system of claim 9, wherein the second HMM speaker recognition system includes a single HMM per word and the TD HMM speaker recognition mode and/or the TC HMM speaker recognition mode includes a single HMM per phrase.
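Claim 9, with its phoneme-to-word and word-to-phrase variants in claims 12 and 13, changes the number of HMM states by pooling or splitting Gaussians. The Python sketch below illustrates both operations under assumptions not stated in the claims: each state is a hypothetical `(duration, [(weight, gaussian_id), ...])` tuple, merged weights are scaled either by each state's share of the total duration or by 1/N as in claim 11, and a split state's duration is divided in proportion to the mixture weight each new state receives. `merge_states` and `split_state` are illustrative names, not an existing API.

```python
from typing import List, Sequence, Tuple

# Hypothetical state representation: (expected duration in frames,
# [(mixture weight, gaussian_id), ...]) with weights summing to 1.
State = Tuple[float, List[Tuple[float, str]]]


def merge_states(states: Sequence[State], by_duration: bool = True) -> State:
    """Pool the Gaussians of several HMM states into a single state."""
    if not states:
        raise ValueError("need at least one state to merge")
    total_dur = sum(d for d, _ in states)
    pooled: List[Tuple[float, str]] = []
    for d, comps in states:
        # Scale by the state's share of the total duration, or by 1 / (number of states).
        share = d / total_dur if by_duration else 1.0 / len(states)
        pooled.extend((w * share, gid) for w, gid in comps)
    return total_dur, pooled


def split_state(state: State, groups: Sequence[Sequence[int]]) -> List[State]:
    """Split one state into len(groups) consecutive states.

    groups must partition the indices of the state's Gaussians. Each new
    state's weights are renormalized to sum to 1; its expected duration is
    assumed proportional to the share of mixture weight it receives.
    """
    dur, comps = state
    if sorted(i for g in groups for i in g) != list(range(len(comps))):
        raise ValueError("groups must partition the state's Gaussians")
    out: List[State] = []
    for idx in groups:
        mass = sum(comps[i][0] for i in idx)
        if mass <= 0.0:
            raise ValueError("each new state needs nonzero weight")
        out.append((dur * mass, [(comps[i][0] / mass, comps[i][1]) for i in idx]))
    return out
```

Merging two states by duration and then splitting the result along the original groups recovers the original states, which makes a quick consistency check on the weight normalization.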

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Nealand, James H., Navratil, Jiri, Pelecanos, Jason W., Ramaswamy, Ganesh N., Zilca, Ran D.
Primary Examiner(s)
Dorvil; Richemond
Assistant Examiner(s)
GODBOLD, DOUGLAS

Application Number
US12/132,770
Publication Number
US 20080235020A1
Time in Patent Office
860 Days
Field of Search
704/246-250, 704/273
US Class Current
704/250
CPC Class Codes
G10L 15/063 Training
G10L 15/144 Training of HMMs
