Method and apparatus for training a text independent speaker recognition system using speech data with text labels

US 7,447,633 B2
Filed: 11/22/2004
Issued: 11/04/2008
Est. Priority Date: 11/22/2004
Status: Active Grant

Abstract

There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
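The pooling and normalization described above can be illustrated with a minimal sketch. It is not the patented implementation: the `Gaussian` class, the function name `pool_states_to_gmm`, the diagonal-covariance representation, and the caller-supplied `durations` argument are all assumptions made for illustration. It shows the two normalization options named in claim 3: dividing each weight by the overall number of HMM states, or scaling each state's weights by that state's share of the total duration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Gaussian:
    weight: float      # mixture weight inside the owning HMM state
    mean: List[float]  # mean vector
    var: List[float]   # diagonal covariance (an assumption of this sketch)


def pool_states_to_gmm(
    states: Sequence[Sequence[Gaussian]],
    durations: Optional[Sequence[float]] = None,
) -> List[Gaussian]:
    """Pool the Gaussians of all HMM states into a single GMM.

    Without durations, every weight is divided by the number of states.
    With durations (hypothetical input, e.g. expected frames per state),
    each state's weights are scaled by its share of the total duration.
    """
    n_states = len(states)
    if n_states == 0:
        raise ValueError("at least one HMM state is required")

    if durations is None:
        shares = [1.0 / n_states] * n_states
    else:
        if len(durations) != n_states:
            raise ValueError("one duration per state is required")
        total = float(sum(durations))
        if total <= 0.0:
            raise ValueError("durations must sum to a positive value")
        shares = [d / total for d in durations]

    pooled: List[Gaussian] = []
    for share, state in zip(shares, states):
        for g in state:
            # Means and variances are copied unchanged; only weights change.
            pooled.append(Gaussian(g.weight * share, list(g.mean), list(g.var)))
    return pooled


# Two states: one with two Gaussians, one with a single Gaussian.
states = [
    [Gaussian(0.5, [0.0], [1.0]), Gaussian(0.5, [1.0], [1.0])],
    [Gaussian(1.0, [3.0], [2.0])],
]
gmm = pool_states_to_gmm(states)  # pooled weights: 0.25, 0.25, 0.5
```

If the weights within each state sum to one, the pooled weights also sum to one under either option, so the result is a valid mixture.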

14 Claims

1. A method, comprising the steps of:
- providing a Text Independent (TI) speaker recognition mode in one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and a Text Constrained (TC) HMM speaker recognition system, wherein said providing step comprises:
  
  creating a Gaussian Mixture Model (GMM) by pooling Gaussians from a plurality of HMM states; and
  
  normalizing Gaussian weights with respect to the plurality of HMM states.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1, wherein said creating step comprises at least one of the steps of deriving a GMM Universal Background Model (UBM) from an HMM UBM and deriving a speaker specific GMM from a speaker specific HMM.
  - 3. The method of claim 1, wherein said normalizing step one of normalizes the Gaussian weights based on durations of the plurality of HMM states and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the plurality of HMM states.
  - 4. The method of claim 1, wherein the one of the TD HMM speaker recognition system and the TC HMM speaker recognition system is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.

5. A method, comprising the steps of:
- providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in a Text Independent (TI) Gaussian Mixture Model (GMM) speaker recognition system, wherein said providing step comprises:
  
  creating an HMM by assigning states to Gaussians from a GMM; and
  
  calculating state transition probabilities and Gaussian weights with respect to a plurality of HMM states.
- Dependent claims (6, 7, 8):
  - 6. The method of claim 5, wherein said creating step comprises at least one of the steps of deriving a HMM Universal Background Model (UBM) from a GMM UBM and deriving a speaker specific HMM from a speaker specific GMM.
  - 7. The method of claim 5, wherein said calculating step calculates the state transition probabilities and the Gaussian weights one of based on durations of the plurality of HMM states and based on only an overall number of the plurality of HMM states.
  - 8. The method of claim 5, wherein the one of the Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and the Text Constrained (TC) HMM speaker recognition mode is based on one of a single HMM per word, a single HMM per phonetic unit, and a single HMM per sub-phonetic unit.
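The reverse direction in claims 5 to 7 can be sketched as follows, under stated assumptions. The claims do not say how Gaussians are assigned to states or how durations map to transition probabilities, so the function name `gmm_to_hmm`, the caller-supplied `assignment` and `durations` inputs, the left-to-right topology, and the geometric duration model (advance probability 1/d for an expected duration of d frames) are illustrative choices, not the patented method. Only the duration-based option of claim 7 is shown.

```python
from typing import List, Sequence, Tuple


def gmm_to_hmm(
    weights: Sequence[float],
    assignment: Sequence[int],
    durations: Sequence[float],
) -> Tuple[List[List[Tuple[int, float]]], List[Tuple[float, float]]]:
    """Build per-state mixtures and transitions from a GMM.

    weights    -- GMM mixture weights, one per Gaussian
    assignment -- HMM state index for each Gaussian (hypothetical input)
    durations  -- expected duration of each state in frames (hypothetical)

    Returns (state_mixtures, transitions), where state_mixtures[s] is a list
    of (gaussian_index, weight_within_state) pairs and transitions[s] is a
    (stay, advance) pair for a left-to-right HMM.
    """
    n_states = len(durations)
    if len(weights) != len(assignment):
        raise ValueError("one state assignment per Gaussian is required")

    members: List[List[Tuple[int, float]]] = [[] for _ in range(n_states)]
    for idx, (w, s) in enumerate(zip(weights, assignment)):
        if not 0 <= s < n_states:
            raise ValueError("state index out of range")
        members[s].append((idx, w))

    # Renormalize so the weights inside each state sum to one.
    state_mixtures: List[List[Tuple[int, float]]] = []
    for state in members:
        mass = sum(w for _, w in state)
        if mass <= 0.0:
            raise ValueError("every state needs positive total weight")
        state_mixtures.append([(idx, w / mass) for idx, w in state])

    # Geometric duration model: expected stay of d frames gives advance = 1/d.
    transitions: List[Tuple[float, float]] = []
    for d in durations:
        if d < 1.0:
            raise ValueError("expected duration must be at least one frame")
        advance = 1.0 / d
        transitions.append((1.0 - advance, advance))
    return state_mixtures, transitions


# Three Gaussians: the first two go to state 0, the third to state 1.
mixtures, transitions = gmm_to_hmm([0.2, 0.2, 0.6], [0, 0, 1], [4.0, 2.0])
```

In this example state 0 holds two Gaussians with equal within-state weights and stays with probability 0.75, while state 1 holds one Gaussian and stays with probability 0.5.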

9. A method, comprising the steps of:
- providing one of a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition mode and a Text Constrained (TC) HMM speaker recognition mode in another one of a TD HMM speaker recognition system and a TC HMM speaker recognition system, wherein said providing step comprises:
  
  creating an HMM with one of a smaller number of states and a larger number of states by one of pooling Gaussians from a plurality of HMM states into a single HMM state and splitting the Gaussians from the plurality of HMM states into different HMM states, respectively; and
  
  normalizing Gaussian weights with respect to the HMM states.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The method of claim 9, wherein said creating step comprises at least one of the steps of deriving an HMM Universal Background Model (UBM) from another HMM UBM and deriving a speaker specific HMM from another speaker specific HMM.
  - 11. The method of claim 9, wherein said normalizing step one of normalizes the Gaussian weights based on HMM state durations and normalizes the Gaussian weights by dividing each of the Gaussian weights by an overall number of the HMM states.
  - 12. The method of claim 9, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes one of a single HMM per phoneme and a single HMM per phonetic unit, and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per word.
  - 13. The method of claim 9, wherein the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system includes a single HMM per word and the one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode includes a single HMM per phrase.
  - 14. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for providing one of the TD HMM speaker recognition mode and the TC HMM speaker recognition mode in the other one of the TD HMM speaker recognition system and the TC HMM speaker recognition system as recited in claim 9.
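The state-merging branch of claim 9 (an HMM with a smaller number of states) can be sketched on the mixture weights alone. The function name `merge_states` and the `groups` input are hypothetical, and the sketch reads the "dividing by a number of states" option of claim 11 as dividing by the number of states merged into each new state, which is an interpretation rather than something the claims state. The splitting branch is not shown because the claims do not describe how Gaussians would be distributed across the new states.

```python
from typing import List, Sequence


def merge_states(
    state_weights: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Merge groups of HMM states into single states.

    state_weights -- mixture weights of each original state
    groups        -- for each new state, the original state indices pooled
                     into it (hypothetical input)

    Each pooled weight is divided by the size of its group, so the weights
    of every new state still sum to one if the originals did.
    """
    merged: List[List[float]] = []
    for group in groups:
        if not group:
            raise ValueError("each new state must pool at least one state")
        share = 1.0 / len(group)
        new_state: List[float] = []
        for s in group:
            new_state.extend(w * share for w in state_weights[s])
        merged.append(new_state)
    return merged


# Three original states reduced to two: states 0 and 1 are pooled together.
merged = merge_states([[0.5, 0.5], [1.0], [0.3, 0.7]], [[0, 1], [2]])
```

Transition probabilities of the merged model would also need to be recomputed; that step is outside this sketch.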

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Navratil, Jiri; Pelecanos, Jason W.; Ramaswamy, Ganesh N.; Zilca, Ran D.; Nealand, James H.
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Godbold, Douglas
Application Number: US10/994,743
Publication Number: US 20060111905A1
Time in Patent Office: 1,443 Days
Field of Search: 704/246-250, 704/273
US Class Current: 704/250
CPC Class Codes: G10L 15/063 (Training); G10L 15/144 (Training of HMMs)
