Method and apparatus for speaker identification using mixture discriminant analysis to develop speaker models

US 6,233,555 B1
Filed: 11/24/1998
Issued: 05/15/2001
Est. Priority Date: 11/25/1997
Status: Expired due to Term

Abstract

A speaker identification system is provided that constructs speaker models using a discriminant analysis technique in which the data in each class is modeled by Gaussian mixtures. The speaker identification method and apparatus determine the identity of a speaker, as one of a small group, based on a sentence-length password utterance. A speaker's utterance is received and a sequence of a first set of feature vectors is computed based on the received utterance. The first set of feature vectors is then transformed into a second set of feature vectors using transformations specific to a particular segmentation unit, and likelihood scores of the second set of feature vectors are computed using speaker models trained using mixture discriminant analysis. The likelihood scores are then combined to determine an utterance score, and the speaker's identity is validated based on the utterance score. The speaker identification method and apparatus also include training and enrollment phases. In the enrollment phase, the speaker's password utterance is received multiple times. A transcription of the password utterance as a sequence of phones is obtained, and the phone string is stored in a database containing phone strings of other speakers in the group. In the training phase, the first set of feature vectors is extracted from each password utterance, and the phone boundaries for each phone in the password transcription are obtained using a speaker-independent phone recognizer. A mixture model is developed for each phone of a given speaker's password. Then, using the feature vectors from the password utterances of all of the speakers in the group, transformation parameters and transformed models are generated for each phone and speaker, using mixture discriminant analysis.
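
The identification steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the per-phone transformations and the per-speaker Gaussian mixtures have already been trained (the mixture discriminant analysis training itself is not shown), that each transformation is a plain linear projection, and that the mixtures have diagonal covariances. The names `project`, `gmm_log_likelihood`, `utterance_score`, and `identify`, and the data layout, are hypothetical.

```python
import math

def project(frame, matrix):
    """Map a D-dimensional frame to d dimensions; matrix has d rows of length D."""
    return [sum(a * x for a, x in zip(row, frame)) for row in matrix]

def gmm_log_likelihood(x, mixture):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture.

    mixture is a list of (weight, means, variances) tuples (assumed layout).
    """
    logs = []
    for weight, means, variances in mixture:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(log_p)
    top = max(logs)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))

def utterance_score(segments, transforms, speaker_model):
    """Average the per-segment likelihood scores for one candidate speaker.

    segments: list of (phone, frames); transforms: phone -> projection matrix;
    speaker_model: phone -> mixture in the transformed space.
    """
    segment_scores = []
    for phone, frames in segments:
        projected = [project(f, transforms[phone]) for f in frames]
        frame_scores = [gmm_log_likelihood(p, speaker_model[phone]) for p in projected]
        segment_scores.append(sum(frame_scores) / len(frame_scores))
    return sum(segment_scores) / len(segment_scores)

def identify(segments, transforms, models):
    """Return the best-scoring speaker name and the score of every speaker."""
    scores = {name: utterance_score(segments, transforms, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy usage with made-up numbers: one phone, 3-D input features, 2-D subspace.
transforms = {"ah": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}
models = {
    "alice": {"ah": [(1.0, [0.0, 0.0], [1.0, 1.0])]},
    "bob": {"ah": [(1.0, [5.0, 5.0], [1.0, 1.0])]},
}
segments = [("ah", [[0.1, -0.2, 9.0], [0.0, 0.3, -4.0]])]
best, scores = identify(segments, transforms, models)
```

For simplicity the sketch scores every candidate against one shared segmentation, whereas the abstract describes a separate password phone string per speaker, so a fuller version would segment the utterance once per candidate.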

18 Claims

1. A method of identifying a speaker from speakers in a group, comprising:
- receiving a speaker's utterance;
  
  computing a sequence of a first set of feature vectors based on the received utterance;
  
  transforming the first set of feature vectors into a second set of feature vectors using transformations specific to a particular segmentation unit;
  
  computing likelihood scores of the second set of feature vectors using speaker models trained by mixture discriminant analysis using a collection of first sets of feature vectors from all the speakers in the group;
  
  combining the likelihood scores to determine an utterance score;
  
  validating the speaker's identity based on the utterance score; and
  
  outputting the validation results.
2. The method of claim 1, wherein the second set of feature vectors represents a low-dimensional discriminant subspace.
3. The method of claim 1, wherein the segmentation unit is a phone, a syllable, or an acoustic sub-word unit.
4. The method of claim 1, further comprising a training phase, the training phase comprising:
5. The method of claim 1, further comprising an enrollment phase, the enrollment phase comprising:
- receiving the speaker's password utterance multiple times;
  
  converting the speaker's password utterance into a phone string; and
  
  storing the phone string in a database containing phone strings of other speakers in the group.
6. The method of claim 5, wherein the speaker's password utterance is known.
7. The method of claim 5, wherein the speaker's password utterance is not known.
8. The method of claim 1, wherein the utterance score is determined by averaging the likelihood scores.
9. The method of claim 1, wherein the utterance score is based on threshold scores generated from the likelihood scores.

10. An apparatus for identifying a speaker from speakers in a group, comprising:
- a speaker independent phone recognizer that receives a speaker's utterance, computes a sequence of a first set of feature vectors based on the received utterance, and transforms the first set of feature vectors into a second set of feature vectors using transformations specific to a particular segmentation unit;
  
  a likelihood estimator that computes likelihood scores of the second set of feature vectors using speaker models trained by mixture discriminant analysis using a collection of first sets of feature vectors from all the speakers in the group;
  
  a score combiner that combines the likelihood scores to determine an utterance score; and
  
  a score analysis unit that validates the speaker's identity based on the utterance score and outputs the validation results.
11. The apparatus of claim 10, wherein the second set of feature vectors represents a low-dimensional discriminant subspace.
12. The apparatus of claim 10, wherein the segmentation unit is a phone, a syllable, or an acoustic sub-word unit.
13. The apparatus of claim 10, further comprising a training phase, wherein the speaker independent phone recognizer extracts the collection of first sets of feature vectors from a password utterance and obtaining phone segments from all speakers in the group, the apparatus further comprising:
14. The apparatus of claim 10, further comprising an enrollment phase, wherein the speaker independent phone recognizer receives the speaker's password utterance multiple times, converts the speaker's password utterance into a phone string, and stores the phone string in a database containing phone strings of other speakers in the group.
15. The apparatus of claim 14, wherein the speaker's password utterance is known.
16. The apparatus of claim 14, wherein the speaker's password utterance is not known.
17. The apparatus of claim 10, wherein the score combiner determines the utterance score by averaging the likelihood scores.
18. The apparatus of claim 10, further comprising a threshold unit, wherein:
- the score combiner determines the utterance score based on threshold scores generated from the likelihood scores by the threshold unit.
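
The two ways of forming an utterance score named in claims 8, 9, 17, and 18 (averaging the likelihood scores, or using threshold scores generated from them) can be sketched as below. The claims do not define how threshold scores are computed, so the pass/fail fraction used here is only one possible reading. The function names and the `segment_threshold` and `decision_threshold` parameters are hypothetical.

```python
def average_score(likelihood_scores):
    """Utterance score as the plain mean of the per-segment likelihood scores."""
    if not likelihood_scores:
        raise ValueError("at least one likelihood score is required")
    return sum(likelihood_scores) / len(likelihood_scores)

def threshold_score(likelihood_scores, segment_threshold):
    """Utterance score as the fraction of segments scoring at or above a threshold.

    Assumption: each likelihood score is turned into a 0/1 threshold score.
    """
    if not likelihood_scores:
        raise ValueError("at least one likelihood score is required")
    passed = [1.0 if s >= segment_threshold else 0.0 for s in likelihood_scores]
    return sum(passed) / len(passed)

def validate(utterance_score, decision_threshold):
    """Accept the claimed identity when the utterance score reaches the threshold."""
    return utterance_score >= decision_threshold

# Toy usage with made-up log-likelihood scores for four segments.
segment_scores = [-1.0, -5.0, -2.0, -9.0]
mean_based = average_score(segment_scores)
vote_based = threshold_score(segment_scores, segment_threshold=-3.0)
accepted = validate(vote_based, decision_threshold=0.5)
```

Averaging keeps the full range of each score, while the thresholded variant limits how much a single badly matched segment can pull the utterance score down.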

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Parthasarathy, Sarangarajan; Rosenberg, Aaron E.
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/198,579
Time in Patent Office: 903 Days
Field of Search: 704/246-250, 704/273, 379/88.02
US Class Current: 704/249
CPC Class Codes:
G10L 17/04 Training, enrolment or mode...
G10L 17/24 the user being prompted to ...
