Computationally efficient method and apparatus for speaker recognition

US 6,772,119 B2
Filed: 12/10/2002
Issued: 08/03/2004
Est. Priority Date: 12/10/2002
Status: Expired due to Term

Abstract

A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.
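
As a rough illustration of the modelling step described in the abstract, the following Python sketch estimates a sample covariance matrix from feature frames and evaluates the two trace terms named in claims 7 and 8, trace(CU inv(CS)) and trace(CU inv(CB)). The function names, the list-of-lists matrix representation, and the choice of a Gauss-Jordan inverse are assumptions made for illustration only. This copy of the patent does not show how the two terms are combined into the sphericity ratio, so the sketch stops at the terms themselves.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sample_covariance(frames: Matrix) -> Matrix:
    """Sample covariance (divides by n - 1) of feature frames, one frame per row."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        dev = [f[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; raises if m is singular."""
    d = len(m)
    # Augment m with the identity matrix, then reduce the left half to identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def trace_of_product(a: Matrix, b: Matrix) -> float:
    """trace(a @ b), computed without forming the full matrix product."""
    d = len(a)
    return sum(a[i][k] * b[k][i] for i in range(d) for k in range(d))


def trace_terms(c_u: Matrix, c_s: Matrix, c_b: Matrix) -> Tuple[float, float]:
    """Return (trace(CU inv(CS)), trace(CU inv(CB))).

    c_u: test utterance covariance, c_s: target speaker covariance,
    c_b: background covariance. How the two terms are combined into a
    single score is not shown in the source and is left out here.
    """
    return (trace_of_product(c_u, inverse(c_s)),
            trace_of_product(c_u, inverse(c_b)))
```

Each speaker is represented by one d-by-d covariance matrix, which is consistent with the small memory footprint the abstract attributes to single Gaussian models. In practice the feature frames would come from a front end such as cepstral analysis, which is outside the scope of this sketch.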

23 Claims

1. A method for recognizing a speaker, comprising the steps of:
- extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
  
  extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
  
  computing a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.
  - 2. The method of claim 1, wherein said one or more enrollment utterances are obtained in an acoustic environment that is substantially matched to an acoustic environment of said one or more test utterances.
  - 3. The method of claim 2, wherein said acoustic environment is associated with a portable computing device.
  - 4. The method of claim 1, wherein said sphericity ratio is used to recognize the identity of said speaker by comparing said sphericity ratio of a given speaker model to said sphericity ratios obtained against other speaker models.
  - 5. The method of claim 1, further comprising the step of receiving an asserted identity from said speaker, and wherein said sphericity ratio is used to recognize identity of said speaker by applying a threshold to said sphericity ratio to determine whether said speaker is a person associated with said asserted identity.
  - 6. The method of claim 1, wherein said sphericity ratio may be expressed as follows:
  - 7. The method of claim 6, wherein said expression trace(CU inv(CS)) evaluates how dissimilar said one or more test utterances are from said one or more enrollment utterances.
  - 8. The method of claim 6, wherein said expression trace(CU inv(CB)) evaluates how dissimilar said one or more test utterances are from said background model.
  - 9. The method of claim 1, wherein said background model is obtained from utterances of a plurality of speakers.
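
Claims 4 and 5 describe two ways of using the score: picking the best-scoring model among several enrolled speakers, and accepting or rejecting an asserted identity against a threshold. A minimal Python sketch of the two decision rules follows. It assumes the scores have already been computed and that a higher score means a better match. That orientation, the speaker names, and the threshold value are hypothetical and are not taken from the patent.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification: return the enrolled speaker whose model scores highest.

    Assumes higher scores indicate a better match (an assumption of this sketch).
    """
    return max(scores, key=lambda name: scores[name])


def verify(scores: Dict[str, float], asserted_identity: str, threshold: float) -> bool:
    """Verification: accept the asserted identity if its score meets the threshold."""
    return scores[asserted_identity] >= threshold


# Hypothetical scores of one test utterance against two enrolled speaker models.
example_scores = {"alice": 0.4, "bob": 0.9}
best_match = identify(example_scores)              # "bob"
accepted = verify(example_scores, "alice", 0.5)    # False
```

Identification needs a score against every enrolled model, while verification needs only the score for the asserted identity, so the second rule costs one comparison regardless of how many speakers are enrolled.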

10. A method for recognizing a speaker, comprising the steps of:
- extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
  
  extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
  
  computing a normalized similarity score that compares said test utterance model to said target speaker model and to a background model.
  - 11. The method of claim 10, wherein said one or more enrollment utterances are obtained in an acoustic environment that is substantially matched to an acoustic environment of said one or more test utterances.
  - 12. The method of claim 11, wherein said acoustic environment is associated with a portable computing device.
  - 13. The method of claim 10, wherein said normalized similarity score is used to recognize the identity of said speaker by comparing said normalized similarity score of a given speaker model to said normalized similarity scores obtained against other speaker models.
  - 14. The method of claim 10, further comprising the step of receiving an asserted identity from said speaker, and wherein said normalized similarity score is used to recognize the identity of said speaker by applying a threshold to said normalized similarity score to determine whether said speaker is a person associated with said asserted identity.
  - 15. The method of claim 10, wherein said normalized similarity score may be expressed as follows:

16. An apparatus for recognizing a speaker, comprising:
- a memory; and
  
  at least one processor, coupled to the memory, operative to:
  
  extract features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
  
  extract features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
  
  compute a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.
  - 17. The apparatus of claim 16, wherein said one or more enrollment utterances are obtained in an acoustic environment that is substantially matched to an acoustic environment of said one or more test utterances.
  - 18. The apparatus of claim 17, wherein said acoustic environment is associated with a portable computing device.
  - 19. The apparatus of claim 16, wherein said sphericity ratio is used to recognize the identity of said speaker by comparing said sphericity ratio of a given speaker model to said sphericity ratios obtained against other speaker models.
  - 20. The apparatus of claim 16, further comprising the step of receiving an asserted identity from said speaker, and wherein said sphericity ratio is used to recognize the identity of said speaker by applying a threshold to said sphericity ratio to determine whether said speaker is a person associated with said asserted identity.
  - 21. The apparatus of claim 16, wherein said sphericity ratio may be expressed as follows:
  - 22. The apparatus of claim 16, wherein said background model is obtained from utterances of a plurality of speakers.

23. An article of manufacture for recognizing a speaker, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
  
  extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
  
  computing a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Ramaswamy, Ganesh N.; Chaudhari, Upendra V.; Zilca, Ran
Primary Examiner(s): Knepper, David D.
Application Number: US10/315,648
Publication Number: US 20040111261A1
Time in Patent Office: 602 Days
Field of Search: 704/226-228, 704/233-236, 704/203, 704/204, 704/269, 704/246-248
US Class Current: 704/246
CPC Class Codes: G10L 17/08 Use of distortion metrics o...
