Computationally efficient method and apparatus for speaker recognition
Abstract
A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.
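The enrollment and recognition phases described in the abstract can be illustrated with a minimal Python sketch. It assumes features arrive as a NumPy array with one row per frame. The function name `covariance_model`, the feature dimension of 12, and the random data standing in for real acoustic features are all hypothetical and are not taken from the patent.

```python
import numpy as np


def covariance_model(features: np.ndarray) -> np.ndarray:
    """Return the sample covariance matrix of a (frames x dims) feature array.

    Hypothetical helper: the source only says each model is "based on" a
    sample covariance matrix; it does not prescribe this exact estimator.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim != 2 or features.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two frames")
    # rowvar=False treats each column as one feature dimension.
    return np.cov(features, rowvar=False)


# Random stand-ins for extracted features (assumption, not real speech data).
rng = np.random.default_rng(0)
enroll_feats = rng.normal(size=(500, 12))  # enrollment utterance frames
test_feats = rng.normal(size=(200, 12))    # test utterance frames

target_model = covariance_model(enroll_feats)  # stored per enrolled speaker
test_model = covariance_model(test_feats)      # built at recognition time
```

Under these assumptions each model is a single symmetric d-by-d matrix, so its storage depends on the feature dimension rather than on how much speech was used to build it.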
23 Claims
1. A method for recognizing a speaker, comprising the steps of:
extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
computing a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.
7. The method of claim 6, wherein said expression trace(CU inv(CS)) evaluates how dissimilar said one or more test utterances are from said one or more enrollment utterances.
8. The method of claim 6, wherein said expression trace(CU inv(CB)) evaluates how dissimilar said one or more test utterances are from said background model.
9. The method of claim 1, wherein said background model is obtained from utterances of a plurality of speakers.
10. A method for recognizing a speaker, comprising the steps of:
extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
computing a normalized similarity score that compares said test utterance model to said target speaker model and to a background model.
16. An apparatus for recognizing a speaker, comprising:
a memory; and
at least one processor, coupled to the memory, operative to:
extract features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
extract features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
compute a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.
22. The apparatus of claim 16, wherein said background model is obtained from utterances of a plurality of speakers.
23. An article of manufacture for recognizing a speaker, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
extracting features from one or more enrollment utterances to generate a target speaker model based on a sample covariance matrix generated from said extracted enrollment features;
extracting features from one or more test utterances to generate a test utterance model based on a sample covariance matrix generated from said extracted test features; and
computing a sphericity ratio to recognize said speaker, said sphericity ratio comparing said test utterance model to said target speaker model and to a background model.
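The trace expressions named in claims 7 and 8 can be sketched numerically. The sketch below assumes that CU, CS and CB denote the test utterance, target speaker and background covariance matrices, and it combines the two traces as a simple quotient purely for illustration. The exact sphericity ratio is defined in claim 6, which is not reproduced here, so `illustrative_score` is a hypothetical stand-in and not the claimed formula. The synthetic data and the diagonal "speaker" scaling are likewise assumptions.

```python
import numpy as np


def trace_term(c_u: np.ndarray, c_ref: np.ndarray) -> float:
    """Compute trace(c_u @ inv(c_ref)) without forming the inverse explicitly."""
    # solve(c_ref, c_u) gives inv(c_ref) @ c_u, whose trace equals
    # trace(c_u @ inv(c_ref)) because the trace is invariant under cyclic shifts.
    return float(np.trace(np.linalg.solve(c_ref, c_u)))


def illustrative_score(c_u, c_s, c_b) -> float:
    """Hypothetical quotient of the two trace terms (NOT the claimed ratio).

    Smaller values mean the test covariance is closer to the target model
    than to the background model under this illustrative combination.
    """
    return trace_term(c_u, c_s) / trace_term(c_u, c_b)


rng = np.random.default_rng(1)
d = 8
# Hypothetical speaker-specific scaling applied to synthetic features.
scale = np.diag(np.linspace(1.0, 3.0, d))
background = rng.normal(size=(2000, d))        # stands in for many speakers
enroll = rng.normal(size=(600, d)) @ scale     # stands in for enrollment speech
test_same = rng.normal(size=(300, d)) @ scale  # same synthetic "speaker"

c_b = np.cov(background, rowvar=False)
c_s = np.cov(enroll, rowvar=False)
c_u = np.cov(test_same, rowvar=False)

print(trace_term(c_s, c_s))               # approximately d for identical matrices
print(illustrative_score(c_u, c_s, c_b))  # below 1 for this synthetic setup
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical choice made for this sketch; the claims do not specify how the inverse is evaluated.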
Specification