Background learning of speaker voices
First Claim
1. A method of automatically identifying a speaker, the method including:
identifying a speaker by:
receiving a test utterance from the speaker;
determining a most likely one of a plurality of speaker models for the test utterance; and
identifying the speaker associated with the most likely speaker model as the speaker of the test utterance;
wherein the method includes generating the plurality of speaker models in the background by:
receiving training utterances from a pool of utterances from the plurality of speakers in the background, without prior knowledge of the speakers who spoke the respective training utterances;
blind clustering of the training utterances from the pool of utterances based on a predetermined criterion, wherein the blind clustering includes calculating a likelihood vector for each training utterance; and
non-explicitly training for each of the clusters a corresponding speaker model, each of the models representing a speaker.
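The claim requires a likelihood vector for each training utterance but does not say how it is obtained. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes that each utterance is a sequence of feature vectors and that one diagonal-covariance Gaussian is fitted to every utterance. Element j of utterance i's vector is then the average per-frame log-likelihood of utterance i under the model fitted to utterance j. The function names are hypothetical. Under these assumptions, utterances from the same speaker should yield similar vectors.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]       # one feature vector, e.g. cepstral coefficients (assumed)
Utterance = Sequence[Frame]   # a training utterance as a sequence of frames
Model = Tuple[List[float], List[float]]  # (means, variances) of a diagonal Gaussian (assumed model form)

def fit_diag_gaussian(utt: Utterance, var_floor: float = 1e-3) -> Model:
    """Fit a diagonal-covariance Gaussian to all frames of one utterance."""
    n, dim = len(utt), len(utt[0])
    mean = [sum(f[d] for f in utt) / n for d in range(dim)]
    # Floor the variances so short or flat utterances cannot give zero variance.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in utt) / n, var_floor) for d in range(dim)]
    return mean, var

def avg_log_likelihood(utt: Utterance, model: Model) -> float:
    """Average per-frame log-likelihood of an utterance under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in utt:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(utt)

def likelihood_vectors(utts: List[Utterance]) -> List[List[float]]:
    """Row i is the likelihood vector of utterance i: its score under every utterance's model."""
    models = [fit_diag_gaussian(u) for u in utts]
    return [[avg_log_likelihood(u, m) for m in models] for u in utts]
```

No speaker labels enter this computation. The vectors depend only on the unlabeled pool, which is consistent with training "without prior knowledge of the speakers".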
2 Assignments
0 Petitions
Accused Products
Abstract
A speaker identification system includes a speaker model generator 110 for generating a plurality of speaker models. To this end, the generator records training utterances from a plurality of speakers in the background, without prior knowledge of the speakers who spoke the utterances. The generator performs a blind clustering of the training utterances based on a predetermined criterion. For each of the clusters, a corresponding speaker model is trained.
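The blind clustering step can be sketched as agglomerative clustering of the per-utterance likelihood vectors. This text does not specify the "predetermined criterion", the distance measure or the linkage. This sketch therefore assumes a Euclidean distance threshold between cluster centroids, which is one possible choice among many. The hypothetical helper `pool_utterances` then gathers each cluster's frames as the training data for one speaker model.

```python
import math
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors: Sequence[Sequence[float]], members: List[int]) -> List[float]:
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]

def blind_cluster(vectors: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Centroid-linkage agglomerative clustering of likelihood vectors.

    Starts with one cluster per utterance and repeatedly merges the closest pair
    of clusters. Stops when no pair is closer than `threshold`, an assumed
    stand-in for the unspecified predetermined criterion.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            ci = centroid(vectors, clusters[i])
            for j in range(i + 1, len(clusters)):
                d = euclidean(ci, centroid(vectors, clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def pool_utterances(utterances: Sequence[Sequence], clusters: List[List[int]]) -> List[list]:
    """Concatenate the frames of each cluster's utterances into one training set per model."""
    return [[frame for i in members for frame in utterances[i]] for members in clusters]
```

With this choice, the threshold controls how many speakers are discovered. A larger threshold merges more utterances into fewer speaker models.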
A speaker identifier 130 identifies a speaker by determining the most likely one of the speaker models for a test utterance received from the speaker. The speaker associated with the most likely speaker model is identified as the speaker of the test utterance.
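Identification amounts to scoring the test utterance against every speaker model and selecting the highest score. The following minimal sketch makes two assumptions. Each model is a diagonal Gaussian given as (means, variances), and the score is the average per-frame log-likelihood. The labels such as `cluster_0` are hypothetical. Because the models were trained blindly, the returned label would identify an anonymous cluster rather than a named person.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # (means, variances) of a diagonal Gaussian (assumed)

def avg_log_likelihood(frames: Sequence[Sequence[float]], model: Model) -> float:
    """Average per-frame log-likelihood of the test utterance under one speaker model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(frames: Sequence[Sequence[float]],
             speaker_models: Dict[str, Model]) -> Tuple[str, Dict[str, float]]:
    """Return the label of the most likely speaker model, plus all scores."""
    scores = {label: avg_log_likelihood(frames, m) for label, m in speaker_models.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning all scores, and not just the winner, would let a caller add a rejection threshold for unknown speakers. That is a design option and is not described in this section.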
-
Citations
20 Claims
-
1. A method of automatically identifying a speaker;
- the method including;
identifying a speaker by; receiving a test utterance from the speaker; determining a most likely one of a plurality of speaker models for the test utterance; and identifying the speaker associated with the most likely speaker model as the speaker of the test utterance;
wherein the method includes generating the plurality of speaker models in the background by;receiving training utterances from a pool of utterances from the plurality of speakers in the background, without prior knowledge of the speakers who spoke the respective training utterances; blind clustering of the training utterances from the pool of utterances based on a predetermined criterion, wherein the blind clustering includes calculating a likelihood vector for each training utterance; and non-explicitly training for each of the clusters a corresponding speaker model, each of the models representing a speaker. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- the method including;
-
11. A system for automatically identifying a speaker, the system including:
a speaker identifier operative to identify a speaker by:
receiving a test utterance from the speaker;
determining a most likely one of a plurality of speaker models for the test utterance; and
identifying the speaker associated with the most likely speaker model as the speaker of the test utterance; and
a speaker model generator operative to generate the plurality of speaker models, wherein the speaker model generator is operative to generate the plurality of speaker models in the background by:
receiving training utterances from a pool of utterances from the plurality of speakers in the background, without prior knowledge of the speakers who spoke the respective training utterances;
blind clustering of the training utterances from the pool of utterances based on a predetermined criterion, wherein the blind clustering includes calculating a likelihood vector for each training utterance; and
non-explicitly training for each of the clusters a corresponding speaker model, each of the models representing a speaker.
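Claim 11 divides the work between a speaker model generator and a speaker identifier. The following hypothetical sketch shows one way to structure that division. The clustering, training and scoring steps are passed in as functions, so the concrete techniques remain interchangeable. The class names, the `min_pool` retraining trigger and the use of scalar per-frame features are illustrative assumptions and are not taken from the claim.

```python
from typing import Callable, Dict, List, Optional, Sequence

Utterance = Sequence[float]  # hypothetical: one scalar feature per frame

class SpeakerModelGenerator:
    """Builds speaker models from unlabeled utterances collected in the background."""

    def __init__(self,
                 cluster_fn: Callable[[List[Utterance]], List[List[int]]],
                 train_fn: Callable[[List[Utterance]], object],
                 min_pool: int = 4) -> None:
        self.cluster_fn = cluster_fn  # blind clustering: pool -> lists of utterance indices
        self.train_fn = train_fn      # trains one model from a cluster's utterances
        self.min_pool = min_pool      # assumed: rebuild only once this many utterances exist
        self.pool: List[Utterance] = []
        self.models: Dict[str, object] = {}

    def observe(self, utt: Utterance) -> None:
        """Record an utterance heard during normal use; no speaker label is supplied."""
        self.pool.append(utt)
        if len(self.pool) >= self.min_pool:
            self.rebuild()

    def rebuild(self) -> None:
        clusters = self.cluster_fn(self.pool)
        # One model per cluster; labels are anonymous because the speakers are unknown.
        self.models = {
            f"speaker_{k}": self.train_fn([self.pool[i] for i in members])
            for k, members in enumerate(clusters)
        }

class SpeakerIdentifier:
    """Picks the most likely of the generator's current speaker models."""

    def __init__(self, generator: SpeakerModelGenerator,
                 score_fn: Callable[[Utterance, object], float]) -> None:
        self.generator = generator
        self.score_fn = score_fn  # higher score means more likely

    def identify(self, test_utt: Utterance) -> Optional[str]:
        models = self.generator.models
        if not models:
            return None  # nothing has been learned yet
        return max(models, key=lambda name: self.score_fn(test_utt, models[name]))
```

Because the identifier reads the generator's current models each time it is called, models retrained in the background take effect without any explicit enrollment step.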
12. A method for automatically identifying based on speech, comprising:
receiving training utterances from a pool of utterances from the plurality of speakers;
performing blind clustering of the received training utterances from the pool of utterances based on a predetermined criterion to produce clusters, wherein the blind clustering includes calculating a likelihood vector for each training utterance;
non-explicitly training, for said clusters, corresponding speaker models representing associated speakers;
receiving a test utterance from a speaker;
determining a most likely one of said speaker models for the test utterance; and
identifying a speaker associated with the determined most likely speaker model as the speaker of the test utterance,
wherein said receiving training utterances occurs in the background, without prior knowledge of the speakers who respectively spoke said training utterances, and said speaker models are generated in the background.
Specification