Method and system for on-line unsupervised adaptation in speaker verification
Abstract
The present invention introduces a system and method for unsupervised, on-line adaptation in speaker verification. In one embodiment, a method for adapting a speaker model to improve the verification of a speaker's voice comprises detecting a channel of a verification utterance; learning vocal characteristics of the speaker on the detected channel; and transforming the learned vocal characteristics of the speaker from the detected channel to the speaker model of a second channel.
22 Claims
-
1. A method comprising:
-
verifying the identity of a speaker during a speaker verification session based on a speaker model, including generating a confidence score representing a degree of confidence that the speaker is who the speaker claims to be;
determining whether a communication channel used by the speaker during the speaker verification session matches a base channel that was previously used by the speaker to enroll the speaker for speaker verification;
if the communication channel matches the base channel, then automatically updating the speaker model for use during subsequent speaker verification, based on vocal characteristics of the speaker on the communication channel; and
if the communication channel does not match the base channel, then automatically updating the speaker model for use during subsequent speaker verification by transforming the speaker model between channels, based on vocal characteristics of the speaker on the communication channel;
wherein said automatically updating the speaker model comprises updating the speaker model by a degree of aggressiveness that is based on the confidence score.
-
-
2. A method as recited in claim 1, further comprising:
-
detecting the vocal characteristics of the speaker on the communication channel during the speaker verification session;
wherein verifying the identity of a speaker comprises verifying the identity of the speaker by using the speaker model and the vocal characteristics; and
wherein automatically updating the speaker model is performed only after the identity of the speaker is verified.
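The confidence-weighted update recited in claim 1, together with the rule that the model is updated only after the speaker is verified, can be pictured with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the claims: the speaker model is reduced to a single mean feature vector, the names `adaptation_rate` and `update_model` are hypothetical, and the linear mapping from confidence score to update weight, the 0.5 threshold, and the 0.2 cap are arbitrary choices.

```python
from typing import List


def adaptation_rate(confidence: float, threshold: float = 0.5,
                    max_rate: float = 0.2) -> float:
    """Map a confidence score in [0, 1] to an update weight.

    Hypothetical rule: no update below the acceptance threshold, then a
    weight that grows linearly with confidence up to max_rate.
    """
    if confidence < threshold:
        return 0.0  # identity not verified: leave the model untouched
    return max_rate * (confidence - threshold) / (1.0 - threshold)


def update_model(model_mean: List[float], observed: List[float],
                 confidence: float) -> List[float]:
    """Interpolate the stored mean toward the observed features."""
    rate = adaptation_rate(confidence)
    return [(1.0 - rate) * m + rate * x for m, x in zip(model_mean, observed)]


model = [1.0, 2.0]
utterance = [2.0, 4.0]
low = update_model(model, utterance, confidence=0.4)   # below threshold
high = update_model(model, utterance, confidence=1.0)  # most aggressive update
```

Because the update interpolates between the stored and observed vectors, the model keeps the same number of parameters however many sessions contribute to it, which is the property claim 6 describes.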
-
-
3. A method as recited in claim 1, wherein said automatically updating the speaker model for use during subsequent speaker verification by using transformation between channels comprises:
-
transforming the speaker model from the base channel to correspond to the communication channel;
adapting the transformed speaker model based on the vocal characteristics of the speaker on the communication channel, and inverse transforming the adapted transformed speaker model to correspond to the base channel.
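The three steps of claim 3 can be sketched as follows. The claims do not say what form the channel transformation takes; this sketch assumes, purely for illustration, an invertible per-dimension affine map (`scale` and `shift`) between base-channel and communication-channel feature space, and again reduces the speaker model to a mean vector. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChannelTransform:
    """Assumed affine map from base-channel to other-channel features."""
    scale: List[float]  # entries must be non-zero so the map is invertible
    shift: List[float]

    def forward(self, v: List[float]) -> List[float]:
        return [s * x + b for s, x, b in zip(self.scale, v, self.shift)]

    def inverse(self, v: List[float]) -> List[float]:
        return [(x - b) / s for s, x, b in zip(self.scale, v, self.shift)]


def adapt_across_channels(base_model: List[float], observed: List[float],
                          transform: ChannelTransform,
                          rate: float) -> List[float]:
    # 1. Transform the base-channel model to the communication channel.
    on_channel = transform.forward(base_model)
    # 2. Adapt the transformed model toward the observed features.
    adapted = [(1.0 - rate) * m + rate * x
               for m, x in zip(on_channel, observed)]
    # 3. Inverse transform so the stored model stays in base-channel space.
    return transform.inverse(adapted)


cellular = ChannelTransform(scale=[2.0, 0.5], shift=[1.0, -1.0])
base_model = [1.0, 4.0]
observed = [5.0, 1.0]  # features measured on the non-base channel
updated = adapt_across_channels(base_model, observed, cellular, rate=0.5)
```

With `rate=0.0` the round trip returns the base model unchanged, which is a quick check that the forward and inverse maps agree.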
-
-
4. A method as recited in claim 1, further comprising detecting the gender of the speaker, wherein the speaker model is gender-specific.
-
5. A method as recited in claim 1, wherein the channel is one of a plurality of channels usable by the speaker for verification, each corresponding to a different type of communication device.
-
6. A method as recited in claim 1, wherein the size of the speaker model is not increased as a result of the speaker model being updated.
-
7. A method of performing unsupervised adaptation of a speaker model for use in speaker verification, the method comprising:
-
detecting a communication channel by which an utterance of a speaker is received for speaker verification;
learning vocal characteristics of the speaker on the detected channel;
verifying the identity of the speaker by using a speaker model associated with the speaker;
determining whether the detected communication channel matches a base channel previously used to enroll the speaker for verification;
if the detected communication channel matches the base channel, then updating the speaker model for subsequent use in speaker verification, based on the learned vocal characteristics of the speaker; and
if the detected communication channel does not match the base channel, then updating the speaker model for subsequent use in unsupervised speaker verification, by transforming the speaker model to correspond to the detected channel, adapting the transformed speaker model based on the learned vocal characteristics of the speaker, and inverse transforming the adapted transformed speaker model to correspond to the base channel.
-
wherein said updating the speaker model comprises updating the speaker model by a degree of aggressiveness that is based on the confidence score.
-
-
11. A method as recited in claim 7, further comprising detecting the gender of the caller, wherein the speaker model is gender-specific.
-
12. A method as recited in claim 7, wherein the channel is one of a plurality of channels usable by the speaker for verification, each corresponding to a different type of communication device.
-
13. A method as recited in claim 7, wherein said updating the speaker model comprises:
-
updating a first speaker model based on vocal characteristics associated with a second speaker model; and
discarding the second speaker model.
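One way to read claim 13 is as a merge of two models followed by deletion of the one that is no longer needed. The sketch below assumes, hypothetically, that each model stores a mean vector plus the number of frames it was estimated from, and that merging is a count-weighted average; the claims do not specify either detail.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeakerModel:
    mean: List[float]
    frames: int  # amount of speech the mean was estimated from


def merge_into(first: SpeakerModel, second: SpeakerModel) -> None:
    """Fold the second model's statistics into the first, in place."""
    total = first.frames + second.frames
    first.mean = [(first.frames * a + second.frames * b) / total
                  for a, b in zip(first.mean, second.mean)]
    first.frames = total


models: Dict[str, SpeakerModel] = {
    "base": SpeakerModel(mean=[1.0, 2.0], frames=300),
    "session": SpeakerModel(mean=[3.0, 6.0], frames=100),
}
merge_into(models["base"], models["session"])
del models["session"]  # discard the second model once it has been absorbed
```

Only one model remains after the merge, and it has the same dimensionality as before, so storage per speaker does not grow.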
-
-
14. A method as recited in claim 7, wherein the size of the speaker model is not increased as a result of the speaker model being updated.
-
15. A processing system comprising:
-
a processor; and
a storage facility coupled to the processor and storing instructions which, when executed by the processor, cause the processing system to perform a process including:
verifying the identity of a speaker during a speaker verification session based on a speaker model, including generating a confidence score representing a degree of confidence that the speaker is who the speaker claims to be;
determining whether a communication channel used by the speaker during the speaker verification session matches a base channel that was previously used by the speaker to enroll the speaker for speaker verification;
if the communication channel matches the base channel, then automatically updating the speaker model for use during subsequent speaker verification, based on vocal characteristics of the speaker on the communication channel; and
if the communication channel does not match the base channel, then automatically updating the speaker model for use during subsequent speaker verification by using transformation between channels, based on vocal characteristics of the speaker on the communication channel;
wherein said automatically updating the speaker model comprises updating the speaker model by a degree of aggressiveness that is based on the confidence score.
-
-
16. A processing system as recited in claim 15, further comprising:
-
detecting the vocal characteristics of the speaker on the communication channel during the speaker verification session;
wherein verifying the identity of a speaker comprises verifying the identity of the speaker by using the speaker model and the vocal characteristics; and
wherein automatically updating the speaker model is performed only after the identity of the speaker is verified.
-
-
17. A processing system as recited in claim 15, wherein said automatically updating the speaker model for use during subsequent speaker verification by using transformation between channels comprises:
-
transforming the speaker model from the base channel to correspond to the communication channel;
adapting the transformed speaker model based on the vocal characteristics of the speaker on the communication channel, and inverse transforming the adapted transformed speaker model to correspond to the base channel.
-
-
18. A processing system as recited in claim 15, further comprising detecting the gender of the speaker, wherein the speaker model is gender-specific.
-
19. A processing system as recited in claim 15, wherein the channel is one of a plurality of channels usable by the speaker for verification, each corresponding to a different type of communication device.
-
20. A processing system as recited in claim 15, wherein the size of the speaker model is not increased as a result of the speaker model being updated.
-
21. A speaker verification system comprising:
-
an automatic speech recognizer to recognize speech of a speaker received on a detected channel during a first unsupervised speaker verification session;
an automatic speaker verifier to verify the identity of the speaker during the first unsupervised speaker verification session by using a speaker model associated with the speaker, wherein the speaker model corresponds to a base channel previously used by the speaker to enroll the speaker for speaker verification, and the detected channel is a channel other than the base channel; and
an automatic adapter to update the speaker model for use during a subsequent unsupervised speaker verification session, based on vocal characteristics of the speaker on the detected channel, by automatically:
transforming the speaker model from the base channel to correspond to the detected channel;
updating the speaker model based on characteristics of the utterance on the detected channel; and
inverse transforming the speaker model to correspond to the detected channel.
-
-
22. An apparatus comprising:
-
means for verifying the identity of a speaker during a speaker verification session based on a speaker model, including generating a confidence score representing a degree of confidence that the speaker is who the speaker claims to be;
means for determining whether a communication channel used by the speaker during the speaker verification session matches a base channel that was previously used by the speaker to enroll the speaker for speaker verification;
means for automatically updating the speaker model for use during subsequent speaker verification by a degree of aggressiveness that is based on the confidence score, based on vocal characteristics of the speaker on the communication channel, if the communication channel matches the base channel; and
means for automatically updating the speaker model for use during subsequent speaker verification by a degree of aggressiveness that is based on the confidence score by using transformation between channels, based on vocal characteristics of the speaker on the communication channel, if the communication channel does not match the base channel.
-
Specification