Fast, language-independent method for user authentication by voice
Abstract
A method and system for training user authentication by voice signal are described. In one embodiment, a set of feature vectors is decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units, which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
217 Citations
25 Claims
1. A method of speech-based user authentication, comprising:
at a device having one or more processors and memory:
receiving a plurality of different spoken utterances from a user;
for each of the plurality of different spoken utterances:
generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
providing the content-independent recognition distribution value for use in a speaker authentication process.
Dependent claims: 2, 3, 4, 5, 6, 7
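As an informal illustration only, the enrollment steps recited in claim 1 might be sketched as follows. This is not the patented implementation: the claim does not name a transform, a decomposition, or a statistic, so the naive DFT magnitude used as the "spectral signature", the dominant right singular vector used as the "recognition unit", the element-wise mean used as the "distribution value", the frame length of 8 samples, and every function name below are assumptions chosen to keep the sketch short.

```python
import cmath
import math

def spectral_signature(frame):
    """Assumed signature: magnitudes of the first half of a naive DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2)]

def representation(samples, frame_len=8):
    """Rows are frames, columns are spectral bins; no phoneme labels are used."""
    return [spectral_signature(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def recognition_unit(matrix, iters=50):
    """Assumed decomposition: dominant right singular vector, found by
    power iteration on M^T M (a stand-in for whatever the patent uses)."""
    if not matrix:
        raise ValueError("utterance is shorter than one frame")
    bins = len(matrix[0])
    v = [1.0 / math.sqrt(bins)] * bins
    for _ in range(iters):
        mv = [sum(row[j] * v[j] for j in range(bins)) for row in matrix]
        w = [sum(matrix[i][j] * mv[i] for i in range(len(matrix)))
             for j in range(bins)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v  # all-zero input: keep the uniform starting vector
        v = [x / norm for x in w]
    return v

def enroll(utterances):
    """Assumed distribution value: element-wise mean of the per-utterance units."""
    units = [recognition_unit(representation(u)) for u in utterances]
    return [sum(col) / len(units) for col in zip(*units)]
```

Because the unit is computed from the spectral matrix as a whole, two utterances with different content but similar spectral character yield similar units, which is the property the claim language calls content-independent.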
8. A method of authenticating a speech signal comprising:
at a device having one or more processors and memory:
receiving a spoken utterance of an unauthenticated speaker;
generating a first phoneme-independent representation based on the spoken utterance;
decomposing the first phoneme-independent representation into at least one content-independent characteristic unit;
comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value, the at least one content-independent recognition distribution value previously trained by a registered speaker and generated by decomposing a second phoneme-independent representation that is associated with the registered speaker to obtain at least one content-independent recognition unit; and
authenticating the spoken utterance if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
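The comparing and authenticating steps of claim 8 can be illustrated with a short sketch. It assumes the characteristic unit and the enrolled distribution value are already available as equal-length numeric vectors, and it assumes Euclidean distance as the closeness measure; the claim only requires that the unit be "within a threshold limit", so the metric, the threshold value, and the function names here are hypothetical.

```python
import math

def distance(characteristic_unit, distribution_value):
    """Euclidean distance between the two vectors (an assumed metric)."""
    if len(characteristic_unit) != len(distribution_value):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(characteristic_unit, distribution_value)))

def authenticate(characteristic_unit, distribution_value, threshold):
    """Accept the utterance only if the unit lies within the threshold."""
    return distance(characteristic_unit, distribution_value) <= threshold
```

In such a scheme the threshold trades false accepts against false rejects: a smaller value rejects more impostors but also more genuine attempts by the registered speaker.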
16. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
receive a plurality of different spoken utterances from a user;
for each of the plurality of different spoken utterances:
generate a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
decompose the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
calculate a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
provide the content-independent recognition distribution value for use in a speaker authentication process.
17. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
receive a spoken utterance of an unauthenticated speaker;
generate a first phoneme-independent representation based on the spoken utterance;
decompose the first phoneme-independent representation into at least one content-independent characteristic unit;
compare the at least one content-independent characteristic unit to at least one content-independent recognition distribution value, the at least one content-independent recognition distribution value previously trained by a registered speaker and generated by decomposing a second phoneme-independent representation associated with the registered speaker to obtain at least one content-independent recognition unit; and
authenticate the spoken utterance if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
18. A system for speech-based user authentication, comprising:
one or more processors;
memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving a plurality of different spoken utterances from a user;
for each of the plurality of different spoken utterances:
generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
providing the content-independent recognition distribution value for use in a speaker authentication process.
Dependent claims: 19, 20, 21
22. A system for authenticating a speech signal, comprising:
one or more processors;
memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving a spoken utterance of an unauthenticated speaker;
generating a first phoneme-independent representation based on the spoken utterance;
decomposing the first phoneme-independent representation into at least one content-independent characteristic unit;
comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value, the at least one content-independent recognition distribution value previously trained by a registered speaker and generated by decomposing a second phoneme-independent representation associated with the registered speaker to obtain at least one content-independent recognition unit; and
authenticating the spoken utterance if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
Dependent claims: 23, 24, 25
Specification