Fast, language-independent method for user authentication by voice
Abstract
A method and system for training user authentication by voice signal are described. In one embodiment, a set of feature vectors is decomposed into speaker-specific recognition units, which are used to compute distribution values that train the system on the user's voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units, which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
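The two phases in the abstract (training distribution values, then testing characteristic units against a threshold) can be written compactly. The notation below is an assumed formalization, not taken from the patent: it treats the decomposition as a singular value decomposition of each utterance's frames-by-bins matrix, takes the first right singular vector as the unit, reads the distribution values as per-dimension means and standard deviations, and uses sigma_min and tau as hypothetical tuning constants.

```latex
% Assumed formalization (the patent text does not fix these choices).
% Enrollment: utterance m = 1, ..., M gives a frames-by-bins matrix W_m of spectral signatures.
W_m = U_m \Sigma_m V_m^{\top}, \qquad \mathbf{u}_m = \mathbf{v}_{m,1} \in \mathbb{R}^{D}
% (u_m: first right singular vector of W_m, used as the recognition unit)
\mu_d = \frac{1}{M} \sum_{m=1}^{M} u_{m,d}, \qquad
\sigma_d = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left(u_{m,d} - \mu_d\right)^{2}}, \qquad d = 1, \dots, D
% Authentication: a test utterance gives a characteristic unit x in R^D, accepted when
\sqrt{\frac{1}{D} \sum_{d=1}^{D} \left(\frac{x_d - \mu_d}{\max(\sigma_d, \sigma_{\min})}\right)^{2}} \le \tau
```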
3986 Citations
42 Claims
1. A method for speaker identification, comprising:
at a device having one or more processors and memory:
  receiving a plurality of different spoken utterances from a user;
  for each of the plurality of different spoken utterances:
    generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
    decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
  calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
  providing the content-independent recognition distribution value for use in a speaker identification process.
(Dependent claims: 2, 3, 4, 5, 6, 7)
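Claim 1 does not say how the representation is decomposed or what form the distribution value takes. The Python below is a minimal sketch under stated assumptions, not the patented implementation: audio is cut into non-overlapping 32-sample frames, each frame's spectral signature is its magnitude DFT, the decomposition is a rank-1 singular value decomposition (the dominant right singular vector, found by power iteration), and the distribution value is the per-dimension mean and standard deviation of the recognition units. Every name (`FRAME_SIZE`, `spectral_signature`, `representation`, `recognition_unit`, `distribution_value`, `synth`) is hypothetical, and `synth` produces toy tones rather than speech.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

FRAME_SIZE = 32  # hypothetical frame length in samples (non-overlapping frames)


def spectral_signature(frame: Sequence[float]) -> List[float]:
    """Magnitude DFT of one frame, bins 0..N/2 (the per-frame spectral signature)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def representation(samples: Sequence[float]) -> List[List[float]]:
    """Frames-by-bins matrix built from spectra only, with no phoneme alignment."""
    return [
        spectral_signature(samples[i:i + FRAME_SIZE])
        for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)
    ]


def recognition_unit(matrix: List[List[float]], iterations: int = 100) -> List[float]:
    """Dominant right singular vector of the matrix (rank-1 SVD via power iteration).

    Assumption: the claim does not name the decomposition; rank-1 SVD is a stand-in.
    """
    k = len(matrix[0])
    # Gram matrix W^T W (bins x bins); its top eigenvector is W's top right singular vector.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(k)] for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Singular vectors are defined only up to sign; pick the positive orientation.
    return v if sum(v) >= 0 else [-x for x in v]


def distribution_value(units: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) standard deviation across utterances."""
    m = len(units)
    columns = list(zip(*units))
    mean = [sum(col) / m for col in columns]
    std = [math.sqrt(sum((x - mu) ** 2 for x in col) / m) for col, mu in zip(columns, mean)]
    return mean, std


def synth(bins, amps, rng: random.Random, frames: int = 8, noise: float = 0.01) -> List[float]:
    """Toy 'voice': fixed spectral peaks (the speaker) with random phases and noise."""
    phases = [rng.uniform(0, 2 * math.pi) for _ in bins]
    return [
        sum(a * math.sin(2 * math.pi * b * t / FRAME_SIZE + p) for b, a, p in zip(bins, amps, phases))
        + rng.gauss(0, noise)
        for t in range(FRAME_SIZE * frames)
    ]


# Enroll one toy speaker from five different "utterances".
rng = random.Random(0)
utterances = [synth((3, 5), (rng.uniform(0.9, 1.1), rng.uniform(0.45, 0.55)), rng) for _ in range(5)]
units = [recognition_unit(representation(u)) for u in utterances]
mean, std = distribution_value(units)
print(max(range(len(mean)), key=mean.__getitem__))  # strongest bin of the enrolled profile
```

Normalizing the singular vector to unit length discards overall loudness, and keeping only the dominant spectral direction is one simple way to retain what all frames share while averaging away frame-specific variation; a real system would likely need richer features and more than one singular vector.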
8. A method for speaker identification, comprising:
at a device having one or more processors and memory:
  receiving a spoken utterance;
  generating a first phoneme-independent representation based on the spoken utterance;
  decomposing the first phoneme-independent representation into at least one content-independent characteristic unit;
  comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by:
    generating a second phoneme-independent representation based on speech from the registered user; and
    decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and
  determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
(Dependent claims: 9, 10, 11, 12, 13, 14)
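A minimal, self-contained sketch of the verification path in claim 8, in Python and under assumptions the claim does not fix: the representation is a matrix of per-frame magnitude spectra, both the characteristic unit and the recognition units are the dominant right singular vector of that matrix, the stored distribution value is a per-dimension mean and standard deviation, and "within a threshold limit" is read as a root-mean-square z-score no larger than `threshold`. The names `unit_from`, `enroll`, `is_registered_user`, `synth`, `FRAME_SIZE`, `threshold`, and `min_std` are hypothetical, and the signals are toy tones, not speech.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

FRAME_SIZE = 32  # hypothetical frame length in samples


def unit_from(samples: Sequence[float], iterations: int = 100) -> List[float]:
    """Per-frame magnitude spectra, then the dominant right singular vector.

    Assumption: rank-1 SVD stands in for the unspecified decomposition.
    """
    frames = [samples[i:i + FRAME_SIZE] for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)]
    matrix = [
        [abs(sum(x * cmath.exp(-2j * math.pi * k * t / FRAME_SIZE) for t, x in enumerate(f)))
         for k in range(FRAME_SIZE // 2 + 1)]
        for f in frames
    ]
    k = len(matrix[0])
    gram = [[sum(r[i] * r[j] for r in matrix) for j in range(k)] for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):  # power iteration on W^T W
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v if sum(v) >= 0 else [-x for x in v]  # resolve sign ambiguity


def enroll(utterances: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Registered user's recognition units -> per-dimension mean and std."""
    units = [unit_from(u) for u in utterances]
    cols = list(zip(*units))
    mean = [sum(c) / len(c) for c in cols]
    std = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, mean)]
    return mean, std


def is_registered_user(samples, mean, std, threshold: float = 3.0, min_std: float = 0.05) -> bool:
    """Accept if the characteristic unit lies within `threshold` (RMS z-score) of the profile."""
    unit = unit_from(samples)
    # min_std (hypothetical) keeps near-zero enrollment variance from inflating z-scores.
    z = [(u - m) / max(s, min_std) for u, m, s in zip(unit, mean, std)]
    return math.sqrt(sum(x * x for x in z) / len(z)) <= threshold


def synth(bins, amps, rng: random.Random, frames: int = 8, noise: float = 0.01) -> List[float]:
    """Toy stand-in for speech: speaker-specific spectral peaks, random phases, noise."""
    phases = [rng.uniform(0, 2 * math.pi) for _ in bins]
    return [
        sum(a * math.sin(2 * math.pi * b * t / FRAME_SIZE + p) for b, a, p in zip(bins, amps, phases))
        + rng.gauss(0, noise)
        for t in range(FRAME_SIZE * frames)
    ]


rng = random.Random(1)
enrolled = [synth((3, 5), (rng.uniform(0.9, 1.1), rng.uniform(0.45, 0.55)), rng) for _ in range(5)]
mean, std = enroll(enrolled)
genuine = synth((3, 5), (1.0, 0.5), rng)    # same toy speaker, new utterance
impostor = synth((7, 11), (1.0, 0.5), rng)  # different spectral profile
print(is_registered_user(genuine, mean, std), is_registered_user(impostor, mean, std))
```

With these toy signals the genuine utterance is accepted and the impostor rejected. The values of `threshold` and `min_std` are illustrative only; in practice such constants are tuned against measured false-accept and false-reject rates.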
15. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
  receive a plurality of different spoken utterances from a user;
  for each of the plurality of different spoken utterances:
    generate a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
    decompose the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
  calculate a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
  provide the content-independent recognition distribution value for use in a speaker identification process.
(Dependent claims: 16, 17, 18, 19, 20, 21)
22. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
  receive a spoken utterance;
  generate a first phoneme-independent representation based on the spoken utterance;
  decompose the first phoneme-independent representation into at least one content-independent characteristic unit;
  compare the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of a device, the at least one content-independent recognition distribution value previously generated by:
    generate a second phoneme-independent representation based on speech from the registered user; and
    decompose the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and
  determine that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
(Dependent claims: 23, 24, 25, 26, 27, 28)
29. A system for speaker identification, comprising:
one or more processors;
memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
  receiving a plurality of different spoken utterances from a user;
  for each of the plurality of different spoken utterances:
    generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and
    decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user;
  calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and
  providing the content-independent recognition distribution value for use in a speaker identification process.
(Dependent claims: 30, 31, 32, 33, 34, 35)
36. A system for speaker identification, comprising:
one or more processors;
memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
  receiving a spoken utterance;
  generating a first phoneme-independent representation based on the spoken utterance;
  decomposing the first phoneme-independent representation into at least one content-independent characteristic unit;
  comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by:
    generating a second phoneme-independent representation based on speech from the registered user; and
    decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and
  determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.
(Dependent claims: 37, 38, 39, 40, 41, 42)
Specification