Fast, language-independent method for user authentication by voice
First Claim
1. A method of speech-based user authentication, comprising:
at a device comprising one or more processors and memory:
receiving a spoken utterance of a speaker;
generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence;
computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and
authenticating an input speech signal based on the at least one speaker-specific distribution value.
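The enrollment steps of claim 1 can be sketched in code. The claim does not name a particular decomposition; this sketch assumes a singular-value decomposition (SVD) of the feature matrix, with the feature-space singular vectors standing in for the speaker-specific recognition units, the time-weighted singular vectors for the content reference sequence, and projection statistics for the distribution values. All function and variable names are illustrative, not taken from the patent.

```python
import numpy as np

def enroll(feature_matrix):
    """Decompose a phoneme-independent matrix (frames x features) and
    derive speaker-specific distribution values.

    Illustrative sketch: SVD splits the matrix into two sets of vectors.
    The rows of Vt (feature-space directions) play the role of the
    speaker-specific recognition units; the scaled columns of U
    (per-frame weights over time) play the role of the content
    reference sequence.
    """
    U, s, Vt = np.linalg.svd(feature_matrix, full_matrices=False)
    recognition_units = Vt        # speaker-specific recognition units
    content_sequence = U * s      # content reference sequence (scaled)
    # Distribution values: statistics of the frames projected onto the units.
    proj = feature_matrix @ recognition_units.T
    distribution = {"mean": proj.mean(axis=0), "std": proj.std(axis=0)}
    return recognition_units, content_sequence, distribution

# Toy usage: 50 frames of 13-dimensional features.
rng = np.random.default_rng(0)
units, content, dist = enroll(rng.normal(size=(50, 13)))
```

With a 50x13 matrix, `full_matrices=False` yields 13 recognition units and a 50-step content sequence, so the two sets of vectors factor the matrix exactly.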
Abstract
A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors is decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units, which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
836 Citations
41 Claims
1. A method of speech-based user authentication, comprising:
at a device comprising one or more processors and memory:
receiving a spoken utterance of a speaker;
generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence;
computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and
authenticating an input speech signal based on the at least one speaker-specific distribution value.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method of authenticating a speech signal comprising:
at a device comprising one or more processors and memory:
receiving a spoken utterance of an unauthenticated speaker;
generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
decomposing the phoneme-independent matrix into a speaker-specific characteristic unit;
comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and
authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
- View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
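Claim 8's accept/reject test, authenticate only if the characteristic unit falls within a threshold limit of the stored distribution values, might look like the following sketch. The distance measure (a z-score against enrolled mean and standard deviation) and the threshold value are assumptions; the claim leaves both open.

```python
import numpy as np

def authenticate(feature_matrix, recognition_units, distribution, threshold=3.0):
    """Project the unauthenticated speaker's phoneme-independent matrix
    onto the enrolled recognition units and accept the utterance only if
    the mean projection (the speaker-specific characteristic unit here)
    lies within `threshold` standard deviations of the enrolled
    distribution values -- an illustrative z-score test, not the
    patent's prescribed metric.
    """
    proj = feature_matrix @ recognition_units.T
    characteristic_unit = proj.mean(axis=0)
    z = np.abs(characteristic_unit - distribution["mean"]) / (distribution["std"] + 1e-10)
    return bool(np.all(z < threshold))

# Toy usage: enrolled statistics for a hypothetical 4-unit model.
units = np.eye(4)
dist = {"mean": np.zeros(4), "std": np.ones(4)}
rng = np.random.default_rng(2)
accepted = authenticate(rng.normal(size=(30, 4)), units, dist)
```

Averaging 30 standard-normal frames keeps every component well inside three standard deviations of the enrolled mean, so the toy utterance is accepted; a mismatched speaker would shift the projection statistics outside the threshold limit.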
16. A system for speech-based user authentication, comprising:
means for receiving a spoken utterance of a speaker;
means for generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
means for decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence;
means for computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and
means for authenticating an input speech signal based on the at least one speaker-specific distribution value.
17. A system for authenticating a speech signal comprising:
means for receiving a spoken utterance of a speaker;
means for generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
means for decomposing the phoneme-independent matrix into a speaker-specific characteristic unit;
means for comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and
means for authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
- View Dependent Claims (18, 19)
20. A non-transitory computer readable medium comprising instructions, which when executed on a processor, perform a method of speech-based user authentication, comprising:
receiving a spoken utterance of a speaker;
generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence;
computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and
authenticating an input speech signal based on the at least one speaker-specific distribution value.
21. A non-transitory computer readable medium comprising instructions, which when executed on a processor, perform a method for authenticating a speech signal, comprising:
receiving a spoken utterance of a speaker;
generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency;
decomposing the phoneme-independent matrix into a speaker-specific characteristic unit;
comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and
authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
- View Dependent Claims (22, 23)
24. A system for speech-based user authentication, comprising:
a processor configured to receive a spoken utterance of a speaker,
generate a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency,
decompose the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence,
compute at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and
authenticate an input speech signal based on the at least one speaker-specific distribution value.
- View Dependent Claims (25, 26, 27, 28, 29, 30, 31)
32. A system for authenticating a speech signal comprising:
a processor to receive a spoken utterance of an unauthenticated speaker,
generate a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency,
decompose the phoneme-independent matrix into a speaker-specific characteristic unit,
compare the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker, and
authenticate the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value,
wherein the at least one speaker-specific distribution value is generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence.
- View Dependent Claims (33, 34, 35, 36, 37, 38, 39, 40, 41)
Specification