Combined dual spectral and temporal alignment method for user authentication by voice
Abstract
A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
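The training stage summarized above can be sketched in code. This is a minimal illustration, not the patent's implementation: it assumes the speaker's spectral feature vectors are stacked as rows of a matrix, and, following claims 4, 13, and 22, takes the singular value matrix of a global singular value decomposition as the speaker-specific recognition unit. Function names and shapes are illustrative assumptions.

```python
import numpy as np

def train_recognition_unit(feature_vectors):
    """Globally decompose a speaker's full set of spectral feature
    vectors (rows) with a singular value decomposition, keeping the
    singular value matrix as the speaker-specific recognition unit
    (cf. claim 4) and the orthogonal factors as speaker-specific
    decomposition units. Illustrative sketch only."""
    M = np.asarray(feature_vectors, dtype=float)   # shape: (frames, spectral dims)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    S = np.diag(s)        # singular value matrix: recognition unit
    return S, U, Vt       # U, Vt: decomposition units

# toy usage: 20 frames of 8-dimensional spectral features
rng = np.random.default_rng(0)
S, U, Vt = train_recognition_unit(rng.standard_normal((20, 8)))
```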
Claims
1. A method of training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the method comprising:
globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
computing a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

2. The method of claim 1 further comprising:
extracting the set of a plurality of feature vectors into at least one speaker-specific feature extraction representation; and
globally decomposing the at least one speaker-specific feature extraction representation into the speaker-specific recognition unit.
3. The method of claim 2 wherein globally decomposing further comprises:
applying a global singular value decomposition to the at least one speaker-specific feature extraction representation.

4. The method of claim 1 further comprising:
generating the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

5. The method of claim 1 further comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and the speaker-specific recognition unit; and
authenticating the user if a measurement of the diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.
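The claims leave the diagonality-deviation measure itself to the specification. One plausible sketch, an assumption for illustration rather than the claimed formula, scores how far the speaker-specific comparison unit departs from a diagonal matrix, here via the relative Frobenius norm of its off-diagonal part:

```python
import numpy as np

def diagonality_deviation(C):
    """Relative energy of the off-diagonal entries of a square
    comparison matrix C: exactly 0 when C is diagonal, approaching 1
    when its energy is entirely off-diagonal. Illustrative measure
    only; assumes a nonzero matrix (norm(C) > 0)."""
    C = np.asarray(C, dtype=float)
    off = C - np.diag(np.diag(C))
    return float(np.linalg.norm(off) / np.linalg.norm(C))

def authenticate_spectral(C, first_threshold=0.3):
    # cf. claim 5: authenticate if the deviation is within a first threshold limit
    # (the threshold value here is a placeholder, not from the patent)
    return diagonality_deviation(C) <= first_threshold
```

A perfectly diagonal comparison unit (e.g. the identity) yields deviation 0 and passes; a matrix with most of its energy off-diagonal fails any small threshold.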
6. The method of claim 5 wherein locally decomposing further comprises:
applying a local singular value decomposition to the at least one spectral feature vector.

7. The method of claim 5 further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

8. The method of claim 1 further comprising:
time warping the plurality of feature vectors into a speaker-specific content unit.
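The temporal half of the method (claims 7, 8, and 16: time warping the feature vectors into a speaker-specific content unit, then checking that the time axes align within a second threshold limit) is conventionally realized with dynamic time warping. A minimal sketch, assuming Euclidean frame distances and a scalar warping cost as the alignment measure; both are assumptions, since the claims fix neither:

```python
import math

def dtw_cost(a, b):
    """Classic dynamic time warping between two sequences of
    equal-dimension feature vectors (tuples). Returns the accumulated
    alignment cost; a small cost means the time axes align well."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # local frame distance
            D[i][j] = step + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def time_axes_aligned(test_frames, content_unit, second_threshold):
    # cf. claim 7: further authenticate only if alignment is within the limit
    return dtw_cost(test_frames, content_unit) <= second_threshold
```

Identical sequences warp onto each other at zero cost, so they trivially satisfy any threshold; a genuinely different utterance accumulates cost at every frame.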
16. The method of claim 7 further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit, the speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

9. A method of authenticating a voice signal comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

10. The method of claim 9 further comprising:
globally decomposing a plurality of feature vectors into a speaker-specific recognition unit.

11. The method of claim 10 further comprising:
extracting a set of the plurality of feature vectors into a speaker-specific feature extraction representation; and
globally decomposing the speaker-specific feature extraction representation into the speaker-specific recognition unit.

12. The method of claim 11 wherein globally decomposing further comprises:
applying a global singular value decomposition to the speaker-specific feature extraction representation.

13. The method of claim 10 further comprising:
generating the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

14. The method of claim 9 wherein locally decomposing further comprises:
applying a local singular value decomposition to the at least one spectral feature vector.

15. The method of claim 9 further comprising:
generating the at least one speaker-specific characteristic unit from a singular value matrix of a local singular value decomposition of the at least one spectral feature vector.

17. A system for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the system comprising:
a processor configured to globally decompose a set of a plurality of feature vectors into at least one speaker-specific decomposition unit, and select a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

18. The system of claim 17 further comprising:
a feature extractor to extract the set of a plurality of feature vectors into at least one speaker-specific feature extraction representation.

20. The system of claim 19 wherein the processor is further configured to globally decompose the at least one speaker-specific feature extraction representation into the speaker-specific recognition unit.
21. The system of claim 20 wherein the processor is further configured to apply a global singular value decomposition to the speaker-specific feature extraction representation to generate the speaker-specific recognition unit.

22. The system of claim 17 wherein the processor is further configured to generate the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

23. The system of claim 18 wherein the processor is further configured to locally decompose at least one spectral feature vector into at least one speaker-specific characteristic unit, and authenticate the user if a measure of the diagonality deviation for a speaker-specific comparison unit is within a first threshold limit, the speaker-specific comparison unit having been previously computed from the at least one speaker-specific characteristic unit and the speaker-specific recognition unit.

24. The system of claim 23 wherein the processor is further configured to apply a singular value decomposition to the at least one spectral feature vector.
25. The system of claim 17, wherein the processor is further configured to time warp the plurality of feature vectors into a speaker-specific content unit.

26. A system for authenticating a voice signal comprising:
a processor to locally decompose at least one spectral feature vector into at least one speaker-specific characteristic unit, compute a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user, and authenticate the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

27. The system of claim 26 further comprising:
a feature extractor to extract the set of a plurality of feature vectors into a speaker-specific feature extraction representation.

30. The system of claim 29 wherein the processor is further configured to globally decompose the speaker-specific feature extraction representation into at least one speaker-specific decomposition unit, and select the speaker-specific recognition unit from the at least one speaker-specific decomposition unit.

31. The system of claim 30 wherein the processor is further configured to apply a global singular value decomposition to the speaker-specific feature extraction representation to generate the speaker-specific recognition unit.

32. The system of claim 28 wherein the processor is further configured to generate the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

33. The system of claim 26 wherein the processor is further configured to apply a singular value decomposition to the at least one spectral feature vector.

34. The system of claim 26, wherein the processor is further configured to align a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user, and authenticate the user if the time axes are aligned within a second threshold limit.

35. A system for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the system comprising:
means for globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
means for computing a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

36. The system of claim 35 further comprising:
means for time warping the plurality of feature vectors into a speaker-specific content unit.

37. A computer readable medium comprising instructions, which when executed on a processor, perform a method for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the method comprising:
globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
selecting a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

38. The computer readable medium of claim 37, the method further comprising:
time warping the plurality of feature vectors into a speaker-specific content unit.

39. A system for authenticating a voice signal comprising:
means for locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
means for computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
means for authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

40. The system of claim 39 further comprising:
means for aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user, and the means for authenticating further authenticating the user if the time axes are aligned within a second threshold limit.

41. A computer readable medium comprising instructions, which when executed on a processor, perform a method for authenticating a voice signal, comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

42. The computer readable medium of claim 41, the method further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

Specification