Combined dual spectral and temporal alignment method for user authentication by voice
Abstract
A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
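The training stage summarized above can be sketched in code. This is a minimal illustration, not the patent's implementation: it assumes the speaker's spectral feature vectors are stacked as rows of a matrix, and, following claims 4, 13, and 22, takes the singular value matrix of a global singular value decomposition as the speaker-specific recognition unit. Function names and shapes are illustrative assumptions.

```python
import numpy as np

def train_recognition_unit(feature_vectors):
    """Globally decompose a speaker's full set of spectral feature
    vectors (rows) with a singular value decomposition, keeping the
    singular value matrix as the speaker-specific recognition unit
    (cf. claim 4) and the orthogonal factors as speaker-specific
    decomposition units. Illustrative sketch only."""
    M = np.asarray(feature_vectors, dtype=float)   # shape: (frames, spectral dims)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    S = np.diag(s)        # singular value matrix: recognition unit
    return S, U, Vt       # U, Vt: decomposition units

# toy usage: 20 frames of 8-dimensional spectral features
rng = np.random.default_rng(0)
S, U, Vt = train_recognition_unit(rng.standard_normal((20, 8)))
```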
Claims
1. A method of training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the method comprising:
globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
computing a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

2. The method of claim 1 further comprising:
extracting the set of a plurality of feature vectors into at least one speaker-specific feature extraction representation; and
globally decomposing the at least one speaker-specific feature extraction representation into the speaker-specific recognition unit.
3. The method of claim 2 wherein globally decomposing further comprises:
applying a global singular value decomposition to the at least one speaker-specific feature extraction representation.

4. The method of claim 1 further comprising:
generating the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

5. The method of claim 1 further comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and the speaker-specific recognition unit; and
authenticating the user if a measurement of the diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.
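The claims leave the diagonality-deviation measure itself to the specification. One plausible sketch, an assumption for illustration rather than the claimed formula, scores how far the speaker-specific comparison unit departs from a diagonal matrix, here via the relative Frobenius norm of its off-diagonal part:

```python
import numpy as np

def diagonality_deviation(C):
    """Relative energy of the off-diagonal entries of a square
    comparison matrix C: exactly 0 when C is diagonal, approaching 1
    when its energy is entirely off-diagonal. Illustrative measure
    only; assumes a nonzero matrix (norm(C) > 0)."""
    C = np.asarray(C, dtype=float)
    off = C - np.diag(np.diag(C))
    return float(np.linalg.norm(off) / np.linalg.norm(C))

def authenticate_spectral(C, first_threshold=0.3):
    # cf. claim 5: authenticate if the deviation is within a first threshold limit
    # (the threshold value here is a placeholder, not from the patent)
    return diagonality_deviation(C) <= first_threshold
```

A perfectly diagonal comparison unit (e.g. the identity) yields deviation 0 and passes; a matrix with most of its energy off-diagonal fails any small threshold.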
6. The method of claim 5 wherein locally decomposing further comprises:
applying a local singular value decomposition to the at least one spectral feature vector.

7. The method of claim 5 further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

8. The method of claim 1 further comprising:
time warping the plurality of feature vectors into a speaker-specific content unit.
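The temporal half of the method (claims 7, 8, and 16: time warping the feature vectors into a speaker-specific content unit, then checking that the time axes align within a second threshold limit) is conventionally realized with dynamic time warping. A minimal sketch, assuming Euclidean frame distances and a scalar warping cost as the alignment measure; both are assumptions, since the claims fix neither:

```python
import math

def dtw_cost(a, b):
    """Classic dynamic time warping between two sequences of
    equal-dimension feature vectors (tuples). Returns the accumulated
    alignment cost; a small cost means the time axes align well."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # local frame distance
            D[i][j] = step + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def time_axes_aligned(test_frames, content_unit, second_threshold):
    # cf. claim 7: further authenticate only if alignment is within the limit
    return dtw_cost(test_frames, content_unit) <= second_threshold
```

Identical sequences warp onto each other at zero cost, so they trivially satisfy any threshold; a genuinely different utterance accumulates cost at every frame.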
16. The method of claim 7 further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit, the speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

9. A method of authenticating a voice signal comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

10. The method of claim 9 further comprising:
globally decomposing a plurality of feature vectors into a speaker-specific recognition unit.

11. The method of claim 10 further comprising:
extracting a set of the plurality of feature vectors into a speaker-specific feature extraction representation; and
globally decomposing the speaker-specific feature extraction representation into the speaker-specific recognition unit.

12. The method of claim 11 wherein globally decomposing further comprises:
applying a global singular value decomposition to the speaker-specific feature extraction representation.

13. The method of claim 10 further comprising:
generating the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

14. The method of claim 9 wherein locally decomposing further comprises:
applying a local singular value decomposition to the at least one spectral feature vector.

15. The method of claim 9 further comprising:
generating the at least one speaker-specific characteristic unit from a singular value matrix of a local singular value decomposition of the at least one spectral feature vector.

17. A system for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the system comprising:
a processor configured to globally decompose a set of a plurality of feature vectors into at least one speaker-specific decomposition unit, and select a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

18. The system of claim 17 further comprising:
a feature extractor to extract the set of a plurality of feature vectors into at least one speaker-specific feature extraction representation.

20. The system of claim 19 wherein the processor is further configured to globally decompose the at least one speaker-specific feature extraction representation into the speaker-specific recognition unit.
21. The system of claim 20 wherein the processor is further configured to apply a global singular value decomposition to the speaker-specific feature extraction representation to generate the speaker-specific recognition unit.

22. The system of claim 17 wherein the processor is further configured to generate the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

23. The system of claim 18 wherein the processor is further configured to locally decompose at least one spectral feature vector into at least one speaker-specific characteristic unit, and authenticate the user if a measure of the diagonality deviation for a speaker-specific comparison unit is within a first threshold limit, the speaker-specific comparison unit having been previously computed from the at least one speaker-specific characteristic unit and the speaker-specific recognition unit.

24. The system of claim 23 wherein the processor is further configured to apply a singular value decomposition to the at least one spectral feature vector.
25. The system of claim 17, wherein the processor is further configured to time warp the plurality of feature vectors into a speaker-specific content unit.

26. A system for authenticating a voice signal comprising:
a processor to locally decompose at least one spectral feature vector into at least one speaker-specific characteristic unit, compute a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user, and authenticate the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

27. The system of claim 26 further comprising:
a feature extractor to extract the set of a plurality of feature vectors into a speaker-specific feature extraction representation.

30. The system of claim 29 wherein the processor is further configured to globally decompose the speaker-specific feature extraction representation into at least one speaker-specific decomposition unit, and select the speaker-specific recognition unit from the at least one speaker-specific decomposition unit.

31. The system of claim 30 wherein the processor is further configured to apply a global singular value decomposition to the speaker-specific feature extraction representation to generate the speaker-specific recognition unit.

32. The system of claim 28 wherein the processor is further configured to generate the speaker-specific recognition unit from a singular value matrix of a global singular value decomposition of the set of a plurality of feature vectors.

33. The system of claim 26 wherein the processor is further configured to apply a singular value decomposition to the at least one spectral feature vector.

34. The system of claim 26, wherein the processor is further configured to align a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user, and authenticate the user if the time axes are aligned within a second threshold limit.

35. A system for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the system comprising:
means for globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
means for computing a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

36. The system of claim 35 further comprising:
means for time warping the plurality of feature vectors into a speaker-specific content unit.

37. A computer readable medium comprising instructions, which when executed on a processor, perform a method for training a user authentication by voice signal, the user authentication based on measuring diagonality deviations, the method comprising:
globally decomposing a set of a plurality of feature vectors into at least one speaker-specific decomposition unit; and
selecting a speaker-specific recognition unit from the at least one speaker-specific decomposition unit for subsequent derivation of the diagonality deviations.

38. The computer readable medium of claim 37, the method further comprising:
time warping the plurality of feature vectors into a speaker-specific content unit.

39. A system for authenticating a voice signal comprising:
means for locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
means for computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
means for authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

40. The system of claim 39 further comprising:
means for aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user, and the means for authenticating further authenticating the user if the time axes are aligned within a second threshold limit.

41. A computer readable medium comprising instructions, which when executed on a processor, perform a method for authenticating a voice signal, comprising:
locally decomposing at least one spectral feature vector into at least one speaker-specific characteristic unit;
computing a speaker-specific comparison unit from the at least one speaker-specific characteristic unit and a speaker-specific recognition unit previously trained by a user; and
authenticating the user if a measurement of a diagonality deviation for the speaker-specific comparison unit is within a first threshold limit.

42. The computer readable medium of claim 41, the method further comprising:
aligning a time axis of the at least one spectral feature vector with a time axis of a speaker-specific content unit previously trained by the user; and
further authenticating the user if the time axes are aligned within a second threshold limit.

Specification