SYSTEM AND METHOD FOR DYNAMIC FACIAL FEATURES FOR SPEAKER RECOGNITION

US 20120281885A1
Filed: 05/05/2011
Published: 11/08/2012
Est. Priority Date: 05/05/2011
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing speaker verification. A system configured to practice the method receives a request to verify a speaker, generates a text challenge that is unique to the request, and, in response to the request, prompts the speaker to utter the text challenge. Then the system records a dynamic image feature of the speaker as the speaker utters the text challenge, and performs speaker verification based on the dynamic image feature and the text challenge. Recording the dynamic image feature of the speaker can include recording video of the speaker while speaking the text challenge. The dynamic feature can include a movement pattern of head, lips, mouth, eyes, and/or eyebrows of the speaker. The dynamic image feature can relate to phonetic content of the speaker speaking the challenge, speech prosody, and the speaker's facial expression responding to content of the challenge.

20 Claims

1. A method of performing speaker verification, the method comprising:
- receiving a request to verify a speaker;
- generating a text challenge that is unique to the request;
- in response to the request, prompting the speaker to utter the text challenge;
- recording a dynamic image feature of the speaker as the speaker utters the text challenge; and
- performing speaker verification based on the dynamic image feature and the text challenge.

2. The method of claim 1, wherein the recording the dynamic image feature of the speaker comprises recording video of the speaker while speaking the text challenge.

3. The method of claim 1, wherein the dynamic image feature comprises a pattern of movement.

4. The method of claim 3, wherein the pattern of movement is based on at least one of head, lips, mouth, eyes, and eyebrows.

5. The method of claim 1, wherein the dynamic image feature relates to at least one of phonetic content of the speaker speaking the text challenge, speech prosody, a facial expression of the speaker in response to content of the text challenge, and a non-facial physically manifested response.

6. The method of claim 1, wherein generating the text challenge is based on eliciting highly distinctive behavior of the speaker.

7. The method of claim 1, wherein performing speaker verification is based on a database of speaker behaviors.

8. The method of claim 1, wherein performing speaker verification is further based on a location of the speaker.
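
As an illustration only, and not part of the claims, the "text challenge that is unique to the request" step of claim 1 could be sketched as below. The word list, the `ChallengeIssuer` class, and the exact-match check against a transcript are all assumptions made for this example; the listing does not say how challenges are built or compared.

```python
import secrets

# Hypothetical vocabulary; the claims do not specify how challenge text is built.
WORDS = ["amber", "river", "seven", "window", "purple", "mango", "thunder", "pocket"]


class ChallengeIssuer:
    """Issues one random text challenge per verification request, never reusing one."""

    def __init__(self, n_words=4):
        self.n_words = n_words
        self._issued = {}  # request_id -> challenge text

    def issue(self, request_id):
        if request_id in self._issued:
            raise ValueError("a challenge was already issued for this request")
        used = set(self._issued.values())
        # Draw with a cryptographic RNG so the challenge is hard to predict,
        # and redraw if the text was already handed out for another request.
        while True:
            text = " ".join(secrets.choice(WORDS) for _ in range(self.n_words))
            if text not in used:
                break
        self._issued[request_id] = text
        return text

    def matches(self, request_id, spoken_text):
        """Check a transcript (from some assumed recognizer) against the issued challenge."""
        expected = self._issued.get(request_id)
        return expected is not None and expected == spoken_text.strip().lower()
```

With only eight words and four slots there are 4,096 possible challenges, so a real vocabulary would need to be far larger; the small list just keeps the sketch short.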

9. A system for identifying a user, the system comprising:
- a processor;
- a first module configured to control the processor to prompt the user to utter a unique text challenge;
- a second module configured to control the processor to record audio and video of the user while the user utters the unique text challenge;
- a third module configured to control the processor to perform a comparison of the audio and the video to a database of observable behavior based on the unique text challenge;
- a fourth module configured to control the processor to identify the user based on the comparison.

10. The system of claim 9, wherein the comparison further comprises ensuring that the audio and the video match.

11. The system of claim 9, wherein the unique text challenge is unpredictable.

12. The system of claim 9, wherein the comparison further comprises:
- identifying features of the user in the video;
- analyzing the features; and
- temporally aligning the features to the audio based on the unique text challenge.

13. The system of claim 12, wherein the features comprise at least one of a degree of a mouth opening, symmetry of the mouth opening, lip rounding, lip spreading, visible tongue position, head movement, eyebrow movement, and eye shape.

14. The system of claim 12, wherein the features comprise a facial expression of the user in response to the unique text challenge.
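
As an illustration only, and not part of the claims, the temporal alignment of claim 12 and the audio/video match of claim 10 could be approximated by sliding a per-frame mouth-opening track against an audio energy track and keeping the lag with the highest Pearson correlation. The function names, the choice of correlation as the matching measure, and the 0.8 threshold are assumptions for this sketch; the listing does not describe the actual alignment technique.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_alignment(mouth_opening, audio_energy, max_lag=5):
    """Return (lag, correlation) for the best lag in frames.

    A positive lag means the audio track trails the video track by that many frames.
    """
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            video = mouth_opening[: len(mouth_opening) - lag]
            audio = audio_energy[lag:]
        else:
            video = mouth_opening[-lag:]
            audio = audio_energy[: len(audio_energy) + lag]
        n = min(len(video), len(audio))
        if n < 3:
            continue  # too little overlap to say anything
        r = pearson(video[:n], audio[:n])
        if r > best[1]:
            best = (lag, r)
    return best


def streams_match(mouth_opening, audio_energy, threshold=0.8):
    """Assumed decision rule: the streams match if some lag correlates strongly."""
    _, r = best_alignment(mouth_opening, audio_energy)
    return r >= threshold
```

A single scalar track per stream is a deliberate simplification; the features listed in claim 13 would give a multi-dimensional track per frame.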

15. A non-transitory computer-readable storage medium storing instructions for serving requests for speaker verification which, when executed by a computing device, cause the computing device to perform steps comprising:
- receiving, from a user verification device, a request to confirm a user identity;
- retrieving a user profile associated with the user identity;
- generating a unique text challenge based on the user profile;
- instructing the user verification device to prompt the user to utter the unique text challenge;
- receiving from the user verification device an audio recording and a video recording of the user uttering the unique text challenge;
- performing an analysis of the audio recording and the video recording based on the user profile; and
- sending a confirmation to the user verification device if the analysis meets a verification threshold.

16. The non-transitory computer-readable storage medium of claim 15, wherein the user profile is generated as part of a user enrollment process.

17. The non-transitory computer-readable storage medium of claim 15, wherein the user verification device uses the confirmation as part of a multi-factor authentication of the user.

18. The non-transitory computer-readable storage medium of claim 15, further comprising:
- receiving from the user verification device an indication of desired user verification certainty; and
- setting the verification threshold based on the desired user verification certainty.

19. The non-transitory computer-readable storage medium of claim 15, wherein performing the analysis further comprises temporally aligning the audio recording and the video recording, and determining whether the audio recording and the video recording match.

20. The non-transitory computer-readable storage medium of claim 15, wherein the unique text challenge is generated according to the user profile to elicit a distinctive identifiable behavior in the user when the user utters the unique text challenge.
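
As an illustration only, and not part of the claims, the threshold logic of claims 15 and 18 could look like the sketch below: the requesting device names a desired certainty, the server maps it to a threshold, and a confirmation is sent only if the analysis score meets it. The certainty labels, the threshold values, the `Analysis` record, and the weighted-average fusion are all hypothetical; the listing gives no scoring formula.

```python
from dataclasses import dataclass

# Hypothetical mapping from requested certainty to a score threshold.
THRESHOLDS = {"low": 0.60, "medium": 0.75, "high": 0.90}


@dataclass
class Analysis:
    audio_score: float    # assumed similarity of the audio to the user profile, 0..1
    video_score: float    # assumed similarity of the video to the user profile, 0..1
    streams_match: bool   # whether the audio and video recordings agree (claim 19)


def fused_score(analysis, audio_weight=0.5):
    """Weighted average of the two scores; zero if the streams do not match."""
    if not analysis.streams_match:
        return 0.0
    return (audio_weight * analysis.audio_score
            + (1 - audio_weight) * analysis.video_score)


def should_confirm(analysis, certainty="medium"):
    """Return True if a confirmation should be sent to the verification device."""
    return fused_score(analysis) >= THRESHOLDS[certainty]
```

Forcing the score to zero when the streams disagree reflects one possible reading of claim 19, where a mismatch between audio and video is treated as disqualifying rather than as one more weighted factor.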

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Syrdal, Ann K.; Chopra, Sumit; Haffner, Patrick; Mishra, Taniya; Zeljkovic, Ilija; Zavesky, Eric

Granted Patent: US 8,897,500 B2
US Class Current: 382/116
CPC Class Codes

G06F 21/32   using biometric data, e.g. ...

G06F 2221/2103   Challenge-response

G06V 40/166   using acquisition arrangements

G06V 40/171   Local features and componen...

G06V 40/172   Classification, e.g. identi...

G06V 40/176   Dynamic expression

G06V 40/20   Movements or behaviour, e.g...

G10L 15/25   using position of the lips,...

G10L 17/24   the user being prompted to ...

G10L 21/06   Transformation of speech in...
