User authentication for voice-input devices

US 9,865,268 B1
Filed: 03/14/2016
Issued: 01/09/2018
Est. Priority Date: 09/21/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques for authenticating users at devices that interact with the users via voice input. For instance, the described techniques may allow a voice-input device to safely verify the identity of a user by engaging in a back-and-forth conversation. The device or another device coupled thereto may then verify the accuracy of the responses from the user during the conversation, as well as compare an audio signature associated with the user's responses to a pre-stored audio signature associated with the user. By utilizing multiple checks, the described techniques are able to accurately and safely authenticate the user based solely on an audible conversation between the user and the voice-input device.

17 Citations

20 Claims

1. A method comprising:
- determining, based at least in part on a first audio signal, a request from a user;
  
  causing, based at least in part on the request, output of a first question that is associated with a first predefined response;
  
  determining, based at least in part on a second audio signal, that a first utterance of the user corresponds to the first predefined response based at least in part on a contextual representation associated with the first utterance;
  
  causing, based at least in part on the first utterance, output of a second question that is associated with a second predefined response;
  
  determining, based at least in part on a third audio signal, that a second utterance of the user corresponds to the second predefined response; and
  
  causing, based at least in part on the second utterance, audible output of data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method as recited in claim 1, further comprising initiating, based at least in part on the second utterance, the request.
  - 3. The method as recited in claim 2, wherein initiating the request comprises facilitating a transaction between the user and an entity.
  - 4. The method as recited in claim 1, wherein the first question is associated with the first predefined response and a third predefined response, and wherein determining that the first utterance of the user corresponds to the first predefined response comprises determining that the first utterance does not correspond to the third predefined response.
  - 5. The method as recited in claim 1, further comprising:
    - determining a first audio signature associated with the first utterance;
      
      determining a second audio signature associated with the first predefined response;
      
      determining a similarity score associated with the first audio signature and the second audio signature; and
      
      determining that the similarity score is equal to or exceeds a pre-determined threshold.
  - 6. The method as recited in claim 5, wherein the first audio signature associated with the first utterance is based at least in part on at least one of a pitch, a decibel level, or a tone associated with the first utterance.
  - 7. The method as recited in claim 1, further comprising:
    - causing, based at least in part on the second utterance, output of a third question that is associated with a third predefined response and a fourth predefined response;
      
      determining, based at least in part on a fourth audio signal, that a third utterance of the user does not correspond to the third predefined response or the fourth predefined response; and
      
      at least one of:
      
      causing, based at least in part on the third utterance, output of a second request for the user to output an additional utterance;
      
      or terminating a session associated with the request.
  - 8. The method as recited in claim 1, wherein:
    - at least one of the first question, the second question, or the data are at least one of audibly output by one or more speakers of a first device or visually displayed via an interface of the first device; and
      
      at least one of:
      
      a first determination that the first utterance of the user corresponds to the first predetermined response is performed by at least one of the first device or a server device located remotely from the first device;
      
      or a second determination that the second utterance of the user corresponds to the second predetermined response is performed by at least one of the first device or the server device.
  - 9. The method as recited in claim 1, wherein the contextual representation of the first utterance corresponds to at least one of a pitch, a decibel level, or a tone associated with the first utterance.
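
The question-and-answer flow of claim 1, together with the retry-or-terminate branch of claim 7, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `Challenge`, `run_session`, `scripted`, the `max_retries` parameter, and the use of exact text matching after normalization. The `ask` callback stands in for outputting a question, capturing an audio signal, and recognizing the utterance.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Challenge:
    """A question paired with its predefined response (hypothetical type)."""
    question: str
    predefined_response: str


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(utterance.lower().split())


def run_session(
    challenges: Iterable[Challenge],
    ask: Callable[[str], str],
    max_retries: int = 1,
) -> bool:
    """Ask each question in order; return True only if every answer matches.

    A wrong answer triggers a request for an additional utterance until
    `max_retries` is exhausted, at which point the session is terminated.
    """
    for challenge in challenges:
        attempts = 0
        while True:
            utterance = ask(challenge.question)
            if normalize(utterance) == normalize(challenge.predefined_response):
                break  # correct answer: move on to the next question
            attempts += 1
            if attempts > max_retries:
                return False  # too many wrong answers: terminate the session
    return True


def scripted(answers: List[str]) -> Callable[[str], str]:
    """Test helper: returns the given answers one at a time, ignoring the question."""
    remaining = iter(answers)
    return lambda question: next(remaining)
```

A caller would output the requested data only when `run_session` returns `True`. With two challenges whose predefined responses are "Seattle" and "Rex", the scripted answers `["Portland", "Seattle", "rex"]` pass under the default single retry, while `["Portland", "Tacoma"]` end the session.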

10. A system comprising:
- one or more microphones;
  
  one or more speakers;
  
  one or more processors; and
  
  memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  determining, based at least in part on a first audio signal generated by the one or more microphones, a request;
  
  outputting, by the one or more speakers and based at least in part on the request, a first question that is associated with a first predefined response;
  
  determining, based at least in part on a second audio signal generated by the one or more microphones, that a first utterance corresponds to the first predefined response;
  
  outputting, by the one or more speakers and based at least in part on the first utterance, a second question that is associated with a second predefined response;
  
  determining, based at least in part on a third audio signal generated by the one or more microphones, that a second utterance corresponds to the second predefined response based at least in part on a contextual representation associated with the second utterance; and
  
  outputting, by the one or more speakers and based at least in part on the second utterance, audio data associated with the request.
- Dependent claims (11, 12, 13, 14, 15, 16):
  - 11. The system as recited in claim 10, wherein determining that the first utterance corresponds to the first predefined response is based at least in part on at least one of a pitch, a decibel level, or a tone associated with the second utterance.
  - 12. The system as recited in claim 10, wherein the operations further comprise initiating the request by facilitating a transaction associated with the request.
  - 13. The system as recited in claim 10, wherein determining that the first utterance corresponds to the first predefined response comprises:
    - sending at least one of the second audio signal or the first utterance to a server device located remotely from the device; and
      
      receiving, from the server device, an indication that the first utterance corresponds to the first predefined response.
  - 14. The system as recited in claim 10, wherein the operations further comprise:
    - determining a first audio signature associated with the first utterance;
      
      determining a second audio signature associated with the first predefined response;
      
      determining a similarity score between the first audio signature and the second audio signature; and
      
      determining that the similarity score is equal to or exceeds a pre-determined threshold.
  - 15. The system as recited in claim 10, wherein the operations further comprise:
    - outputting, by the one or more speakers and based at least in part on the second utterance, a third question that is associated with a third predefined response and a fourth predefined response;
      
      determining, based at least in part on a fourth audio signal generated by the one or more microphones, that a third utterance does not correspond to the third predefined response or the fourth predefined response; and
      
      at least one of:
      
      outputting, by the one or more speakers and based at least in part on the third utterance, a second request to output an additional utterance;
      
      or terminating a session associated with the request.
  - 16. The system as recited in claim 10, wherein the first question is associated with the first predefined response and a third predefined response, and wherein determining that the first utterance corresponds to the first predefined response comprises determining that the first utterance does not correspond to the third predefined response.
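
The audio-signature comparison in claims 5 and 14 (a similarity score checked against a pre-determined threshold) might look like the sketch below. The claims do not say how a signature is represented or how the score is computed, so this assumes a signature is a fixed-length list of numeric features (for example, values derived from pitch, decibel level, and tone) and uses cosine similarity. The function names and the default threshold of 0.95 are illustrative assumptions.

```python
import math
from typing import Sequence


def similarity_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors.

    Cosine similarity is an assumed metric; the claims only require
    some similarity score between the two signatures.
    """
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero signature carries no information
    return dot / (norm_a * norm_b)


def signature_matches(
    observed: Sequence[float],
    stored: Sequence[float],
    threshold: float = 0.95,  # hypothetical value, not from the patent
) -> bool:
    """True when the score is equal to or exceeds the threshold."""
    return similarity_score(observed, stored) >= threshold
```

The `>=` comparison mirrors the claim wording "equal to or exceeds". In practice the features would likely need scaling before comparison, since raw pitch and decibel values live on different ranges.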

17. A system comprising:
- one or more microphones;
  
  one or more speakers;
  
  one or more processors; and
  
  memory storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  determining, based at least in part on a first audio signal generated by the one or more microphones, a request;
  
  outputting, by the one or more speakers and based at least in part on the request, a first question that is associated with a first predefined response and a second predefined response;
  
  determining, based at least in part on a second audio signal generated by the one or more microphones, that a first utterance corresponds to at least one of the first predefined response or the second predefined response, wherein determining that the first utterance corresponds to the at least one of the first predefined response or the second predefined response comprises determining that the first utterance does not correspond to at least one of the first predefined response or the second predefined response;
  
  outputting, by the one or more speakers and based at least in part on the first utterance, a second question that is associated with a third predefined response and a fourth predefined response;
  
  determining, based at least in part on a third audio signal generated by the one or more microphones, that a second utterance corresponds to at least one of the third predefined response or the fourth predefined response; and
  
  outputting, by the one or more speakers and based at least in part on the second utterance, audio data associated with the request.
- Dependent claims (18, 19, 20):
  - 18. The system as recited in claim 17, wherein the operations further comprise determining an audio signature associated with the first utterance, the audio signature being based at least in part on at least one of a pitch, a decibel level, or a tone associated with the first utterance.
  - 19. The system as recited in claim 17, wherein the operations further comprise:
    - determining that the request relates to a transaction for a product or a service; and
      
      facilitating the transaction by sending instructions to a server device located remotely from the system.
  - 20. The system as recited in claim 17, wherein the operations further comprise:
    - outputting, by the one or more speakers and based at least in part on the second utterance, a third question that is associated with a fifth predefined response and a sixth predefined response;
      
      determining, based at least in part on a fourth audio signal generated by the one or more microphones, that a third utterance corresponds to at least one of the fifth predefined response or the sixth predefined response; and
      
      facilitating, based at least in part on the third utterance, fulfillment of the request.
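
Claims 4, 16, and 17 describe questions associated with more than one predefined response, where the system works out which response, if any, an utterance corresponds to. A minimal sketch of that lookup follows. The function name `match_response` and the exact-text comparison are assumptions for illustration; the claims leave the matching method open.

```python
from typing import Optional, Sequence


def match_response(utterance: str, predefined: Sequence[str]) -> Optional[str]:
    """Return the predefined response the utterance corresponds to, or None.

    Comparison ignores case and extra whitespace (an assumed rule).
    Returning one candidate implies the utterance did not correspond
    to any candidate listed before it.
    """
    spoken = " ".join(utterance.lower().split())
    for candidate in predefined:
        if spoken == " ".join(candidate.lower().split()):
            return candidate
    return None
```

For a question that accepts either "savings" or "checking", `match_response("  Checking ", ["savings", "checking"])` returns `"checking"`, and an unrelated answer such as "brokerage" returns `None`, which a caller could treat as the non-matching case in claims 7 and 15.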

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Narayanan, Preethi
Primary Examiner(s): Pullias, Jesse
Application Number: US15/068,967
Time in Patent Office: 666 Days
Field of Search: 704246-250, 704275
US Class Current:
CPC Class Codes:
G06F 21/32   using biometric data, e.g. ...
G10L 15/22   Procedures used during a sp...
G10L 17/06   Decision making techniques;...
G10L 17/22   Interactive procedures; Man...
G10L 17/24   the user being prompted to ...
