User authentication for devices using voice input or audio signatures

US 9,286,899 B1
Filed: 09/21/2012
Issued: 03/15/2016
Est. Priority Date: 09/21/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques for authenticating users at devices that interact with the users via voice input. For instance, the described techniques may allow a voice-input device to safely verify the identity of a user by engaging in a back-and-forth conversation. The device or another device coupled thereto may then verify the accuracy of the responses from the user during the conversation, as well as compare an audio signature associated with the user's responses to a pre-stored audio signature associated with the user. By utilizing multiple checks, the described techniques are able to accurately and safely authenticate the user based solely on an audible conversation between the user and the voice-input device.
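
The comparison of a reply's audio signature against the pre-stored signature can be illustrated with a simple similarity score. This is a minimal sketch that rests on several assumptions. First, the signature is reduced to three numbers: the pitch, decibel level, and tone named in claim 1. Second, the per-feature scales and the distance-to-score mapping were chosen only for illustration. The patent does not say how its similarity score is computed. `AudioSignature`, `FEATURE_SCALES`, `similarity_score`, and `signature_matches` are hypothetical names. Real speaker-verification systems typically compare learned speaker embeddings instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSignature:
    """Hypothetical compact voice signature: one value per acoustic feature."""
    pitch_hz: float   # fundamental frequency
    level_db: float   # decibel level, e.g. RMS level in dBFS
    tone: float       # timbre proxy, e.g. spectral centroid in Hz


# Assumed typical spread of each feature, used to put differences on one scale.
FEATURE_SCALES = {"pitch_hz": 50.0, "level_db": 10.0, "tone": 500.0}


def similarity_score(a: AudioSignature, b: AudioSignature) -> float:
    """Map the scaled Euclidean distance between two signatures into (0, 1]."""
    dist_sq = sum(((getattr(a, f) - getattr(b, f)) / s) ** 2
                  for f, s in FEATURE_SCALES.items())
    return 1.0 / (1.0 + math.sqrt(dist_sq))


def signature_matches(candidate: AudioSignature, enrolled: AudioSignature,
                      threshold: float, inclusive: bool = False) -> bool:
    """Claim 1 requires "greater than" the threshold (inclusive=False);
    claims 8 and 15 require "meets or exceeds" (inclusive=True)."""
    score = similarity_score(candidate, enrolled)
    return score >= threshold if inclusive else score > threshold
```

Identical signatures score 1.0. The score falls toward 0 as the scaled distance grows, so a single pre-defined threshold in (0, 1) can gate acceptance.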

20 Claims

1. An apparatus comprising:
- a speaker;
  
  a microphone to generate one or more audio signals from sound captured within an environment;
  
  a processor; and
  
  computer-readable media storing computer-executable instructions that, when executed on the processor, cause the processor to perform acts comprising:
  
  identifying, based at least in part on the one or more audio signals, a request from a user to initiate a transaction;
  
  outputting, via the speaker, a request that the user utter a password associated with the user;
  
  determining, from the one or more audio signals, whether a first utterance of the user includes a password that matches the password associated with the user and whether an audio signature of the first utterance has a similarity score to an audio signature associated with the user that is greater than a first pre-defined threshold, the audio signature of the first utterance being based at least partly on a pitch, a decibel level, and a tone associated with the one or more audio signals;
  
  at least partly in response to determining that the passwords match and that the similarity score of the audio signature of the first utterance to the audio signature associated with the user is greater than the first pre-defined threshold, causing output of, via the speaker, a request that the user answer a pre-stored question having a previously selected answer;
  
  determining, from the one or more audio signals, whether a second utterance of the user includes the previously selected answer and whether an audio signature of the second utterance has a similarity score to the audio signature associated with the user that is greater than a second pre-defined threshold; and
  
  initiating the transaction at least partly in response to determining that the second utterance includes the previously selected answer and that the similarity score of the audio signature of the second utterance to the audio signature associated with the user is greater than the second pre-defined threshold.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The apparatus as recited in claim 1, the acts further comprising:
    - at least partly in response to determining that the second utterance of the user includes the previously selected answer and that the similarity score of the audio signature of the second utterance to the audio signature associated with the user is greater than the second pre-defined threshold, causing output of, via the speaker, a request that the user utter one or more particular words; and
      
      determining, from the one or more audio signals, whether a third utterance of the user includes the one or more particular words and whether an audio signature of the third utterance has a similarity score to the audio signature associated with the user that is greater than a third threshold;
      
      and wherein the initiating of the transaction also occurs at least partly in response to determining that the third utterance includes the one or more particular words and that the similarity score of the audio signature of the third utterance to the audio signature associated with the user is greater than the third threshold.
  - 3. The apparatus as recited in claim 2, the acts further comprising denying the transaction at least partly in response to one or more of:
    - (1) determining that the third utterance does not include the one or more particular words, or (2) determining that the similarity score of the audio signature of the third utterance to the audio signature associated with the user is not greater than the third threshold.
  - 4. The apparatus as recited in claim 1, the acts further comprising denying the transaction at least partly in response to one or more of:
    - (1) determining that the first utterance does not include a password that matches the password associated with the user, or (2) determining that the similarity score of the audio signature of the first utterance to the audio signature associated with the user is not greater than the first pre-defined threshold.
  - 5. The apparatus as recited in claim 1, the acts further comprising denying the transaction at least partly in response to one or more of:
    - (1) determining that the second utterance does not include the answer previously selected by the user, or (2) determining that the similarity score of the audio signature of the second utterance to the audio signature associated with the user is not greater than the second pre-defined threshold.
  - 6. The apparatus as recited in claim 1, wherein the first pre-defined threshold and the second pre-defined threshold are the same.
  - 7. The apparatus as recited in claim 1, wherein the audio signature of the first utterance is based at least partly on a frequency associated with the one or more audio signals.
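
Claims 1 through 6 describe a fixed sequence of spoken challenges. Each challenge passes only if both the content check and the signature check succeed, and the transaction is denied as soon as one fails. The sketch below assumes a hypothetical `listen` callback that handles speech output, audio capture, speech recognition, and signature scoring. `Stage` and `run_stages` are illustrative names and are not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical callback: speaks `prompt`, captures the reply, and returns
# (transcript of the reply, similarity score of its audio signature to the user's).
Listener = Callable[[str], Tuple[str, float]]


@dataclass(frozen=True)
class Stage:
    """One challenge in the conversation (illustrative structure, not from the patent)."""
    prompt: str                         # what the device says via the speaker
    content_ok: Callable[[str], bool]   # e.g. password or answer check on the transcript
    threshold: float                    # pre-defined similarity threshold for this stage


def run_stages(stages: Sequence[Stage], listen: Listener) -> bool:
    """Ask each stage in order and deny on the first failed check (cf. claims 3-5)."""
    for stage in stages:
        transcript, score = listen(stage.prompt)
        if not stage.content_ok(transcript) or not score > stage.threshold:
            return False   # deny the transaction; later prompts are never spoken
    return True            # every stage passed: initiate the transaction
```

For claim 1, the list would hold a password stage followed by a pre-stored-question stage. Claim 2 appends a particular-words stage. Claim 6 corresponds to giving the first two stages the same `threshold`. The strict `>` mirrors the claim's "greater than".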

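Claims 1 and 7 base the audio signature on pitch, decibel level, tone, and frequency, but they do not say how these are measured. One plausible reading is sketched below in standard-library Python. It assumes mono samples scaled to [-1, 1] and makes three measurement choices:

- the decibel level is the RMS level in dBFS;
- the pitch is the autocorrelation peak;
- the tone is approximated by the spectral centroid.

The function names are hypothetical. The naive O(n^2) DFT is only suitable for short illustrative clips.

```python
import math
from typing import Sequence, Tuple


def level_dbfs(samples: Sequence[float]) -> float:
    """Decibel level: RMS amplitude in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence


def pitch_hz(samples: Sequence[float], sample_rate: int,
             fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Pitch: the lag with the largest autocorrelation within the lag range
    that corresponds to assumed speaking frequencies [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


def spectral_centroid_hz(samples: Sequence[float], sample_rate: int) -> float:
    """Tone proxy (assumption): magnitude-weighted mean frequency of a naive DFT."""
    n = len(samples)
    num = den = 0.0
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        num += (k * sample_rate / n) * mag
        den += mag
    return num / den if den > 0 else 0.0


def extract_signature(samples: Sequence[float], sample_rate: int) -> Tuple[float, float, float]:
    """Return (pitch in Hz, level in dBFS, spectral centroid in Hz)."""
    return (pitch_hz(samples, sample_rate),
            level_dbfs(samples),
            spectral_centroid_hz(samples, sample_rate))
```

A production system would analyze short overlapping frames with an FFT rather than a naive DFT. Plain autocorrelation is also prone to octave errors on real speech.
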
8. Non-transitory computer-readable media storing computer-executable instructions that, when executed on a processor, cause the processor to perform acts comprising:
- receiving a request from a user;
  
  causing output of a pre-stored question having a previously selected answer;
  
  determining, based at least in part on an audio signal, whether an answer audibly provided by the user matches the previously selected answer;
  
  determining whether a similarity score between an audio signature of the audio signal and an audio signature previously associated with the user meets or exceeds a pre-defined threshold, the audio signature of the audio signal being based at least partly on at least one of a pitch, a decibel level, or a tone associated with the audio signal; and
  
  initiating the request at least partly in response to determining that the answer audibly provided by the user matches the previously selected answer and that the similarity score meets or exceeds the pre-defined threshold.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The non-transitory computer-readable media as recited in claim 8, the acts further comprising:
    - receiving a second request from the user;
      
      causing output of a second pre-stored question having a second previously selected answer;
      
      determining, based at least in part on a second audio signal, whether a second answer audibly provided by the user matches the second previously selected answer;
      
      determining whether a second similarity score between a second audio signature of the second audio signal and the audio signature previously associated with the user meets or exceeds the pre-defined threshold; and
      
      denying the second request at least partly in response to determining that the second answer audibly provided by the user does not match the second previously selected answer or that the second similarity score does not meet or exceed the pre-defined threshold.
  - 10. The non-transitory computer-readable media as recited in claim 8, further comprising:
    - determining, from a second audio signal, whether a password spoken or spelled by the user matches a password previously associated with the user; and
      
      determining whether a second audio signature of the second audio signal substantially matches the audio signature previously associated with the user;
      
      and wherein the initiating also occurs at least partly in response to determining that the password spoken or spelled by the user matches the password previously associated with the user and that the second audio signature of the second audio signal substantially matches the audio signature previously associated with the user.
  - 11. The non-transitory computer-readable media as recited in claim 10, the acts further comprising denying the request at least partly in response to determining that the password spoken or spelled by the user does not match the password previously associated with the user or that the second audio signature of the second audio signal does not substantially match the audio signature previously associated with the user.
  - 12. The non-transitory computer-readable media as recited in claim 8, further comprising:
    - requesting that the user utter one or more particular words;
      
      determining whether a second audio signal includes a user utterance of the one or more particular words; and
      
      determining whether a second audio signature of the second audio signal substantially matches the audio signature previously associated with the user;
      
      and wherein the initiating also occurs at least partly in response to determining that the second audio signal includes a user utterance of the one or more particular words and that the second audio signature of the second audio signal substantially matches the audio signature previously associated with the user.
  - 13. The non-transitory computer-readable media as recited in claim 12, the acts further comprising denying the request at least partly in response to determining that the second audio signal does not include a user utterance of the one or more particular words or that the second audio signature of the second audio signal does not substantially match the audio signature previously associated with the user.
  - 14. The non-transitory computer-readable media as recited in claim 8, wherein the causing output of the pre-stored question is based at least partly on a preliminary identification of the user and comprises causing output of the pre-stored question via a speaker.
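
Claims 10 and 17 accept a password that is "spoken or spelled". The sketch below assumes the speech recognizer emits spelled letters as single characters. It normalizes both forms to the same string before comparing them, and it stores only a salted PBKDF2 hash of the password. `normalize_spoken_secret`, `hash_secret`, and `password_matches` are hypothetical names. The iteration count is an arbitrary example value.

```python
import hashlib
import hmac
import re

ITERATIONS = 100_000  # example PBKDF2 work factor, not taken from the patent


def normalize_spoken_secret(transcript: str) -> str:
    """Lowercase and keep only letters and digits, so a spoken form ("open sesame")
    and a spelled form ("O-P-E-N S E S A M E") normalize to the same string."""
    return re.sub(r"[^a-z0-9]", "", transcript.lower())


def hash_secret(secret: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 of the normalized secret (store this, not the secret)."""
    return hashlib.pbkdf2_hmac("sha256", normalize_spoken_secret(secret).encode(),
                               salt, ITERATIONS)


def password_matches(transcript: str, stored_hash: bytes, salt: bytes) -> bool:
    """Constant-time comparison of the transcript's hash with the stored hash."""
    return hmac.compare_digest(hash_secret(transcript, salt), stored_hash)
```

The same normalization could be applied to the previously selected answer of claim 8. Letter names transcribed as words ("oh", "pee") are not handled here.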

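Claim 14 selects the pre-stored question based on a preliminary identification of the user. Claim 9 issues a second pre-stored question for a second request. One way to realize this is to pick a random question for the identified user while avoiding the one asked last. The sketch assumes a hypothetical in-memory `bank` keyed by a user id that comes from that preliminary identification.

```python
import secrets
from typing import Dict, List, Optional, Tuple

# Hypothetical store: user id -> list of (pre-stored question, previously selected answer).
QuestionBank = Dict[str, List[Tuple[str, str]]]


def pick_question(bank: QuestionBank, user_id: str,
                  last_question: Optional[str] = None) -> Tuple[str, str]:
    """Choose one of the preliminarily identified user's questions at random,
    skipping the question asked on the previous request when another exists."""
    entries = bank[user_id]
    fresh = [e for e in entries if e[0] != last_question] or entries
    return secrets.choice(fresh)
```

Answers are kept in plain text here only for brevity. They could be hashed in the same way as the password.
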
15. Non-transitory computer-readable media storing computer-executable instructions that, when executed on a processor, cause the processor to perform acts comprising:
- receiving a request from a user;
  
  requesting that the user utter one or more particular words;
  
  determining whether an audio signal includes a user utterance that includes the one or more particular words;
  
  determining whether a similarity score between an audio signature of the audio signal and an audio signature previously associated with the user meets or exceeds a pre-defined threshold, the audio signature of the audio signal being based at least partly on at least one of a pitch, a decibel level, or a tone associated with the audio signal; and
  
  initiating the request at least partly in response to determining that the audio signal includes a user utterance of the one or more particular words and that the similarity score meets or exceeds the pre-defined threshold.
- Dependent Claims (16, 17, 18, 19, 20)
- - 16. The non-transitory computer-readable media as recited in claim 15, the acts further comprising denying the request at least partly in response to determining that the audio signal does not include a user utterance of the one or more particular words or that the similarity score does not meet or exceed the pre-defined threshold.
  - 17. The non-transitory computer-readable media as recited in claim 15, further comprising:
    - determining, from a second audio signal, whether a password spoken or spelled by the user matches a password previously associated with the user; and
      
      determining whether an audio signature of the second audio signal substantially matches the audio signature previously associated with the user;
      
      and wherein the initiating also occurs at least partly in response to determining that the password spoken or spelled by the user matches the password previously associated with the user and that the audio signature of the second audio signal substantially matches the audio signature previously associated with the user.
  - 18. The non-transitory computer-readable media as recited in claim 17, the acts further comprising denying the request at least partly in response to determining that the password spoken or spelled by the user does not match the password previously associated with the user or that the audio signature of the second audio signal does not substantially match the audio signature previously associated with the user.
  - 19. The non-transitory computer-readable media as recited in claim 15, further comprising:
    - causing output of a pre-stored question having a previously selected answer;
      
      determining, from a second audio signal, whether an answer audibly provided by the user matches the previously selected answer; and
      
      determining whether an audio signature of the second audio signal substantially matches an audio signature previously associated with the user;
      
      and wherein the initiating also occurs at least partly in response to determining that the answer audibly provided by the user matches the previously selected answer and that the audio signature of the second audio signal substantially matches the audio signature previously associated with the user.
  - 20. The non-transitory computer-readable media as recited in claim 19, the acts further comprising denying the request at least partly in response to determining that the answer audibly provided by the user does not match the previously selected answer or that the audio signature of the second audio signal does not substantially match the audio signature previously associated with the user.
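
Claims 15 and 16 prompt the user to utter "one or more particular words". The sketch below assumes these words are drawn at random for each request, which makes a recording of an earlier reply unlikely to pass. It also requires the words to appear in the transcript in the prompted order, which is stricter than the claim's "includes". `WORDS`, `make_challenge`, `utterance_includes`, and `authorize` are hypothetical names.

```python
import re
import secrets
from typing import List, Sequence

# Illustrative (hypothetical) vocabulary; a real deployment would use a much larger list.
WORDS = ["apple", "river", "seven", "orange", "window", "tiger", "cloud", "marble"]


def make_challenge(n_words: int = 3, vocabulary: Sequence[str] = WORDS) -> List[str]:
    """Pick n distinct words using the OS-backed random source from `secrets`."""
    return secrets.SystemRandom().sample(list(vocabulary), n_words)


def utterance_includes(transcript: str, words: Sequence[str]) -> bool:
    """True if every challenge word appears in the transcript, in the prompted order."""
    tokens = iter(re.findall(r"[a-z0-9']+", transcript.lower()))
    # `w in tokens` consumes the iterator up to the match, enforcing the order.
    return all(w.lower() in tokens for w in words)


def authorize(transcript: str, words: Sequence[str], score: float, threshold: float) -> bool:
    """Claim 15: grant only if the words are present and the score meets or exceeds the threshold."""
    return utterance_includes(transcript, words) and score >= threshold
```

With this rule, filler words such as "um" are tolerated. A reordered or incomplete reply is denied, which corresponds to claim 16.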

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Narayanan, Preethi
Primary Examiner(s)
Pullias, Jesse

Application Number

US13/624,633
Time in Patent Office

1,271 Days
Field of Search

704/246-250, 704/275
US Class Current

1/1
CPC Class Codes

G06F 21/32   using biometric data, e.g. ...

G10L 15/22   Procedures used during a sp...

G10L 17/06   Decision making techniques;...

G10L 17/22   Interactive procedures; Man...

G10L 17/24   the user being prompted to ...
