User recognition for speech processing systems
First Claim
1. A computer-implemented method comprising:
receiving, from a speech-controlled device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to determine input text data;
performing natural language understanding (NLU) on the input text data to determine NLU results data;
determining a user profile associated with the speech-controlled device, the user profile associated with first training data representing how a first user's voice sounds and second training data representing how a second user's voice sounds;
determining, using at least the input audio data and the first training data, a first user recognition confidence score corresponding to a likelihood that the utterance was spoken by the first user;
determining a first device to receive first text data from, the first text data being responsive to a first portion of the NLU results data;
receiving a request from the first device for the first user recognition confidence score;
sending, to the first device, a speech session identifier and the first user recognition confidence score, wherein the speech session identifier corresponds to the utterance;
receiving the first text data from the first device; and
sending, to the speech-controlled device, output data corresponding to the first text data.
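The claim-1 flow (ASR, then NLU, then user recognition scoring, then exchanging the score and a session identifier with a first device) can be sketched in code. This is a minimal illustrative sketch only: every function and data shape below is a hypothetical stub I introduce for illustration, not an API from the patent or any real speech system.

```python
# Hypothetical sketch of the claim-1 method. All names here (asr, nlu,
# recognition_score, handle_utterance) are illustrative stubs.

def asr(audio: bytes) -> str:
    # Stub: a real ASR component would decode the audio into text.
    return "play my playlist"

def nlu(text: str) -> dict:
    # Stub: a real NLU component would produce intent/slot results.
    return {"intent": "PlayMusic", "slots": {"target": "my playlist"}}

def recognition_score(audio: bytes, training_data: bytes) -> float:
    # Stub: compare the utterance audio against a user's voice
    # training data; a real system would return a model-based score.
    return 0.87

def handle_utterance(audio: bytes, profile: dict, session_id: str) -> dict:
    text = asr(audio)        # perform ASR on the input audio data
    results = nlu(text)      # perform NLU on the input text data
    # Score the utterance against the first user's training data
    # from the device's user profile.
    score = recognition_score(audio, profile["users"][0]["training_data"])
    # When the first device requests the score, the response pairs it
    # with a speech session identifier corresponding to this utterance.
    return {
        "session_id": session_id,
        "user_recognition_score": score,
        "nlu": results,
    }
```

The session identifier lets the first device correlate the confidence score with the specific utterance it is generating responsive text for.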
1 Assignment
0 Petitions
Abstract
Systems, methods, and devices for recognizing a user are disclosed. A speech-controlled device captures a spoken utterance and sends corresponding audio data to a server. The server determines content sources storing, or having access to, content responsive to the spoken utterance. The server also determines multiple users associated with a profile of the speech-controlled device. Using the audio data, the server may determine user recognition data with respect to each user indicated in the speech-controlled device's profile. The server may also receive user recognition confidence threshold data from each of the content sources. The server may determine user recognition data that satisfies (i.e., meets or exceeds) the most stringent (i.e., highest) of the user recognition confidence thresholds. Thereafter, the server may send data indicating a user associated with that user recognition data to all of the content sources.
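The abstract's threshold logic (collect per-source confidence thresholds, take the most stringent one, and keep only users whose recognition score satisfies it) can be sketched as follows. The function names and data shapes are my own illustrative assumptions, not terminology from the patent.

```python
# Hypothetical sketch of the abstract's threshold selection.

def most_stringent_threshold(thresholds: list[float]) -> float:
    # The "most stringent" of the content sources' confidence
    # thresholds is the highest one.
    return max(thresholds)

def users_satisfying(scores: dict[str, float],
                     thresholds: list[float]) -> dict[str, float]:
    # scores maps user id -> recognition score for this utterance.
    # Keep users whose score meets or exceeds the strictest threshold;
    # the server can then report such a user to all content sources.
    bar = most_stringent_threshold(thresholds)
    return {user: s for user, s in scores.items() if s >= bar}
```

Because every reported user clears the highest threshold, a single result can be sent to all content sources regardless of their individual requirements.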
20 Claims
1. A computer-implemented method comprising:
receiving, from a speech-controlled device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to determine input text data;
performing natural language understanding (NLU) on the input text data to determine NLU results data;
determining a user profile associated with the speech-controlled device, the user profile associated with first training data representing how a first user's voice sounds and second training data representing how a second user's voice sounds;
determining, using at least the input audio data and the first training data, a first user recognition confidence score corresponding to a likelihood that the utterance was spoken by the first user;
determining a first device to receive first text data from, the first text data being responsive to a first portion of the NLU results data;
receiving a request from the first device for the first user recognition confidence score;
sending, to the first device, a speech session identifier and the first user recognition confidence score, wherein the speech session identifier corresponds to the utterance;
receiving the first text data from the first device; and
sending, to the speech-controlled device, output data corresponding to the first text data.
View Dependent Claims (2, 3)
4. A system comprising:
at least one processor; and
memory including instructions that, when executed, cause the at least one processor to:
receive, from a device, input audio data corresponding to an utterance;
perform automatic speech recognition (ASR) on the input audio data to create input text data;
perform natural language understanding (NLU) on the input text data to create NLU results data;
determine a user profile associated with the device, the user profile associated with first user-specific data corresponding to how a first user's voice sounds;
determine, using at least the first user-specific data, a first user recognition score corresponding to a likelihood that the utterance was spoken by the first user; and
send, to at least one remote device, first data corresponding to the first user recognition score.
View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12)
13. A computer-implemented method comprising:
receiving, from a device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to create input text data;
performing natural language understanding (NLU) on the input text data to create NLU results data;
determining a user profile associated with the device, the user profile associated with first user-specific data corresponding to a first user;
determining, using at least the first user-specific data, a first user recognition score corresponding to a likelihood that the utterance was spoken by the first user; and
sending, to at least one remote device, first data corresponding to the first user recognition score.
View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification