CONTEXT-BASED DEVICE ARBITRATION

US 20190066670A1
Filed: 08/30/2017
Published: 02/28/2019
Est. Priority Date: 08/30/2017
Status: Active Grant

Abstract

This disclosure describes, in part, context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
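
The flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Device` record, the `can_perform` check, the action names, and the use of a single numeric audio metric are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical record for one voice-enabled device on an account."""
    device_id: str
    online: bool = True
    capabilities: set = field(default_factory=set)  # actions the current state allows


def can_perform(device, action):
    """Assumed device-state check: online and able to perform the action."""
    return device.online and action in device.capabilities


def arbitrate(heard, account_devices, action):
    """Pick a device to respond, or None if no device can.

    heard: list of (Device, audio_metric) pairs for devices that detected the utterance.
    account_devices: every Device associated with the account.
    """
    # Rank the devices that heard the utterance, best audio metric first.
    ranked = [d for d, _ in sorted(heard, key=lambda pair: pair[1], reverse=True)]
    # Walk the ranked list and stop at the first device whose state allows the action.
    for device in ranked:
        if can_perform(device, action):
            return device
    # Fallback: analyze the other devices associated with the account.
    heard_ids = {d.device_id for d in ranked}
    for device in account_devices:
        if device.device_id not in heard_ids and can_perform(device, action):
            return device
    return None


kitchen = Device("kitchen", capabilities={"set_timer"})
living_room = Device("living_room", capabilities={"set_timer", "stop_music"})
bedroom = Device("bedroom", capabilities={"play_video"})
account = [kitchen, living_room, bedroom]

heard = [(kitchen, 18.0), (living_room, 11.5)]
print(arbitrate(heard, account, "stop_music").device_id)  # living_room
print(arbitrate(heard, account, "play_video").device_id)  # bedroom
```

In the first call the top-ranked device cannot perform the action, so the next device in the ranked list is chosen; in the second call neither device that heard the utterance qualifies, so a device elsewhere on the account is selected.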

20 Claims

1. A system comprising:
- one or more processors;
  
  computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  receiving, from a first voice-enabled device, first audio data representing a speech utterance;
  
  receiving, from the first voice-enabled device, a first audio signal metric value indicating a first signal-to-noise ratio associated with the first audio data;
  
  receiving, from a second voice-enabled device, second audio data representing the speech utterance;
  
  receiving, from the second voice-enabled device, a second audio signal metric value indicating a second signal-to-noise ratio associated with the second audio data;
  
  determining that the first signal-to-noise ratio is greater than the second signal-to-noise ratio;
  
  identifying device state data associated with the first voice-enabled device;
  
  generating, using automatic speech recognition (ASR) on at least one of the first audio data or the second audio data, text data corresponding to the speech utterance;
  
  determining, using natural language understanding (NLU) on the text data, intent data associated with the speech utterance, the intent data representing a request for a client device to perform an action;
  
  determining, based at least in part on the device state data, that the first voice-enabled device is capable of performing the action responsive to the speech utterance;
  
  determining a command to cause the first voice-enabled device to perform the action; and
  
  sending, to the first voice-enabled device, data indicating the command.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, the operations further comprising causing the second voice-enabled device to stop transmitting the second audio data, the second voice-enabled device being stopped from transmitting the second audio data prior to the first voice-enabled device stopping transmitting the first audio data, wherein generating the text data is performed using ASR on the first audio data.
  - 3. The system of claim 1, the operations further comprising:
    - determining that the first voice-enabled device is included in a stored grouping of devices that includes the first voice-enabled device and a third voice-enabled device;
      
      identifying device state data associated with the stored grouping of devices; and
      
      determining that the stored grouping of devices is capable of performing the action responsive to the speech utterance.
  - 4. The system of claim 1, wherein identifying the device state data associated with the first voice-enabled device comprises:
    - sending a request to an event component to provide an indication of the device state data associated with the first voice-enabled device; and
      
      receiving, from the event component, the device state data.
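
Claims 3 and 4 can be pictured with a small sketch: device state is requested from an event component, and a stored grouping of devices is checked for capability. Everything named here (`EventComponent`, `resolve_target`, the state dictionary layout, and checking the device before its grouping) is an assumption for illustration; the claims do not prescribe an API or an ordering.

```python
class EventComponent:
    """Hypothetical in-memory stand-in for the event component of claim 4."""

    def __init__(self):
        self._states = {}

    def report(self, target_id, state):
        # Devices or groupings publish their latest state here.
        self._states[target_id] = state

    def get_state(self, target_id):
        # Request/response pair: ask for device state data, receive it back.
        # Unknown targets are treated as offline (an assumption of this sketch).
        return self._states.get(target_id, {"online": False, "actions": []})


def is_capable(state, action):
    """Assumed capability test on a state dictionary."""
    return state["online"] and action in state["actions"]


def resolve_target(device_id, groups, events, action):
    """Return the ID (device or stored grouping) able to perform the action, or None."""
    if is_capable(events.get_state(device_id), action):
        return device_id
    # Check any stored grouping that includes the device.
    for group_id, members in groups.items():
        if device_id in members and is_capable(events.get_state(group_id), action):
            return group_id
    return None


events = EventComponent()
events.report("kitchen", {"online": True, "actions": ["set_timer"]})
events.report("downstairs", {"online": True, "actions": ["play_music"]})
groups = {"downstairs": {"kitchen", "living_room"}}

print(resolve_target("kitchen", groups, events, "set_timer"))   # kitchen
print(resolve_target("kitchen", groups, events, "play_music"))  # downstairs
```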

5. A system comprising:
- one or more processors;
  
  computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  receiving a first device identifier of a first device;
  
  receiving first audio data associated with the first device identifier, the first audio data representing a sound;
  
  receiving a second device identifier of a second device;
  
  receiving second audio data associated with the second device identifier, the second audio data representing a portion of the sound, the portion of the sound being less than all the sound represented by the first audio data;
  
  receiving intent data representing a machine response to the sound;
  
  identifying first device state data associated with the first device;
  
  identifying second device state data associated with the second device; and
  
  based at least in part on the second device state data, determining the second device is to be used for the machine response.
- Dependent claims (6, 7, 8, 9, 10, 11, 12):
  - 6. The system of claim 5, further comprising determining, based on the first device state data, that the first device is offline.
  - 7. The system of claim 5, the operations further comprising:
    - determining that the first device is included in a stored grouping of devices that includes the first device and a third device;
      
      identifying device state data associated with the stored grouping of devices; and
      
      determining, based on the device state data associated with the stored grouping of devices, that the stored grouping of devices is offline.
  - 8. The system of claim 5, the operations further comprising:
    - determining that the first device is associated with a secondary device;
      
      identifying third device state data associated with the secondary device; and
      
      determining, based on the third device state data, that the secondary device is offline.
  - 9. The system of claim 5, the operations further comprising:
    - determining, based on the first device state data, that the first device is offline;
      
      storing an indication that the second device is to perform the machine response;
      
      determining a command to cause the second device to perform the machine response; and
      
      sending, to the second device, data indicating the command to perform the machine response.
  - 10. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device based at least in part on a first audio signal metric associated with the first audio data and a second audio signal metric associated with the second audio data.
  - 11. The system of claim 10, wherein:
    - the first audio signal metric associated with the first audio data comprises at least one of:
      
      a first signal-to-noise value of the first audio data;
      
      a first amplitude of the first audio data;
      
      or a first level of voice activity in the first audio data; and
      
      the second audio signal metric associated with the second audio data comprises at least one of:
      
      a second signal-to-noise value of the second audio data;
      
      a second amplitude of the second audio data;
      
      or a second level of voice activity in the second audio data.
  - 12. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device, wherein the first device and the second device are ranked based on one or more of:
    - input received via an input control of the first device;
      
      a distance of a user to the first device;
      
      or image data indicating that the user is at least partially facing the first device.
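
The offline fallback described in claims 5, 6, and 9 might look like the sketch below. It assumes the ranking has already been received, models device state as a dictionary with an `online` flag, and uses an invented JSON command payload; `select_responder`, `build_command`, `handle`, and the `send` callback are hypothetical names, not part of the claims.

```python
import json


def select_responder(ranked_ids, states):
    """Return the first device in rank order whose state data says it is online."""
    for device_id in ranked_ids:
        if states.get(device_id, {}).get("online", False):
            return device_id
    return None


def build_command(device_id, intent):
    """Serialize a hypothetical command payload for the chosen device."""
    payload = {"target": device_id, "directive": intent["name"], "slots": intent.get("slots", {})}
    return json.dumps(payload, sort_keys=True)


def handle(ranked_ids, states, intent, decision_log, send):
    """Choose a responder, store the decision, and send the command to it."""
    target = select_responder(ranked_ids, states)
    if target is None:
        return None
    # Stored indication of which device is to perform the machine response.
    decision_log.append({"intent": intent["name"], "responder": target})
    send(target, build_command(target, intent))
    return target


states = {"tv_stick": {"online": False}, "speaker": {"online": True}}
sent, log = [], []
chosen = handle(["tv_stick", "speaker"], states, {"name": "PlayMusic"}, log,
                lambda target, command: sent.append((target, command)))
print(chosen)  # speaker
```

Here the higher-ranked `tv_stick` is offline, so the second device receives the command even though its audio captured less of the sound.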

13. A method comprising:
- receiving first audio data associated with a first device, the first audio data representing sound;
  
  receiving second audio data associated with a second device, the second audio data representing a portion of the sound that is less than all the sound represented in the first audio data;
  
  identifying first device state data associated with the first device;
  
  identifying second device state data associated with the second device;
  
  receiving intent data representing a machine response to the sound; and
  
  based at least in part on the second device state data, determining the second device is to be used for the machine response.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The method of claim 13, further comprising determining, based on the first device state data, that the first device is offline.
  - 15. The method of claim 13, further comprising:
    - determining that the first device is included in a stored grouping of devices that includes the first device and a third device;
      
      identifying device state data associated with the stored grouping of devices; and
      
      determining, based on the device state data associated with the stored grouping of devices, that the stored grouping of devices is offline.
  - 16. The method of claim 13, further comprising:
    - determining that the first device is associated with a secondary device;
      
      identifying third device state data associated with the secondary device; and
      
      determining, based on the third device state data, that the secondary device is offline.
  - 17. The method of claim 13, further comprising:
    - determining, based on the first device state data, that the first device is offline;
      
      storing an indication that the second device is to perform the machine response;
      
      determining a command to cause the second device to perform the machine response; and
      
      sending, to the second device, data indicating the command to perform the machine response.
  - 18. The method of claim 13, further comprising receiving an indication that the first device is ranked higher than the second device based at least in part on a first audio signal metric associated with the first audio data and a second audio signal metric associated with the second audio data.
  - 19. The method of claim 18, wherein:
    - the first audio signal metric associated with the first audio data comprises at least one of:
      
      a first signal-to-noise value of the first audio data;
      
      a first amplitude of the first audio data;
      
      or a first level of voice activity in the first audio data; and
      
      the second audio signal metric associated with the second audio data comprises at least one of:
      
      a second signal-to-noise value of the second audio data;
      
      a second amplitude of the second audio data;
      
      or a second level of voice activity in the second audio data.
  - 20. The method of claim 13, further comprising:
    - generating output audio data representing synthesized speech of output text data, wherein the output text data indicates that the second device is to be used for the machine response; and
      
      sending, to the first device, the output audio data.
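
Claims 18 through 20 mention ranking on audio signal metrics and telling the user which device will respond. One way to picture this is a weighted score over whichever metrics are available, plus a text message that a speech synthesizer would turn into output audio. The weights, the assumption that metrics are normalized to the range 0 to 1, and the message wording are all invented for this sketch; speech synthesis itself is left out.

```python
# Hypothetical weights; the claims only require "at least one of" these metrics.
WEIGHTS = {"snr": 0.6, "amplitude": 0.2, "voice_activity": 0.2}


def score(metrics):
    """Combine whichever known metrics are present (assumed normalized to 0..1)."""
    return sum(WEIGHTS[name] * value for name, value in metrics.items() if name in WEIGHTS)


def rank(metrics_by_device):
    """Return device IDs ordered from highest to lowest combined score."""
    return sorted(metrics_by_device, key=lambda d: score(metrics_by_device[d]), reverse=True)


def handoff_text(responder_name):
    """Output text naming the device that will respond (wording is invented)."""
    return f"Okay, continuing on {responder_name}."


metrics = {
    "hallway": {"snr": 0.9, "amplitude": 0.7, "voice_activity": 0.8},
    "office": {"snr": 0.4, "amplitude": 0.5},
}
order = rank(metrics)
print(order)                   # ['hallway', 'office']
print(handoff_text("office"))  # Okay, continuing on office.
```

In the scenario of claim 20, the text would be synthesized to speech and the resulting audio sent to the first (higher-ranked) device, so the user hears the handoff from the device they addressed.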

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
White, Joseph; Rachakonda, Ravi Kiran; Mohanam, Vinodth Kumar; Rajendran, Lalithkumar; Shah, Deepak Uttam; Khorasani, Maziyar; Cherukuri, Venkata Snehith

Granted Patent

US 10,546,583 B2
CPC Class Codes

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...

G10L 2015/223   Execution procedure of a sp...

G10L 25/84   for discriminating voice fr...
