Processing spoken commands to control distributed audio outputs

US 10,262,657 B1
Filed: 11/21/2017
Issued: 04/16/2019
Est. Priority Date: 02/12/2016
Status: Active Grant

Abstract

A system that is capable of controlling multiple entertainment systems and/or speakers using voice commands. The system receives voice commands and may determine audio sources and speakers indicated by the voice commands. The system may generate audio data from the audio sources and may send the audio data to the speakers using multiple interfaces. For example, the system may send the audio data directly to the speakers using a network address, may send the audio data to the speakers via a voice-enabled device or may send the audio data to the speakers via a speaker controller. The system may generate output zones including multiple speakers and may associate input devices with speakers within the output zones. For example, the system may receive a voice command from an input device in an output zone and may reduce output audio generated by speakers in the output zone.
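The output zones and the three delivery routes described in the abstract could be modeled roughly as follows. This is only a sketch under stated assumptions: the names `Route`, `Speaker`, `OutputZone`, and `speakers_to_duck` are hypothetical and do not appear in the patent, which does not prescribe any data structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    """The three delivery paths named in the abstract."""
    DIRECT = "network address"
    VOICE_DEVICE = "voice-enabled device"
    CONTROLLER = "speaker controller"


@dataclass
class Speaker:
    name: str
    route: Route
    playing: bool = False


@dataclass
class OutputZone:
    """Hypothetical grouping of speakers and the input devices tied to them."""
    name: str
    speakers: list = field(default_factory=list)
    input_devices: set = field(default_factory=set)


def speakers_to_duck(zones, input_device_id):
    """Return the playing speakers that share a zone with the input device."""
    return [
        speaker
        for zone in zones
        if input_device_id in zone.input_devices
        for speaker in zone.speakers
        if speaker.playing
    ]


# Illustrative data only; device and zone names are made up.
zones = [
    OutputZone("living room",
               [Speaker("tv-bar", Route.CONTROLLER, playing=True),
                Speaker("shelf", Route.DIRECT)],
               {"mic-1"}),
    OutputZone("kitchen",
               [Speaker("counter", Route.VOICE_DEVICE, playing=True)],
               {"mic-2"}),
]
ducked_names = [s.name for s in speakers_to_duck(zones, "mic-1")]
```

Only speakers that are both in the speaker's zone and currently playing are selected, which matches the abstract's example of reducing output audio in the zone where the voice command was received.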

39 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, from an input device, input data corresponding to an utterance;
  
  determining, using at least one server device, that the input device corresponds to a first location;
  
  determining, using the at least one server device, that an output system corresponds to the first location;
  
  determining that the output system is outputting audio;
  
  based at least in part on receiving the input data corresponding to the utterance and determining that the output system is outputting audio, sending, from the at least one server device to the output system, a first instruction to cause a decrease in volume of the audio;
  
  after sending the first instruction, determining that the utterance has concluded; and
  
  after determining the utterance has concluded, sending, to the output system, a second instruction indicating the utterance has concluded.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The computer-implemented method of claim 1, further comprising:
    - after determining the utterance has concluded, prior to sending the second instruction, performing speech processing on the input data to identify a command.
  - 3. The computer-implemented method of claim 2, further comprising:
    - prior to sending the second instruction, sending, to the input device, a third instruction to output second audio corresponding to an acknowledgement of the command.
  - 4. The computer-implemented method of claim 2, further comprising:
    - determining output audio data corresponding to the command; and
      
      prior to sending the second instruction, sending the output audio data to the output system.
  - 5. The computer-implemented method of claim 1, wherein:
    - the output system comprises an audio output device at the first location, sending the first instruction comprises sending the first instruction to the audio output device, and sending the second instruction comprises sending the second instruction to the audio output device.
  - 6. The computer-implemented method of claim 1, wherein the output system comprises an audio output device at the first location and a controller device, the method further comprising:
    - determining that the controller device controls the audio output device, wherein:
      
      sending the first instruction comprises sending the first instruction to the controller device, and sending the second instruction comprises sending the second instruction to the controller device.
  - 7. The computer-implemented method of claim 1, wherein determining that the output system corresponds to the first location further comprises:
    - determining that second audio data corresponding to the audio is present in the input data.
  - 8. The computer-implemented method of claim 1, further comprising:
    - generating a third instruction having a first format associated with at least one server device;
      
      sending a request to an application programming interface to translate the third instruction to the first instruction, the first instruction corresponding to a format associated with the output system; and
      
      receiving the first instruction from the application programming interface.
  - 9. The computer-implemented method of claim 1, further comprising:
    - prior to sending the second instruction, determining that a speech-recognition accuracy corresponding to the utterance is below an accuracy threshold;
      
      determining output audio data corresponding to request to repeat at least a portion of the utterance; and
      
      prior to sending the second instruction, sending, to the input device, the output audio data.
  - 10. The computer-implemented method of claim 9, wherein the input data comprises at least one of:
    - an indication of detection of a wakeword; and
      
      input audio data representing the utterance.
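The ordering of steps in claim 1 can be illustrated with a short sketch. It assumes a hypothetical `OutputSystem` class whose `send` method stands in for a network call, hypothetical instruction payloads, and an upstream detector that flags the final audio frame; none of these names or formats come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class OutputSystem:
    location: str
    outputting_audio: bool
    received: list = field(default_factory=list)

    def send(self, instruction):
        # Stand-in for a network call to the output system.
        self.received.append(instruction)


def handle_utterance(input_device, frames, device_locations, output_systems):
    """Walk the steps of claim 1 for one utterance.

    `frames` is an iterable of (audio_chunk, is_final) pairs; the
    end-of-utterance flag is assumed to come from an upstream detector.
    """
    # Map the input device to a location, then find output systems that
    # share that location and are currently outputting audio.
    location = device_locations[input_device]
    targets = [o for o in output_systems
               if o.location == location and o.outputting_audio]

    # First instruction: lower the volume while the user is speaking.
    for out in targets:
        out.send({"type": "decrease_volume"})

    # Consume frames until the detector reports the utterance has concluded.
    concluded = any(is_final for _chunk, is_final in frames)

    # Second instruction: tell the output system the utterance has concluded.
    if concluded:
        for out in targets:
            out.send({"type": "utterance_concluded"})
    return targets


living = OutputSystem("living room", outputting_audio=True)
idle = OutputSystem("living room", outputting_audio=False)
kitchen = OutputSystem("kitchen", outputting_audio=True)
handle_utterance(
    "mic-1",
    [(b"play", False), (b"jazz", True)],
    {"mic-1": "living room"},
    [living, idle, kitchen],
)
```

In this sketch only `living` receives instructions, because it is the only output system that is both at the input device's location and outputting audio. The second instruction signals that the utterance has concluded rather than dictating a volume level, mirroring the claim wording.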

11. A system comprising:
- at least one processor; and
  
  at least one memory including instructions that, when executed by the at least one processor, cause the system to:
  
  receive, from an input device, input data corresponding to an utterance;
  
  determine that the input device corresponds to a first location;
  
  determine that an output system corresponds to the first location;
  
  determine that the output system is outputting audio;
  
  based at least in part on receiving the input data corresponding to the utterance and determining that the output system is outputting audio, sending, to the output system, a first instruction to cause a decrease in volume of the audio;
  
  after sending the first instruction, determine that the utterance has concluded; and
  
  after determining the utterance has concluded, send, to the output system, a second instruction indicating the utterance has concluded.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The system of claim 11, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - after determining the utterance has concluded, prior to sending the second instruction, perform speech processing on the input data to identify a command.
  - 13. The system of claim 12, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - prior to sending the second instruction, send, to the input device, a third instruction to output second audio corresponding to an acknowledgement of the command.
  - 14. The system of claim 12, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - determine output audio data corresponding to the command; and
      
      prior to sending the second instruction, send the output audio data to the output system.
  - 15. The system of claim 11, wherein:
    - the output system comprises an audio output device at the first location, the instructions that cause the system to send the first instruction further comprise instructions that cause the system to send the first instruction to the audio output device, and the instructions that cause the system to send the second instruction further comprise instructions that cause the system to send the second instruction to the audio output device.
  - 16. The system of claim 11, wherein the output system comprises an audio output device at the first location and a controller device, and wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - determine that the controller device controls the audio output device, wherein the instructions that cause the system to send the first instruction further comprise instructions that cause the system to send the first instruction to the controller device, and wherein the instructions that cause the system to send the second instruction further comprise instructions that cause the system to send the second instruction to the controller device.
  - 17. The system of claim 11, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - determine that second audio data corresponding to the audio is present in the input data.
  - 18. The system of claim 11, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - generate a third instruction having a first format associated with at least one server device;
      
      send a request to an application programming interface to translate the third instruction to the first instruction, the first instruction corresponding to a format associated with the output system; and
      
      receive the first instruction from the application programming interface.
  - 19. The system of claim 11, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - prior to sending the second instruction, determine that a speech-recognition accuracy corresponding to the utterance is below an accuracy threshold;
      
      determining output audio data corresponding to request to repeat at least a portion of the utterance; and
      
      prior to sending the second instruction, send, to the input device, the output audio data.
  - 20. The system of claim 19, wherein the input data comprises at least one of:
    - an indication of detection of a wakeword; and
      
      input audio data representing the utterance.
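Claims 8 and 18 describe generating an instruction in a server-side format and asking an application programming interface to translate it into the output system's format. A minimal sketch of that idea follows, assuming a local `translate` function in place of the API request and two made-up target formats (`vendor_a`, `vendor_b`); the patent does not specify any concrete format or API.

```python
import json


# Hypothetical per-vendor encoders; the patent does not define these formats.
def _to_vendor_a(instruction):
    return json.dumps({"cmd": "vol", "delta": -instruction["amount"]},
                      sort_keys=True)


def _to_vendor_b(instruction):
    return f"VOLUME DOWN {instruction['amount']}"


TRANSLATORS = {"vendor_a": _to_vendor_a, "vendor_b": _to_vendor_b}


def translate(generic_instruction, output_format):
    """Stand-in for the translation request in claims 8 and 18."""
    try:
        encoder = TRANSLATORS[output_format]
    except KeyError:
        raise ValueError(f"no translator for {output_format!r}") from None
    return encoder(generic_instruction)


# The server-side "third instruction" in a format the server understands.
generic = {"action": "decrease_volume", "amount": 10}

# The translated "first instruction" in each output system's own format.
first_for_a = translate(generic, "vendor_a")
first_for_b = translate(generic, "vendor_b")
```

Keeping the server-side instruction generic and translating at the edge means the same volume-reduction logic can target output systems from different vendors.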

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Williams, Robert; Rabuchin, Steven Todd; Hart, Gregory Michael
Primary Examiner(s)
Nguyen, Khai N.

Application Number

US15/819,502
Time in Patent Office

511 Days
Field of Search

381/59, 381/85, 381/334
US Class Current
CPC Class Codes

G06F 16/68   Retrieval characterised by ...

G06F 3/165   Management of the audio str...

G06F 40/40   Processing or translation o...

G10L 13/02   Methods for producing synth...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 21/0364   for improving intelligibility

H04R 2420/07   Applications of wireless lo...

H04R 2430/01   Aspects of volume control, ...

H04R 3/12   for distributing signals to...
