Controlling distributed audio outputs to enable voice output

US 9,898,250 B1
Filed: 06/29/2016
Issued: 02/20/2018
Est. Priority Date: 02/12/2016
Status: Active Grant

Abstract

A system that is capable of controlling multiple entertainment systems and/or speakers using voice commands. The system receives voice commands and may determine speakers playing output audio in proximity to the voice commands. The system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output. For example, the system may receive a voice command from an input device associated with an output zone, may reduce output audio generated by speakers in the output zone and may play the voice output via the speakers. In addition, the system may send the command to the speakers while sending the voice output to another device for playback. For example, the system may reduce output audio generated by the speakers and play the voice output via the input device.
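The reduce-then-speak behavior in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Speaker` class, the `speak_with_ducking` function, the `ducked_level` default, and the clip name are all hypothetical. The `play_voice` callable stands in for whichever device plays the voice output, either the speakers themselves or the input device.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    """Hypothetical stand-in for a networked speaker that is playing audio."""
    name: str
    volume: int = 60
    log: list = field(default_factory=list)

    def set_volume(self, level: int) -> None:
        self.volume = level
        self.log.append(f"volume={level}")


def speak_with_ducking(speakers, voice_clip, play_voice, ducked_level=20):
    """Lower the speakers, play the voice output, then restore the volumes.

    play_voice is any callable that plays the clip; it may use the speakers
    or a separate device such as the one that captured the voice command.
    """
    original = {s.name: s.volume for s in speakers}
    for s in speakers:
        # Never raise the volume of a speaker that is already quieter.
        s.set_volume(min(s.volume, ducked_level))
    try:
        play_voice(voice_clip)
    finally:
        # Restore the earlier level even if playback fails.
        for s in speakers:
            s.set_volume(original[s.name])


# Example: the voice output is played by the input device, not the speaker.
kitchen = Speaker("kitchen")
input_device_log = []
speak_with_ducking([kitchen], "tts-clip", input_device_log.append)
```

In this sketch the restore step runs in a `finally` clause so that a playback failure does not leave the speakers stuck at the reduced level.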

18 Claims

1. A computer-implemented method for controlling a speaker system, the method comprising:
- associating, by at least one server device, an audio device with a first wireless speaker;
  
  receiving, from the audio device by the at least one server device, input audio data corresponding to an utterance;
  
  performing, by the at least one server device, speech processing on the input audio data to determine a first instruction;
  
  generating, by the at least one server device, voice output audio data that includes synthesized speech corresponding to the first instruction;
  
  determining that the first wireless speaker is associated with the audio device;
  
  determining that the first wireless speaker is outputting first audio;
  
  sending a second instruction to the network-connected device to cause the network-connected device to reduce a volume level of the first audio from a first level to a second level; and
  
  sending, by the at least one server device, the voice output audio data to the audio device to cause the audio device to generate, while the first audio is outputting at the second level, second audio using a speaker.
- Dependent claims (2, 3, 4):
  - 2. The computer-implemented method of claim 1, further comprising:
    - determining that the audio device has finished generating the second audio using the speaker; and
      
      sending a third instruction to the network-connected device to cause the network-connected device to increase the volume level of the first audio from the second level to the first level.
  - 3. The computer-implemented method of claim 1, further comprising:
    - determining an output zone associated with the audio device, the output zone including at least the first wireless speaker; and
      
      sending the second instruction, the second instruction indicating the output zone and causing the network-connected device to:
      
      determine one or more wireless speakers corresponding to the output zone, the one or more wireless speakers including the first wireless speaker, and
      
      cause the volume level of the first audio to be reduced, by the one or more wireless speakers, from the first level to the second level.
  - 4. The computer-implemented method of claim 1, further comprising:
    - determining a first location of the audio device; and
      
      sending the second instruction, the second instruction indicating the first location and causing the network-connected device to:
      
      determine one or more wireless speakers corresponding to the first location, and
      
      cause the volume level of the first audio to be reduced, by the one or more wireless speakers, from the first level to the second level.
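
The zone-based variant in claim 3 can be illustrated with a short sketch in which the instruction names an output zone and the network-connected device resolves it to speakers. The `DuckInstruction` and `SpeakerController` names, the zone map, and the volume values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuckInstruction:
    """Hypothetical 'second instruction': names a zone, not individual speakers."""
    output_zone: str
    target_level: int


class SpeakerController:
    """Hypothetical network-connected device that owns the zone-to-speaker map."""

    def __init__(self, zones, volumes):
        self.zones = zones      # zone name -> list of speaker ids
        self.volumes = volumes  # speaker id -> current volume level

    def handle(self, instruction):
        # Resolve the zone to its speakers, then reduce each speaker's volume.
        speaker_ids = self.zones.get(instruction.output_zone, [])
        for sid in speaker_ids:
            self.volumes[sid] = min(self.volumes[sid], instruction.target_level)
        return list(speaker_ids)


controller = SpeakerController(
    zones={"living_room": ["spk-1", "spk-2"], "kitchen": ["spk-3"]},
    volumes={"spk-1": 70, "spk-2": 50, "spk-3": 40},
)
ducked = controller.handle(DuckInstruction("living_room", 25))
```

Only the speakers in the named zone are affected; the location-based variant in claim 4 would differ mainly in how the controller looks up the speakers.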

5. A computer-implemented method comprising:
- receiving, from an audio device by at least one server device, input audio data corresponding to an utterance;
  
  performing, by the at least one server device, speech processing on the input audio data to determine a first instruction;
  
  generating, by the at least one server device, voice output audio data that includes synthesized speech corresponding to the first instruction;
  
  determining a first output device associated with the audio device, the first output device controllable by a network-connected device;
  
  determining that the first output device is outputting first audio;
  
  sending a second instruction to the network-connected device to cause the network-connected device to reduce a volume level of the first audio from a first level to a second level; and
  
  sending, by the at least one server device, the voice output audio data to the audio device to cause the audio device to output, while the first audio is outputting at the second level, second audio generated from the voice output audio data.
- Dependent claims (6, 7, 8, 9, 10, 11):
  - 6. The computer-implemented method of claim 5, further comprising:
    - determining that the audio device has finished outputting the second audio; and
      
      sending a third instruction to the network-connected device to cause the network-connected device to increase the volume level of the first audio from the second level to the first level.
  - 7. The computer-implemented method of claim 5, further comprising:
    - generating an address identifier associated with the voice output audio data;
      
      sending the address identifier to the network-connected device; and
      
      sending a third instruction to the network-connected device, the third instruction causing the network-connected device to:
      
      cause a volume level of the first audio to be reduced, by the first output device, from a first level to a second level,
      
      obtain the voice output audio data using the address identifier,
      
      cause the second audio to be generated, from the voice output audio data via the first output device, at the first level, and
      
      cause the volume level of the first audio to be increased, by the first output device, from the second level to the first level.
  - 8. The computer-implemented method of claim 5, further comprising:
    - determining a first location associated with the audio device;
      
      determining one or more output devices, including the first output device, that are in proximity to the first location; and
      
      associating the one or more output devices with the audio device.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining an output zone associated with the audio device, the output zone including at least the first output device; and
      
      sending the second instruction, the second instruction indicating the output zone and causing the network-connected device to:
      
      determine one or more output devices corresponding to the output zone, the one or more output devices including the first output device, and
      
      cause the volume level of the first audio to be reduced, by the one or more output devices, from a first level to a second level.
  - 10. The computer-implemented method of claim 5, further comprising:
    - determining a first location associated with the audio device; and
      
      sending the second instruction, the second instruction indicating the first location and causing the network-connected device to:
      
      determine one or more output devices in proximity to the first location, the one or more output devices including the first output device, and
      
      cause the volume level of the first audio to be reduced, by the one or more output devices, from a first level to a second level.
  - 11. The computer-implemented method of claim 5, further comprising:
    - generating a third instruction that has a first format associated with the at least one server device;
      
      sending a request to an application programming interface to translate the third instruction from the first format to a second format associated with the network-connected device; and
      
      receiving the second instruction from the application programming interface, the second instruction having the second format.
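
The address-identifier flow of claim 7 can be sketched as below: the server publishes the voice output under an identifier, and the controller later fetches it and performs the reduce, play, restore sequence. `VoiceStore`, `run_third_instruction`, the step names, and the use of a random hex string as the identifier are assumptions made for illustration; the claim does not specify what form the address identifier takes.

```python
import uuid


class VoiceStore:
    """Hypothetical server-side store mapping address identifiers to audio bytes."""

    def __init__(self):
        self._clips = {}

    def publish(self, audio: bytes) -> str:
        # Generate an address identifier for this voice output audio data.
        address_id = uuid.uuid4().hex
        self._clips[address_id] = audio
        return address_id

    def fetch(self, address_id: str) -> bytes:
        return self._clips[address_id]


def run_third_instruction(store, address_id, first_level, second_level):
    """Return the ordered steps a controller would perform for one voice prompt."""
    steps = [("set_music_volume", second_level)]            # reduce the first audio
    audio = store.fetch(address_id)                         # obtain the voice output
    steps.append(("play_voice", len(audio), first_level))   # voice plays at first level
    steps.append(("set_music_volume", first_level))         # restore the first audio
    return steps


store = VoiceStore()
address_id = store.publish(b"synthesized-speech")
steps = run_third_instruction(store, address_id, first_level=60, second_level=20)
```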

12. A system, comprising:
- at least one processor;
  
  a memory including instructions operable to be executed by the at least one processor to configure the system to:
  
  receive, from an audio device, input audio data corresponding to an utterance;
  
  perform speech processing on the input audio data to determine a first instruction;
  
  generate voice output audio data that includes synthesized speech corresponding to the first instruction;
  
  determine a first output device associated with the audio device, the first output device controllable by a network-connected device;
  
  determine that the first output device is outputting first audio;
  
  send a second instruction to the network-connected device to cause the network-connected device to reduce a volume level of the first audio from a first level to a second level; and
  
  send the voice output audio data to the audio device to cause the audio device to output, while the first audio is outputting at the second level, second audio generated from the voice output audio data.
- Dependent claims (13, 14, 15, 16, 17, 18):
  - 13. The system of claim 12, wherein the instructions further configure the system to:
    - determine that the audio device has finished outputting the second audio; and
      
      send a third instruction to the network-connected device to cause the network-connected device to increase the volume level of the first audio from the second level to the first level.
  - 14. The system of claim 12, wherein the instructions further configure the system to:
    - generate an address identifier associated with the voice output audio data;
      
      send the address identifier to the network-connected device; and
      
      send a third instruction to the network-connected device, the third instruction causing the network-connected device to:
      
      cause a volume level of the first audio to be reduced, by the first output device, from a first level to a second level,
      
      obtain the voice output audio data using the address identifier,
      
      cause the second audio to be generated, from the voice output audio data via the first output device, at the first level, and
      
      cause the volume level of the first audio to be increased, by the first output device, from the second level to the first level.
  - 15. The system of claim 12, wherein the instructions further configure the system to:
    - determine a first location associated with the audio device;
      
      determine one or more output devices, including the first output device, that are in proximity to the first location; and
      
      associate the one or more output devices with the audio device.
  - 16. The system of claim 12, wherein the instructions further configure the system to:
    - determine an output zone associated with the audio device, the output zone including at least the first output device; and
      
      send the second instruction, the second instruction indicating the output zone and causing the network-connected device to:
      
      determine one or more output devices corresponding to the output zone, the one or more output devices including the first output device, and
      
      cause the volume level of the first audio to be reduced, by the one or more output devices, from a first level to a second level.
  - 17. The system of claim 12, wherein the instructions further configure the system to:
    - determine a first location associated with the audio device; and
      
      send the second instruction, the second instruction indicating the first location and causing the network-connected device to:
      
      determine one or more output devices in proximity to the first location, the one or more output devices including the first output device, and
      
      cause the volume level of the first audio to be reduced, by the one or more output devices, from a first level to a second level.
  - 18. The system of claim 12, wherein the instructions further configure the system to:
    - generate a third instruction that has a first format associated with the at least one server device;
      
      send a request to an application programming interface to translate the third instruction from the first format to a second format associated with the network-connected device; and
      
      receive the second instruction from the application programming interface, the second instruction having the second format.
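
The format translation described in claims 11 and 18 can be sketched as a function that accepts an instruction in the server's format and returns one in the device's format. Both formats shown here (the dictionary keys, the `SET_VOLUME` command name, and JSON as the wire encoding) are invented for illustration; the claims do not define either format or the translating application programming interface.

```python
import json


def translate_instruction(server_instruction: dict) -> str:
    """Hypothetical translation API: server format in, device format out."""
    if server_instruction["action"] != "reduce_volume":
        raise ValueError(f"unsupported action: {server_instruction['action']}")
    device_command = {
        "cmd": "SET_VOLUME",
        "targets": server_instruction["speakers"],
        "level": server_instruction["to_level"],
    }
    # Serialize with sorted keys so the output is deterministic.
    return json.dumps(device_command, sort_keys=True)


# "Third instruction" in the server's own (assumed) format.
third_instruction = {"action": "reduce_volume", "speakers": ["spk-1"], "to_level": 20}
# "Second instruction" as returned by the translation step.
second_instruction = translate_instruction(third_instruction)
```

Keeping the translation behind a single function means the server-side logic would not need to know each speaker vendor's command syntax.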

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Williams, Robert; Rabuchin, Steven Todd; Hart, Gregory Michael
Primary Examiner(s)
Nguyen, Khai N

Application Number

US15/196,324
Time in Patent Office

601 Days
Field of Search

381/59, 381/85, 381/334
US Class Current
CPC Class Codes

G06F 16/685   using automatically derived...

G06F 16/687   using geographical or spati...

G06F 16/955   using information identifie...

G06F 3/165   Management of the audio str...

G06F 3/167   Audio in a user interface, ...

G06F 40/117   Tagging; Marking up details...

G06F 40/30   Semantic analysis

G10L 13/00   Speech synthesis; Text to s...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 21/0364   for improving intelligibility

H04R 2227/005   Audio distribution systems ...

H04R 2420/07   Applications of wireless lo...

H04R 2430/01   Aspects of volume control, ...

H04R 3/12   for distributing signals to...
