PROCESSING SPOKEN COMMANDS TO CONTROL DISTRIBUTED AUDIO OUTPUTS

US 20170236512A1
Filed: 03/29/2016
Published: 08/17/2017
Est. Priority Date: 02/12/2016
Status: Active Grant


1 Assignment

0 Petitions

Abstract

A system capable of controlling multiple entertainment systems and/or speakers using voice commands. The system receives voice commands and may determine the audio sources and speakers indicated by the voice commands. The system may generate audio data from the audio sources and may send the audio data to the speakers using multiple interfaces. For example, the system may send the audio data directly to the speakers using a network address, may send the audio data to the speakers via a voice-enabled device, or may send the audio data to the speakers via a speaker controller. The system may generate output zones that each include multiple speakers and may associate input devices with speakers within the output zones. For example, the system may receive a voice command from an input device in an output zone and may reduce the volume of audio output by speakers in that output zone.
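The abstract names three ways audio data can reach a speaker: directly at a network address, via a voice-enabled device, or via a speaker controller. The Python sketch below shows one way such a routing decision could be expressed. It is illustrative only. The `Speaker` record, the `Interface` enum, the `choose_interface` and `plan_delivery` functions, and the preference order (direct first, then voice-enabled device, then controller) are all assumptions and are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class Interface(Enum):
    """The three delivery paths the abstract names."""
    DIRECT = auto()              # stream to the speaker's own network address
    VOICE_DEVICE = auto()        # relay through a voice-enabled device
    SPEAKER_CONTROLLER = auto()  # relay through a speaker controller


@dataclass
class Speaker:
    """Hypothetical record of how one speaker can be reached."""
    name: str
    network_address: Optional[str] = None  # set if the speaker is directly addressable
    voice_device: Optional[str] = None     # voice-enabled device the speaker is paired with
    controller: Optional[str] = None       # speaker controller that manages the speaker


def choose_interface(speaker: Speaker) -> Tuple[Interface, str]:
    """Pick a delivery path; this preference order is an assumption, not from the patent."""
    if speaker.network_address:
        return Interface.DIRECT, speaker.network_address
    if speaker.voice_device:
        return Interface.VOICE_DEVICE, speaker.voice_device
    if speaker.controller:
        return Interface.SPEAKER_CONTROLLER, speaker.controller
    raise ValueError(f"no known route to speaker {speaker.name!r}")


def plan_delivery(speakers: List[Speaker]) -> List[Tuple[str, Interface, str]]:
    """Return (speaker name, interface, target) for every speaker a command names."""
    return [(s.name, *choose_interface(s)) for s in speakers]


if __name__ == "__main__":
    kitchen = Speaker("kitchen", network_address="192.0.2.10")
    den = Speaker("den", controller="den-receiver")
    for row in plan_delivery([kitchen, den]):
        print(row)
```

Ranking the direct path first is only one plausible policy. A real system might instead prefer whichever path already carries the speaker's current stream.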

421 Citations

20 Claims

1. A computer-implemented method for controlling a speaker system using an input device, the method comprising:
- associating, by at least one device, an input device with a first wireless speaker;
  
  receiving, from the input device by the at least one device, input audio data corresponding to an utterance;
  
  performing, by the at least one device, speech processing on the input audio data to determine speech processing output;
  
  determining, by the at least one device, that the speech processing output identifies a first command to output second audio;
  
  determining, by the at least one device, that the speech processing output identifies a desired output location;
  
  determining, from among a plurality of output devices, a first wireless speaker corresponding to the desired output location;
  
  determining, by the at least one device, that the first wireless speaker is controllable by a network-connected device;
  
  determining, by the at least one device, that the first wireless speaker is outputting first audio;
  
  generating, by the at least one device, a second command instructing the network-connected device to cause a volume level of the first audio to be reduced, the second command executable at least in part by the network-connected device;
  
  sending, by the at least one device, a first instruction to the network-connected device to execute the second command;
  
  determining, by the at least one device, that the speech processing output indicates an audio source from which to generate the second audio;
  
  sending, by the at least one device, output audio data corresponding to the audio source using an address identifier associated with the output audio data;
  
  sending, by the at least one device, the address identifier to the network-connected device; and
  
  sending, by the at least one device, a second instruction to the network-connected device to execute the first command, the first command instructing the network-connected device to obtain the output audio data using the address identifier and to cause the second audio to be generated from the output audio data using the first wireless speaker.
- Dependent Claims (2, 3, 4)
- - 2. The computer-implemented method of claim 1, further comprising:
    - generating second output audio data that includes synthesized speech indicating the first command being performed;
      
      sending the second output audio data using a second address identifier; and
      
      sending the second address identifier to the network-connected device, wherein the first command instructs the network-connected device to:
      
      obtain the output audio data using the address identifier;
      
      cause the second audio to be generated, from the output audio data via the first wireless speaker, at a first volume level;
      
      obtain the second output audio data using the second address identifier; and
      
      cause third audio to be generated, from the second output audio data via the first wireless speaker, at a second volume level.
  - 3. The computer-implemented method of claim 1, further comprising:
    - determining that the speech processing output indicates a third command, the third command having a first format associated with a server device;
      
      sending a request to an application programming interface to translate the third command from the first format to a second format associated with the network-connected device; and
      
      receiving the first command from the application programming interface, the first command having the second format.
  - 4. The computer-implemented method of claim 1, further comprising:
    - receiving audio metadata from the network-connected device, the audio metadata corresponding to second output audio data sent from the network-connected device to the first wireless speaker;
      
      receiving, from the input device, second input audio data corresponding to a second utterance;
      
      determining that the second input audio data corresponds to a request for a song title of the second output audio data;
      
      determining the song title using the audio metadata;
      
      generating third output audio data that includes synthesized speech indicating the song title;
      
      sending the third output audio data to the network-connected device;
      
      generating a third command instructing the network-connected device to cause third audio to be generated from the third output audio data using the first wireless speaker; and
      
      sending a third instruction to the network-connected device to execute the third command.
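Claim 1 recites a fixed sequence of steps:

1. Resolve the desired output location to a wireless speaker.
2. Confirm that a network-connected device controls that speaker.
3. Reduce the volume of any audio already playing.
4. Make the requested audio available at an address identifier.
5. Pass that identifier to the controlling device.
6. Instruct the device to fetch and play the audio.

The minimal Python sketch below follows that ordering and assumes speech processing has already produced a structured result. `SpeechOutput`, `NetworkDevice`, `AudioStore`, `handle_utterance`, the command strings, and the example URL scheme are hypothetical stand-ins, since the claim specifies none of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SpeechOutput:
    """Hypothetical result of speech processing on the utterance."""
    command: str       # first command, e.g. "play"
    location: str      # desired output location, e.g. "kitchen"
    audio_source: str  # audio source for the second audio, e.g. "jazz"


@dataclass
class NetworkDevice:
    """Stand-in for the network-connected device; it only logs instructions."""
    log: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def execute(self, command: str, speaker: str, address: Optional[str] = None) -> None:
        self.log.append((command, speaker, address))


class AudioStore:
    """Hypothetical store that makes output audio data reachable at a URL."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.items: Dict[str, str] = {}

    def publish(self, audio_source: str) -> str:
        # The source name stands in for the audio data generated from that source.
        address = f"{self.base_url}/{len(self.items)}"
        self.items[address] = audio_source
        return address


def handle_utterance(
    result: SpeechOutput,
    speaker_for_location: Dict[str, str],
    controller_for_speaker: Dict[str, NetworkDevice],
    is_playing: Dict[str, bool],
    store: AudioStore,
) -> str:
    """Follow the order of steps in claim 1 and return the address identifier used."""
    speaker = speaker_for_location[result.location]   # wireless speaker for the location
    device = controller_for_speaker.get(speaker)      # is it controllable?
    if device is None:
        raise LookupError(f"speaker {speaker!r} has no network-connected controller")
    if is_playing.get(speaker, False):
        device.execute("reduce_volume", speaker)      # second command: duck the first audio
    address = store.publish(result.audio_source)      # output audio data at an address identifier
    device.execute("set_address", speaker, address)   # send the address identifier
    device.execute(result.command, speaker, address)  # first command: fetch and play
    return address
```

Logging instructions instead of sending them keeps the sketch self-contained. The order of entries in `NetworkDevice.log` mirrors the order of the claimed steps.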

5. A computer-implemented method comprising:
- receiving, from an input device by at least one device, input audio data corresponding to an utterance;
  
  performing, by the at least one device, speech processing on the input audio data to determine speech processing output;
  
  determining, by the at least one device, that the speech processing output identifies a first command;
  
  determining, by the at least one device, that the speech processing output identifies a desired output location;
  
  determining, from among a plurality of output devices, a first output device corresponding to the desired output location;
  
  determining, by the at least one device, that the first output device is controllable by a network-connected device;
  
  determining, by the at least one device, that the first output device is outputting first audio;
  
  generating, by the at least one device, a second command instructing the network-connected device to cause a volume level of the first audio to be reduced, the second command executable at least in part by the network-connected device;
  
  sending, by the at least one device, a first instruction to the network-connected device to execute the second command; and
  
  sending, by the at least one device, a second instruction to the network-connected device to execute the first command and send command output to the first output device.
- Dependent Claims (6, 7, 8, 9, 10, 11, 12)
- - 6. The computer-implemented method of claim 5, further comprising:
    - determining that the first command instructs the network-connected device to cause second audio to be generated using the first output device;
      
      determining that the speech processing output indicates an audio source from which to generate the second audio;
      
      sending output audio data corresponding to the audio source using an address identifier associated with the output audio data; and
      
      sending the address identifier to the network-connected device,wherein the first command instructs the network-connected device to obtain the output audio data using the address identifier and to cause the second audio to be generated from the output audio data using the first output device.
  - 7. The computer-implemented method of claim 6, further comprising:
    - generating second output audio data that includes synthesized speech corresponding to the first command;
      
      sending the second output audio data using a second address identifier; and
      
      sending the second address identifier to the network-connected device, wherein the first command instructs the network-connected device to:
      
      obtain the output audio data using the address identifier;
      
      cause the second audio to be generated, from the output audio data via the first output device, at a first volume level;
      
      obtain the second output audio data using the second address identifier; and
      
      cause third audio to be generated, from the second output audio data via the first output device, at a second volume level.
  - 8. The computer-implemented method of claim 5, further comprising:
    - determining that the speech processing output indicates a third command, the third command having a first format associated with a server device;
      
      sending a request to an application programming interface to translate the third command from the first format to a second format associated with the network-connected device; and
      
      receiving the first command from the application programming interface, the first command having the second format.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining a first location of the input device;
      
      determining a plurality of output devices that correspond to the first location; and
      
      determining that the first output device is associated with the input device as the first output device is included in the plurality of output devices.
  - 10. The computer-implemented method of claim 5, further comprising:
    - determining that the input audio data includes a wakeword;
      
      generating the second command instructing the network-connected device to cause the volume level of the first audio to be reduced from a first level to a second level;
      
      sending the first instruction to the network-connected device to execute the second command;
      
      determining the first command;
      
      sending the second instruction to the network-connected device to execute the first command;
      
      generating a third command instructing the network-connected device to cause the volume level of the first audio to be increased from the second level to the first level; and
      
      sending a third instruction to the network-connected device to execute the third command.
  - 11. The computer-implemented method of claim 5, further comprising:
    - receiving first identification data from a plurality of input devices, the plurality of input devices including the input device;
      
      receiving a first location of the input device;
      
      receiving second identification data from a plurality of output devices, the plurality of output devices including the first output device and a second output device;
      
      receiving a second location associated with the first output device;
      
      receiving a third location associated with the second output device;
      
      determining that the second location is in proximity to the third location;
      
      generating a first output zone including the first output device and the second output device;
      
      determining that the first location is in proximity to the second location and the third location; and
      
      associating the input device with the first output zone.
  - 12. The computer-implemented method of claim 5, further comprising:
    - receiving metadata from the network-connected device, the metadata identifying second output audio data sent from the network-connected device to the first output device;
      
      receiving, from the input device, second input audio data corresponding to a second utterance;
      
      determining that the second input audio data corresponds to a request for first information about the second output audio data;
      
      determining the first information using the metadata;
      
      generating third output audio data including synthesized speech corresponding to the first information;
      
      sending the third output audio data to the network-connected device;
      
      generating a third command instructing the network-connected device to cause third audio to be generated from the third output audio data using the first output device; and
      
      sending a third instruction to the network-connected device to execute the third command.
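Claim 11 has two parts. It groups output devices whose locations are in proximity into an output zone. It then associates an input device with that zone when the input device is near the zone's members. The claim does not say how proximity is measured. This Python sketch therefore assumes 2-D coordinates, a fixed radius, and a simple greedy grouping; `near`, `build_zones`, and `zone_for_input` are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) position, e.g. metres on a floor plan


def near(a: Point, b: Point, radius: float) -> bool:
    """'In proximity' modelled as straight-line distance within a radius (an assumption)."""
    return math.dist(a, b) <= radius


def build_zones(outputs: Dict[str, Point], radius: float) -> List[List[str]]:
    """Greedily group output devices: each joins the first zone whose members are all near it."""
    zones: List[List[str]] = []
    for name, loc in outputs.items():
        for zone in zones:
            if all(near(loc, outputs[member], radius) for member in zone):
                zone.append(name)
                break
        else:
            zones.append([name])  # no nearby zone, so start a new one
    return zones


def zone_for_input(input_loc: Point, zones: List[List[str]],
                   outputs: Dict[str, Point], radius: float) -> Optional[int]:
    """Return the index of the first zone whose every output device is near the input device."""
    for index, zone in enumerate(zones):
        if all(near(input_loc, outputs[member], radius) for member in zone):
            return index
    return None
```

Consider a 3 m radius. Two kitchen speakers 2 m apart would share a zone, while a bedroom speaker more than 12 m away would start its own. A microphone standing between the two kitchen speakers would be associated with the kitchen zone.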

13. A device, comprising:
- at least one processor;
  
  memory including instructions operable to be executed by the at least one processor to configure the device to:
  
  receive, from an input device, input audio data corresponding to an utterance;
  
  perform speech processing on the input audio data to determine speech processing output;
  
  determine that the speech processing output identifies a first command;
  
  determine that the speech processing output identifies a desired output location;
  
  determine, from among a plurality of output devices, a first output device corresponding to the desired output location;
  
  determine that the first output device is controllable by a network-connected device;
  
  determine that the first output device is outputting first audio;
  
  generate a second command instructing the network-connected device to cause a volume level of the first audio to be reduced, the second command executable at least in part by the network-connected device;
  
  send a first instruction to the network-connected device to execute the second command; and
  
  send a second instruction to the network-connected device to execute the first command and send command output to the first output device.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20)
- - 14. The device of claim 13, wherein the instructions further configure the device to:
    - determine that the first command instructs the network-connected device to cause second audio to be generated using the first output device;
      
      determine that the speech processing output indicates an audio source from which to generate the second audio;
      
      send output audio data corresponding to the audio source using an address identifier associated with the output audio data; and
      
      send the address identifier to the network-connected device,wherein the first command instructs the network-connected device to obtain the output audio data using the address identifier and to cause the second audio to be generated from the output audio data using the first output device.
  - 15. The device of claim 14, wherein the instructions further configure the device to:
    - generate second output audio data that includes synthesized speech corresponding to the first command;
      
      send the second output audio data using a second address identifier; and
      
      send the second address identifier to the network-connected device, wherein the first command instructs the network-connected device to:
      
      obtain the output audio data using the address identifier;
      
      cause the second audio to be generated, from the output audio data via the first output device, at a first volume level;
      
      obtain the second output audio data using the second address identifier; and
      
      cause third audio to be generated, from the second output audio data via the first output device, at a second volume level.
  - 16. The device of claim 13, wherein the instructions further configure the device to:
    - determine that the speech processing output indicates a third command, the third command having a first format associated with a server device;
      
      send a request to an application programming interface to translate the third command from the first format to a second format associated with the network-connected device; and
      
      receive the second command from the application programming interface, the second command having the second format.
  - 17. The device of claim 13, wherein the instructions further configure the device to:
    - determine a first location of the input device;
      
      determine a plurality of output devices that correspond to the first location; and
      
      determine that the first output device is associated with the input device as the first output device is included in the plurality of output devices.
  - 18. The device of claim 13, wherein the instructions further configure the device to:
    - determine that the input audio data includes a wakeword;
      
      generate the second command instructing the network-connected device to cause the volume level of the first audio to be reduced from a first level to a second level;
      
      send the first instruction to the network-connected device to execute the second command;
      
      determine the first command;
      
      send the second instruction to the network-connected device to execute the first command;
      
      generate a third command instructing the network-connected device to cause the volume level of the first audio to be increased from the second level to the first level; and
      
      send a third instruction to the network-connected device to execute the third command.
  - 19. The device of claim 13, wherein the instructions further configure the device to:
    - receive first identification data from a plurality of input devices, the plurality of input devices including the input device;
      
      receive a first location of the input device;
      
      receive second identification data from a plurality of output devices, the plurality of output devices including the first output device and a second output device;
      
      receive a second location associated with the first output device;
      
      receive a third location associated with the second output device;
      
      determine that the second location is in proximity to the third location;
      
      generate a first output zone including the first output device and the second output device;
      
      determine that the first location is in proximity to the second location and the third location; and
      
      associate the input device with the first output zone.
  - 20. The device of claim 13, wherein the instructions further configure the device to:
    - receive metadata from the network-connected device, the metadata identifying second output audio data sent from the network-connected device to the first output device;
      
      receive, from the input device, second input audio data corresponding to a second utterance;
      
      determine that the second input audio data corresponds to a request for first information about the second output audio data;
      
      determine the first information using the metadata;
      
      generate third output audio data including synthesized speech corresponding to the first information;
      
      send the third output audio data to the network-connected device;
      
      generate a third command instructing the network-connected device to cause third audio to be generated from the third output audio data using the first output device; and
      
      send a third instruction to the network-connected device to execute the third command.
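Claims 10 and 18 describe ducking around a wakeword. The current audio is lowered from a first level to a second level, the spoken command is carried out, and the first level is then restored. In Python this maps naturally onto a context manager, which also guarantees that the restore step runs even if the command fails. This is a sketch under assumptions: `VolumeControl`, `ducked`, `on_input_audio`, and the default second level of 10 are hypothetical, and wakeword detection is reduced to a boolean.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class VolumeControl:
    """Stand-in for the network-connected device's volume control of one output device."""

    def __init__(self, level: int) -> None:
        self.level = level
        self.history: List[int] = [level]  # every level set, for inspection

    def set_level(self, level: int) -> None:
        self.level = level
        self.history.append(level)


@contextmanager
def ducked(control: VolumeControl, second_level: int) -> Iterator[None]:
    """Reduce playback from its current (first) level and restore it on exit."""
    first_level = control.level
    control.set_level(min(first_level, second_level))  # second command: reduce
    try:
        yield                                           # first command runs while ducked
    finally:
        control.set_level(first_level)                  # third command: restore


def on_input_audio(has_wakeword: bool, control: VolumeControl,
                   run_command: Callable[[], None], second_level: int = 10) -> None:
    """Duck only when a wakeword is detected; the default second level is arbitrary."""
    if not has_wakeword:
        return
    with ducked(control, second_level):
        run_command()
```

Using `min` leaves audio that is already quieter than the second level unchanged rather than raising it. The claims only recite a reduction, so this choice is an assumption.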

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Williams, Robert; Rabuchin, Steven Todd; Hart, Gregory Michael

Granted Patent: US 9,858,927 B2

CPC Class Codes

G06F 16/68   Retrieval characterised by ...

G06F 3/165   Management of the audio str...

G06F 40/40   Processing or translation o...

G10L 13/02   Methods for producing synth...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 21/0364   for improving intelligibility

H04R 2420/07   Applications of wireless lo...

H04R 2430/01   Aspects of volume control, ...

H04R 3/12   for distributing signals to...
