Accessory for a voice-controlled device

US 10,366,692 B1
Filed: 05/15/2017
Issued: 07/30/2019
Est. Priority Date: 05/15/2017
Status: Active Grant

Abstract

This disclosure describes techniques and systems for encoding instructions in audio data that, when output on a speaker of a first device in an environment, cause a second device to output content in the environment. In some instances, the audio data has a frequency that is inaudible to users in the environment. Thus, the first device is able to cause the second device to output the content without users in the environment hearing the instructions. In some instances, the first device also outputs content, and the content output by the second device is played at an offset relative to a position of the content output by the first device.
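
The abstract does not say how instructions are encoded in the inaudible audio. As a minimal sketch, assuming binary frequency-shift keying (FSK) with one tone per bit, a round trip could look like the following. The sample rate, tone frequencies, symbol length, and the names `encode`, `decode`, and `goertzel_power` are illustrative assumptions, not details taken from the patent.

```python
import math

SAMPLE_RATE = 48_000   # Hz; assumed, must exceed twice the highest tone
FREQ_ZERO = 20_500.0   # Hz; assumed tone for a 0 bit (above 20 kHz)
FREQ_ONE = 21_500.0    # Hz; assumed tone for a 1 bit
SYMBOL_SAMPLES = 480   # assumed 10 ms per bit at 48 kHz


def encode(payload: bytes) -> list:
    """Turn each payload bit into a short high-frequency tone burst."""
    samples = []
    for byte in payload:
        for bit_index in range(7, -1, -1):  # most significant bit first
            freq = FREQ_ONE if (byte >> bit_index) & 1 else FREQ_ZERO
            for n in range(SYMBOL_SAMPLES):
                samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def goertzel_power(block, freq):
    """Signal power of `block` at `freq`, using the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode(samples) -> bytes:
    """Recover the payload by comparing tone power in each symbol window."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        block = samples[start:start + SYMBOL_SAMPLES]
        one = goertzel_power(block, FREQ_ONE)
        zero = goertzel_power(block, FREQ_ZERO)
        bits.append(1 if one > zero else 0)
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

With these assumed parameters, `decode(encode(payload))` recovers the original bytes on a clean signal. A real speaker-to-microphone link would also need symbol synchronization and error handling, which this sketch leaves out.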

122 Citations

20 Claims

1. A method comprising:
- receiving first audio data generated by a microphone of a device, the device residing in an environment that includes an accessory device, the first audio data representing speech of a user in the environment;
  
  receiving an identifier associated with the device;
  
  performing automatic speech recognition (ASR) on the first audio data to generate text representing the speech of the user;
  
  analyzing the text to identify an intent associated with the text;
  
  determining a uniform resource locator (URL) for acquiring primary content based at least in part on the intent associated with the text;
  
  generating second audio data for output on a speaker of the device based at least in part on the intent associated with the text, the second audio data introducing the primary content;
  
  sending the URL to the device;
  
  sending the second audio data to the device for output on the speaker of the device;
  
  determining, based at least in part on the identifier associated with the device, that the accessory device resides in the environment with the device;
  
  identifying supplemental content to output on the accessory device, the supplemental content being associated with the primary content;
  
  generating third audio data for output on the speaker of the device, the third audio data having a frequency of at least 20 kHz such that the third audio data is inaudible to the user, the third audio data encoding information identifying the supplemental content; and
  
  sending the third audio data to the device for output on the speaker of the device.
- Dependent claims (2, 3, 20):
  - 2. The method as recited in claim 1, further comprising:
    - publishing, to an event bus, information associated with at least one of the first audio data or second audio data;
      
      publishing the identifier associated with the device to the event bus; and
      
      identifying the information and the device identifier associated with the device; and
      
      wherein the identifying the supplemental content comprises identifying the supplemental content to output on the accessory device based at least in part on the identifying the information and the identifier, the supplemental content being associated with the primary content via metadata of the primary content.
  - 3. The method as recited in claim 1, wherein the identifying the supplemental content comprises identifying at least one of audible content to output on the accessory device, one or more images to output on the accessory device, one or more lights of the accessory device to turn on, or one or more lights of the accessory device to turn off.
  - 20. The method of claim 1, wherein the third audio data further comprises an offset time at which the accessory device is to output the supplemental content relative to the primary content.
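
Claims 1 and 20 say the third audio data encodes information identifying the supplemental content and, optionally, an offset time, but they do not define a byte layout. The following is a minimal sketch of one possible payload, assuming a 4-byte content ID, a 4-byte offset in milliseconds, and a CRC-32 check. The layout and the names `pack_instruction` and `unpack_instruction` are hypothetical.

```python
import struct
import zlib

# Assumed big-endian layout: content ID, offset in ms, then CRC-32 of both.
_BODY = struct.Struct(">II")
_FRAME = struct.Struct(">III")


def pack_instruction(content_id: int, offset_ms: int) -> bytes:
    """Build the bytes that would be modulated into the inaudible audio."""
    body = _BODY.pack(content_id, offset_ms)
    return body + struct.pack(">I", zlib.crc32(body))


def unpack_instruction(frame: bytes):
    """Parse a frame on the accessory side, rejecting corrupted data."""
    content_id, offset_ms, crc = _FRAME.unpack(frame)
    if zlib.crc32(frame[:_BODY.size]) != crc:
        raise ValueError("corrupted instruction frame")
    return content_id, offset_ms
```

The checksum is a design choice for this sketch only: an acoustic channel is noisy, so the accessory needs some way to discard a frame it misheard.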

4. A method comprising:
- receiving first audio data from a first device, the first device residing in an environment that includes the first device and a second device, the first audio data representing speech of a user in the environment;
  
  determining, based at least in part on the first audio data, to instruct the second device to output content in the environment;
  
  generating second audio data representing first audio and second audio, wherein:
  
  the first audio has a frequency below 20 kHz; and
  
  the second audio has a frequency of at least 20 kHz and represents one or more instructions for causing the second device to output the content in the environment; and
  
  sending the second audio data to the first device for output on one or more speakers of the first device.
- Dependent claims (5, 6, 7, 8, 9, 10, 11, 12):
  - 5. The method as recited in claim 4, further comprising:
    - receiving an identifier associated with the first device; and
      
      determining, using the identifier, that the second device is in the environment with the first device; and
      
      wherein the determining to instruct the second device to output the content comprises determining to instruct the second device to output the content based at least in part on the first audio data and on the determining that the second device is in the environment with the first device.
  - 6. The method as recited in claim 4, further comprising:
    - determining a device type of the second device; and
      
      determining, based at least in part on the device type of the second device, to send the second audio data to the first device.
  - 7. The method as recited in claim 4, wherein the content comprises first content, and the one or more instructions comprises one or more first instructions, the method further comprising:
    - determining, based at least in part on the first audio data, to instruct a third device in the environment to output second content;
      
      determining a device type of the third device; and
      
      sending, based at least in part on the device type of the third device, one or more second instructions to the third device over a network, the one or more second instructions for causing the third device to output the second content.
  - 8. The method as recited in claim 4, wherein:
    - the generating the second audio data comprises generating the second audio data that encodes the one or more instructions for causing the second device to output the content in the environment, the one or more instructions specifying a network location for acquiring the content;
      
      receiving, from the second device, a request for the content, the request specifying the network location; and
      
      sending the content to the second device.
  - 9. The method as recited in claim 4, wherein the generating the second audio data comprises generating the second audio data that encodes the one or more instructions for causing the second device to output the content in the environment, the one or more instructions specifying a routine on the second device to execute, the routine stored locally on the second device.
  - 10. The method as recited in claim 4, further comprising:
    - performing automatic speech recognition (ASR) on the first audio data to generate text representing the speech of the user;
      
      analyzing the text to identify an intent associated with the text;
      
      generating, based at least in part on the intent associated with the text, third audio data or information for acquiring the third audio data for output on one or more speakers of the first device; and
      
      sending the third audio data or the information for acquiring the third audio data to the first device for output on the one or more speakers of the first device, and wherein the content for output on the second device is supplemental to the third audio data for output on the one or more speakers of the first device.
  - 11. The method as recited in claim 10, wherein the sending the second audio data and the sending the third audio data comprises sending the second audio data to the first device such that at least a portion of the second audio is output in the environment at a same time as at least a portion of the third audio data.
  - 12. The method as recited in claim 4, further comprising:
    - determining a device type of the second device;
      
      determining information associated with the speech of the user; and
      
      identifying the content for output by the second device based at least in part on the device type of the second device and on the information associated with the speech of the user.
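
Claim 4 describes second audio data that carries both audible first audio (below 20 kHz) and instruction-bearing second audio (at least 20 kHz). A minimal sketch of combining the two into one sample buffer follows. The sample rate, the gain value, and the helper names `tone`, `mix`, and `power_at` are assumptions made for illustration, and plain sine tones stand in for real speech and real encoded instructions.

```python
import math

SAMPLE_RATE = 48_000  # Hz; assumed


def tone(freq_hz, n_samples, amplitude=1.0):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def mix(first_audio, second_audio, second_gain=0.2):
    """Sum the audible track and the attenuated high-frequency track.

    The shorter track is padded with silence and the result is clipped
    to the range [-1.0, 1.0].
    """
    length = max(len(first_audio), len(second_audio))
    mixed = []
    for n in range(length):
        a = first_audio[n] if n < len(first_audio) else 0.0
        b = second_audio[n] if n < len(second_audio) else 0.0
        mixed.append(max(-1.0, min(1.0, a + second_gain * b)))
    return mixed


def power_at(samples, freq_hz):
    """Power at one frequency, by correlating with a sine and a cosine."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(samples))
    return re * re + im * im


audible = tone(440, 4800, amplitude=0.7)   # stand-in for speech or music
inaudible = tone(21_000, 2400)             # stand-in for encoded instructions
mixed = mix(audible, inaudible)
```

Here the 21 kHz component is present only in the first half of `mixed`, which `power_at` can confirm. This corresponds to the case in claim 11 where only a portion of the second audio overlaps the audible output.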

13. A device comprising:
- one or more microphones;
  
  one or more speakers;
  
  one or more processors; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed, cause the one or more processors to perform acts comprising:
  
  generating first audio data based at least in part on speech of a user in an environment, the speech captured by the one or more microphones;
  
  sending the first audio data to one or more remote computing devices;
  
  receiving, from the one or more remote computing devices, second audio data for output by the one or more speakers, the second audio data including:
  
  first audio representing audio that is below 20 kHz, and second audio representing audio that is above 20 kHz, the second audio comprising one or more instructions for instructing an accessory device in the environment to acquire supplemental content; and
  
  outputting the second audio data by the one or more speakers, the second audio data including the first audio for the user to hear and the second audio for instructing the accessory device.
- Dependent claims (14, 15, 16, 17, 18, 19):
  - 14. The device as recited in claim 13, wherein the outputting the second audio data comprises outputting the second audio data such that at least part of the second audio is output synchronously with at least part of the first audio.
  - 15. The device as recited in claim 13, wherein the outputting the second audio data comprises outputting the first audio prior to or after the second audio of the second audio data.
  - 16. The device as recited in claim 13, wherein the outputting the second audio data comprises outputting the second audio data to cause the accessory device to acquire content for output in the environment by requesting the content from a specified network location.
  - 17. The device as recited in claim 13, wherein the outputting the second audio data comprises outputting the second audio data to cause the accessory device to acquire supplement content that is stored locally on the accessory device.
  - 18. The device as recited in claim 13, wherein the receiving the second audio data comprises receiving a signal audio file that includes the first audio and the second audio.
  - 19. The device as recited in claim 13, wherein speech comprises first speech and the one or more instructions comprising one or more first instructions, the acts further comprising:
    - generating third audio data based at least in part on second speech of the user in the environment, the second speech captured by the one or more microphones;
      
      sending the third audio data to the one or more remote computing devices;
      
      receiving, from the one or more remote computing devices, fourth audio data for output by the one or more speakers;
      
      receiving, from the one or more remote computing devices, fifth data for sending to the accessory device, the data comprising one or more second instructions for the accessory device;
      
      outputting the fourth audio data on the one or more speakers; and
      
      sending the fifth data to the accessory device of a wireless personal area network (WPAN).
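
Claims 16 and 17 (and claims 8 and 9 on the method side) distinguish two ways the accessory device can obtain content: requesting it from a specified network location, or using content or a routine stored locally. A minimal sketch of how an accessory might dispatch a decoded instruction is shown below. The `Instruction` type, the `"fetch"` and `"routine"` tags, and the `handle` function are hypothetical, and the network call is injected as a plain callable so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Instruction:
    kind: str    # hypothetical tag: "fetch" or "routine"
    target: str  # network location, or name of a locally stored routine


def handle(instruction: Instruction,
           routines: Dict[str, Callable[[], Any]],
           fetch: Callable[[str], Any]) -> Any:
    """Run a local routine or request content from a network location."""
    if instruction.kind == "routine":
        if instruction.target not in routines:
            raise ValueError(f"unknown routine: {instruction.target}")
        return routines[instruction.target]()
    if instruction.kind == "fetch":
        return fetch(instruction.target)
    raise ValueError(f"unknown instruction kind: {instruction.kind}")
```

Passing `fetch` in as a parameter is a choice made for this sketch: it keeps the dispatch logic independent of whichever HTTP client a real accessory would use.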

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Adams, Zoe; Klein, Pete; Deller, Derick; Guarniere, Michael John; Chen, Alina; Naik, Apoorv; Johnson, Jeremy Daniel; Appleman, Aslan
Primary Examiner(s): Riley, Marcus T
Application Number: US15/595,658
Time in Patent Office: 806 Days
Field of Search: None

CPC Class Codes

G06F 16/3329   Natural language query form...

G06F 16/632   Query formulation

G06F 3/167   Audio in a user interface, ...

G10L 15/02   Feature extraction for spee...

G10L 15/07   to the speaker

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/16   using artificial neural net...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/22   Interactive procedures; Man...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2015/226   using non-speech characteri...

G10L 25/78   Detection of presence or ab...

H04M 2201/405   involving speaker-dependent...

H04M 3/5166   in combination with interac...
