Latency reduction for content playback

US 9,990,176 B1
Filed: 06/28/2016
Issued: 06/05/2018
Est. Priority Date: 06/28/2016
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Methods and devices for determining whether a local version of content is stored on an electronic device associated with a user account on a backend system are described herein. In a non-limiting embodiment, the backend system may track and monitor the content stored on the electronic device using the associated user account. If an individual speaks an utterance requesting a particular content item, the backend system may determine, prior to sending the content to the electronic device, whether a local version is stored within the electronic device's memory. If so, the backend system may instruct the electronic device to output the local version, thereby reducing the amount of bandwidth consumed. The backend system may further be capable of predictively generating and then caching certain audio data to the electronic device. For instance, frequent utterances may be tracked, and likely responses to those utterances may be generated prior to the utterance being spoken so that the response is available substantially instantaneously.
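
As an illustration only, and not taken from the patent, the local-version check summarized in the abstract might be sketched as follows. The names `DeviceProfile` and `choose_playback_instruction`, and the shape of the returned instruction, are hypothetical and chosen purely for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceProfile:
    """Hypothetical backend record of the content a device currently stores."""
    device_id: str
    stored_content: set = field(default_factory=set)


def choose_playback_instruction(profile: DeviceProfile, content_id: str) -> dict:
    """Pick an instruction: play the local copy if one exists, else send the content."""
    if content_id in profile.stored_content:
        # A local version exists, so nothing has to be sent over the network.
        return {"action": "play_local", "content_id": content_id}
    # No local version is tracked for this device; fall back to sending the content.
    return {"action": "send_content", "content_id": content_id}


if __name__ == "__main__":
    profile = DeviceProfile("device-1", {"song-a"})
    print(choose_playback_instruction(profile, "song-a"))
    print(choose_playback_instruction(profile, "song-b"))
```

The sketch assumes the backend already holds an up-to-date profile for the device; how that profile is kept in sync is outside the scope of this example.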

181 Citations

17 Claims

1. A method, comprising:
- receiving from a first user device, at an electronic device, first audio data representing a first utterance;
  
  determining a first customer identifier associated with the first user device;
  
  determining, using the first customer identifier, a user account on the electronic device, wherein the user account is associated with the first user device;
  
  generating first text data representing the first audio data by executing speech-to-text functionality on the first audio data;
  
  determining, using the first text data, that a first intent of the first utterance is for a song to be played;
  
  determining a download history for the user account, the download history indicating content that has been downloaded from the electronic device by one or more devices associated with the user account;
  
  determining, based on the download history, that first song audio data representing the song was previously downloaded to the first user device from the electronic device;
  
  determining a first user device profile associated with the user account, the first user device profile being associated with the first user device and indicating content items that are currently stored by the first user device;
  
  determining, from the first user device profile, that the first song audio data is stored in memory by the first user device;
  
  generating a first instruction to cause the first user device to play the first song audio data;
  
  sending the first instruction to the first user device;
  
  receiving, at the electronic device, second audio data representing a second utterance that requests additional music to be played, the second audio data being received from the first user device;
  
  generating second text data representing the second audio data by executing the speech-to-text functionality on the second audio data;
  
  determining, using the second text data, that a second intent of the second utterance is for a new song to be played;
  
  determining, based on the download history, that second song audio data representing the new song is not stored within the memory;
  
  determining, based on the download history, that a second user device associated with the user account had previously downloaded the second song audio data;
  
  determining that the first user device and the second user device are capable of communicating directly with each other using a direct communications link;
  
  generating a second instruction that causes the first user device to request that the second user device send the second song audio data to the first user device using the direct communications link; and
  
  sending the second instruction to the first user device.
- Dependent claims (2, 3):
  - 2. The method of claim 1, further comprising:
    - generating, in response to determining that the first song audio data is stored in the memory, third text data representing a first audio message to introduce the song to be played;
      
      generating third audio data representing the third text data by executing text-to-speech functionality on the third text data; and
      
      sending the third audio data to the first user device such that the first audio message is played prior to the first song audio data being played.
  - 3. The method of claim 1, further comprising:
    - determining a number of instances with which a third utterance is received from the first user device;
      
      determining that the number is greater than a frequent utterance threshold value indicating that the third utterance is a frequent utterance;
      
      determining a response for the third utterance prior to receiving an additional instance of the third utterance from the first user device;
      
      generating third text data representing the response;
      
      generating third audio data representing the third text data by executing text-to-speech functionality on the third text data;
      
      sending the third audio data to the first user device such that the third audio data is stored within the memory;
      
      receiving, at the electronic device, fourth audio data representing a fourth utterance;
      
      generating fourth text data representing the fourth audio data by executing the speech-to-text functionality on the fourth audio data;
      
      determining, using fourth text data, that the fourth utterance is the frequent utterance;
      
      determining, based on the first user device profile, that the first user device includes the third audio data stored within the memory;
      
      generating a third instruction to cause the first user device to play the third audio data; and
      
      sending the third instruction to the first user device.
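
The frequent-utterance steps of claim 3 can be pictured with a small sketch. This is only one possible reading, under stated assumptions: the class `FrequentUtteranceTracker`, its method names, and the use of exact text matching to identify repeated utterances are all hypothetical and do not come from the patent.

```python
from collections import Counter


class FrequentUtteranceTracker:
    """Hypothetical tracker that decides when a response should be pre-cached."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold      # the "frequent utterance threshold value"
        self.counts = Counter()         # instances seen per utterance text
        self.cached_on_device = set()   # utterances whose response was already sent

    def record(self, utterance_text: str) -> bool:
        """Count one instance; return True when a response should be sent for caching."""
        self.counts[utterance_text] += 1
        # The claim requires the count to be strictly greater than the threshold.
        is_frequent = self.counts[utterance_text] > self.threshold
        if is_frequent and utterance_text not in self.cached_on_device:
            self.cached_on_device.add(utterance_text)
            return True
        return False

    def can_play_cached(self, utterance_text: str) -> bool:
        """True if the device is believed to hold a cached response already."""
        return utterance_text in self.cached_on_device


if __name__ == "__main__":
    tracker = FrequentUtteranceTracker(threshold=2)
    for _ in range(4):
        print(tracker.record("what is the weather"))
    print(tracker.can_play_cached("what is the weather"))
```

With a threshold of 2, the third instance is the first to exceed it, so that is when the sketch signals that a response should be generated and cached; later instances can then be answered with an instruction to play the cached audio.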

4. A method, comprising:
- receiving, from a first device, first audio data representing a first utterance;
  
  determining a user account associated with the first device;
  
  determining, based on first text data representing the first audio data, that a first intent of the first utterance is for first content to be output;
  
  determining, for the user account, content information associated with at least the first device;
  
  determining, based on the content information, that a first local version of the first content is stored on the first device;
  
  generating a first instruction for the first local version to be output by the first device;
  
  sending the first instruction to the first device;
  
  receiving, from the first device, second audio data representing a second utterance;
  
  determining, based on second text data representing the second audio data, that a second intent of the second utterance is for second content to be output;
  
  determining that a second device is also associated with the user account;
  
  determining, based on the content information, that a second local version of the second content is stored on the second device; and
  
  determining that the second device and the first device are capable of communicating using at least one short-range communications protocol.
- Dependent claims (5, 6, 7, 8, 9, 10):
  - 5. The method of claim 4, further comprising:
    - generating, prior to generating the first instruction, third text data representing a first response;
      
      generating third audio data representing the third text data; and
      
      sending the third audio data to the first device such that the first response outputs prior to the first local version.
  - 6. The method of claim 4, further comprising:
    - generating a second instruction that causes the second device to send the second local version to the first device using the at least one short-range communications protocol; and
      
      sending the second instruction to the first device.
  - 7. The method of claim 4, further comprising:
    - determining, prior to generating the first instruction, a first file size of the first content;
      
      determining that the first file size is greater than a predefined file size threshold; and
      
      determining that, for the user account, the first local version is to be output prior to sending a link to the first content to the first device based on the first file size being greater than the predefined file size threshold.
  - 8. The method of claim 4, further comprising:
    - determining frequent utterances associated with the user account;
      
      generating, prior to receiving third audio data representing one of the frequent utterances, third text data representing at least one response to the frequent utterances;
      
      generating third audio data representing the third text data; and
      
      sending the third audio data to the first device such that the at least one response is available to be output by the first device.
  - 9. The method of claim 4, further comprising:
    - receiving, from the first device, third audio data representing a third utterance;
      
      determining, based on third text data representing the third audio data, that a third intent of the third utterance is for third content to be output by the first device;
      
      determining, from the content information, that the first device does not include a third local version of the third content;
      
      determining that the second device is incapable of communicating with the first device using the at least one short range communications protocol;
      
      generating a link for the third content stored with a remote device; and
      
      sending the link to the first device such that the third content is output.
  - 10. The method of claim 4, further comprising:
    - receiving, from the first device, third audio data representing a third utterance;
      
      determining that a response is to be output, the response having a first temporal duration;
      
      determining, from the content information, that fourth audio data of the response is stored on the first device;
      
      generating a second instruction that causes the fourth audio data to be output by the first device; and
      
      sending the second instruction to the first device such that the response is output while a third intent of the third utterance is being determined.
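
Read together, claims 4, 6, and 9 suggest an order of preference for where content comes from: a local version on the requesting device, then a version on another account device reachable over a short-range protocol, then a link to remotely stored content. The sketch below is a hedged illustration of that ordering; `Device`, `resolve_source`, and the `short_range_peers` field are hypothetical names, and the claims do not prescribe this exact control flow.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical view of one device associated with a user account."""
    device_id: str
    stored_content: set = field(default_factory=set)
    # Assumed: ids of devices reachable over a short-range communications protocol.
    short_range_peers: set = field(default_factory=set)


def resolve_source(requester: Device, account_devices: list, content_id: str) -> dict:
    """Choose where the requesting device should obtain the content from."""
    # 1. Prefer a local version already stored on the requesting device.
    if content_id in requester.stored_content:
        return {"action": "play_local", "content_id": content_id}
    # 2. Otherwise look for another account device that stores it and is in range.
    for other in account_devices:
        if other.device_id == requester.device_id:
            continue
        in_range = other.device_id in requester.short_range_peers
        if in_range and content_id in other.stored_content:
            return {"action": "request_from_peer", "peer": other.device_id,
                    "content_id": content_id}
    # 3. Fall back to a link to the content stored with a remote device.
    return {"action": "use_link", "content_id": content_id}


if __name__ == "__main__":
    phone = Device("phone", {"song-b"})
    speaker = Device("speaker", {"song-a"}, short_range_peers={"phone"})
    for song in ("song-a", "song-b", "song-c"):
        print(resolve_source(speaker, [speaker, phone], song))
```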

11. An electronic device, comprising:
- communications circuitry operable to communicate with at least a first device;
  
  memory; and
  
  at least one processor operable to:
  
  receive, from a first device, first audio data representing a first utterance;
  
  determine a user account associated with the first device;
  
  determine, based on first text data representing the first audio data, that a first intent of the first utterance is for first content to be output;
  
  determine that a first local version of the first content is stored on the first device;
  
  generate second text data representing a first response;
  
  generate second audio data representing the second text data;
  
  generate a first instruction for the first local version to be output by the first device;
  
  send, using the communications circuitry, the first instruction and the second audio data to the first device such that the first local version is output after the second audio data;
  
  receive, from the first device, second audio data representing a second utterance;
  
  generate second text data from the second audio data by applying speech-to-text processing to the second audio data;
  
  determine, based on the second text data, that a second intent of the second utterance is for second content to be output by the first device;
  
  determine, from content information associated with at least the first device, that the first device does not include a second local version of the second content;
  
  determine that there are no additional devices associated with the user account that are capable to send content to the first device using a short-range communications protocol;
  
  generate a link between the first device and a remote device storing a third local version of the second content; and
  
  send, using the communications circuitry, the link to the remote device such that the second content is output to the first device.
- Dependent claims (12, 13, 14, 15, 16, 17):
  - 12. The electronic device of claim 11, wherein the at least one processor is further operable to:
    - determine, using the content information, that the first local version is stored on the first device.
  - 13. The electronic device of claim 11, wherein the at least one processor is further operable to:
    - receive, from the first device, third audio data representing a third utterance;
      
      determine, based on third text data representing the third audio data, that a third intent of the third utterance is for third content to be output;
      
      determine that a second device is also associated with the user account;
      
      determine, based on the content information, that a third local version of the third content is stored on the second device; and
      
      determine, based on a first separation distance between the first device and the second device being less than a separation distance threshold, that the second device and the first device are capable of communicating using at least one short-range communications protocol.
  - 14. The electronic device of claim 13, wherein the at least one processor is further operable to:
    - generate a second instruction that causes the second device to send the third local version to the first device using the at least one short-range communications protocol; and
      
      send, using the communications circuitry, the second instruction to the first device.
  - 15. The electronic device of claim 11, wherein the at least one processor is further operable to:
    - determine, prior to generating the first instruction, a first file size of the first content;
      
      determine that the first file size is greater than a predefined file size threshold; and
      
      determine that, for the user account, the first local version is to be output prior to sending a link to the first content to the first device based on the first file size being greater than the predefined file size threshold.
  - 16. The electronic device of claim 11, wherein the at least one processor is further operable to:
    - determine frequent utterances associated with the user account;
      
      generate, prior to receiving further audio data representing one of the frequent utterances, third text data representing at least one second response to the frequent utterances;
      
      generate third audio data representing the third text data; and
      
      send, using the communications circuitry, the third audio data to the first device such that the at least one second response is available to be output by the first device.
  - 17. The electronic device of claim 11, wherein the at least one processor is further operable to:
    - receive third audio data representing a third utterance from the first device;
      
      determine that a second response is to be output, the second response having a first temporal duration;
      
      determine, from the content information, that fourth audio data of the second response is stored on the first device;
      
      generate a second instruction that causes the fourth audio data to be output by the first device; and
      
      send, using the communications circuitry, the second instruction to the first device such that the second response is output while a third intent of the third utterance is being determined.
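
Claim 13 conditions short-range communication on a separation distance being less than a threshold. As a minimal sketch of that comparison, assuming (hypothetically) that each device's position is known as planar coordinates in meters, the check could be written as below. The function name `within_short_range` and the coordinate representation are assumptions; the claim does not say how the distance is obtained.

```python
import math


def within_short_range(pos_a: tuple, pos_b: tuple, threshold_m: float) -> bool:
    """Return True if two devices are closer than the separation distance threshold.

    Positions are assumed to be (x, y) coordinates in meters.
    """
    separation = math.dist(pos_a, pos_b)  # Euclidean distance (Python 3.8+)
    # The claim uses a strict "less than" comparison against the threshold.
    return separation < threshold_m


if __name__ == "__main__":
    print(within_short_range((0.0, 0.0), (3.0, 4.0), 10.0))
    print(within_short_range((0.0, 0.0), (3.0, 4.0), 5.0))
```

Because the comparison is strict, two devices exactly at the threshold distance are treated as out of range in this sketch.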

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Gray, Timothy Thomas
Primary Examiner(s): Baker, Matthew
Application Number: US15/195,464
Time in Patent Office: 707 Days
CPC Class Codes

G06F 16/632   Query formulation

G06F 16/957   Browsing optimisation, e.g....

G06F 3/165   Management of the audio str...

G06F 3/167   Audio in a user interface, ...

G10L 13/04   Details of speech synthesis...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...
