Systems and methods for routing content to an associated output device

US 10,271,093 B1
Filed: 06/27/2016
Issued: 04/23/2019
Est. Priority Date: 06/27/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Devices and methods for routing content are provided herein. In some embodiments, a method for routing content includes receiving audio data representing a command from a first electronic device, determining content that is associated with the command, sending responsive audio data to the first electronic device, and sending instructions to a second electronic device to output the content associated with the command. In some embodiments, a method for routing content includes determining a state of the second electronic device and sending instructions to output the content to a selected one of the first and second electronic devices based on the state of the second electronic device.
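
The second embodiment in the abstract, choosing an output device from the state of the second device, can be illustrated with a minimal Python sketch. The abstract does not say which states are checked, so the `DeviceState` values, the `Device` type, the `select_output_device` function and the rule "use the second device only when it is on and has a display" are all hypothetical assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DeviceState(Enum):
    # Hypothetical states; the abstract only refers to "a state of the second electronic device".
    ON = "on"
    STANDBY = "standby"
    OFF = "off"


@dataclass
class Device:
    device_id: str
    has_display: bool
    state: DeviceState


def select_output_device(first: Device, second: Device) -> Device:
    """Pick the device that should output the content.

    Assumed rule: prefer the second device when it is powered on and can
    show visual content; otherwise fall back to the first device.
    """
    if second.state is DeviceState.ON and second.has_display:
        return second
    return first


if __name__ == "__main__":
    speaker = Device("voice-device-1", has_display=False, state=DeviceState.ON)
    tv = Device("tv-1", has_display=True, state=DeviceState.STANDBY)
    print(select_output_device(speaker, tv).device_id)  # voice-device-1
    tv.state = DeviceState.ON
    print(select_output_device(speaker, tv).device_id)  # tv-1
```

A real system could weigh more signals (for example whether the display device is already in use); the single boolean rule here only shows where the state check sits in the routing decision.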

47 Citations

22 Claims

1. A method, comprising:
- with a backend system:
  
  receiving first request audio data representing a first utterance, the first request audio data received from a voice activated electronic device,
  
  receiving a customer identifier associated with the voice activated electronic device,
  
  determining a user account associated with the customer identifier,
  
  generating first text data representing the first request audio data by executing speech-to-text functionality on the first request audio data, and
  
  determining, using the first text data, that a first intent of the first utterance is for information to be output by a target device;
  
  determining that an output device that is capable of presenting visual data is also associated with the user account;
  
  determining that a visual information response to the first utterance is available;
  
  determining that the target device is the output device such that the visual information response is to be displayed by a display screen of the output device;
  
  determining that a first audio response to the first utterance is to be sent to the voice activated electronic device;
  
  determining that a second audio response to the first utterance is to be sent to the output device;
  
  determining that a video response to the first utterance is also to be sent to the output device;
  
  generating first response text data responsive to the first utterance;
  
  generating first audio data representing the first response text data by executing text-to-speech functionality on the first response text data;
  
  sending the first audio data to the voice activated electronic device, such that the first audio response is played by a first speaker of the voice activated electronic device;
  
  generating second response text data responsive to the first utterance, including receiving at least a portion of the second response text data from an application;
  
  generating second audio data representing the second response text data by executing text-to-speech functionality on the second response text data;
  
  generating video data responsive to the first utterance;
  
  sending the second audio data to the output device, such that the second audio response is played by a second speaker of the output device; and
  
  sending the video data to the output device, such that the video response is played by the display screen of the output device.
- Dependent Claims (2, 3)
- - 2. The method of claim 1, further comprising:
    - receiving, from the voice activated electronic device, second request audio data representing a second utterance;
      
      generating second text data representing the second request audio data by executing speech-to-text functionality on the second request audio data;
      
      determining, using the second text data, a second intent of the second utterance by:
      
        receiving, from a first domain, a first confidence score indicating a first likelihood that the second utterance is a request to play first content on the voice activated electronic device,
        
        receiving, from a second domain, a second confidence score indicating a second likelihood that the second utterance is a request to play second content on the output device,
        
        determining the first confidence score is greater than a predetermined threshold indicating that a first functionality of the first domain is capable of servicing the second utterance,
        
        determining the second confidence score is also greater than the predetermined threshold indicating that a second functionality of the second domain is also capable of servicing the second utterance,
        
        determining that a selection of the first functionality or the second functionality to be used for responding to the second utterance is needed to determine the second intent,
        
        generating query text data representing an intent question asking whether the second utterance should be responded to by the first domain or the second domain,
        
        generating query audio data representing the query text data by executing text-to-speech functionality on the query text data,
        
        generating an instruction for the voice activated electronic device to continue to send additional audio data representing local audio captured by the voice activated electronic device after audio corresponding to the query audio data is played,
        
        sending, to the voice activated electronic device, the query audio data such that the intent question is played by the first speaker,
        
        sending the instruction to the voice activated electronic device,
        
        receiving, from the voice activated electronic device, the additional audio data,
        
        generating third text data representing the additional audio data by executing speech-to-text functionality on the additional audio data, and
        
        determining, using the third text data, that the local audio included a third utterance of an intent response having a third intent is to play the second content on the output device;
      
      determining that the user account is capable of accessing the second content;
      
      generating a uniform resource locator (URL) that allows the output device to output the second content; and
      
      sending the URL to the output device, such that the output device outputs the second content.
  - 3. The method of claim 1, further comprising:
    - receiving, from the voice activated electronic device, second request audio data representing a second utterance;
      
      generating second text data representing the second request audio data by executing speech-to-text functionality on the second request audio data;
      
      determining that a second intent of the second text data is a request to play a first song on the voice activated electronic device;
      
      generating a first uniform resource locator (URL) that allows the voice activated electronic device to output the first song;
      
      sending the first URL to the voice activated electronic device, such that the song is played using the first speaker;
      
      receiving, from the voice activated electronic device, third request audio data representing a third utterance;
      
      generating third text data representing the third request audio data by executing speech-to-text functionality on the third request audio data;
      
      determining that a third intent of the third text data is another request to play the first song on the output device;
      
      generating an instruction for the voice activated electronic device to stop playing the first song;
      
      sending, to the voice activated device, the instruction such that the first song stops playing on the voice activated electronic device;
      
      generating a second URL that allows the output device to output the first song;
      
      generating song video data for the output device;
      
      sending, to the output device, the second URL such that the first song is played by the second speaker;
      
      sending, to the output device, the song video data, such that the song video data is played by the display screen while the first song is played by the second speaker;
      
      generating fourth text data representing a confirmation message;
      
      generating fifth audio data representing the fourth text data by executing text-to-speech functionality on the fourth text data; and
      
      sending, to the voice activated electronic device, the fifth audio data, such that the confirmation message is played by the first speaker.
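
Claim 2 describes a disambiguation step: when two domains both return a confidence score greater than a predetermined threshold, the backend asks the user which domain should respond and instructs the voice activated device to keep sending audio after the question plays. The Python sketch below illustrates only that decision; the `DomainScore` and `Resolution` types, the `resolve_intent` function, the domain names and the question wording are hypothetical. The single-domain and no-domain branches are assumptions added for completeness, and speech-to-text, text-to-speech and audio transport are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainScore:
    domain: str         # hypothetical domain name, e.g. "music"
    target_device: str  # device this domain would play content on
    confidence: float   # likelihood that the utterance belongs to this domain


@dataclass
class Resolution:
    domain: Optional[str]    # chosen domain, or None when unresolved
    question: Optional[str]  # clarifying question to pass to text-to-speech
    keep_listening: bool     # tell the voice device to keep sending audio afterwards


def resolve_intent(scores: list[DomainScore], threshold: float) -> Resolution:
    """Pick a domain, or ask the user when more than one domain qualifies."""
    capable = [s for s in scores if s.confidence > threshold]
    if len(capable) == 1:
        return Resolution(capable[0].domain, None, False)
    if len(capable) > 1:
        options = " or ".join(f"{s.domain} on the {s.target_device}" for s in capable)
        return Resolution(None, f"Did you mean {options}?", True)
    return Resolution(None, None, False)  # nothing qualifies (assumed: no action)


if __name__ == "__main__":
    scores = [DomainScore("music", "speaker", 0.82), DomainScore("video", "TV", 0.77)]
    print(resolve_intent(scores, threshold=0.6).question)
    # Did you mean music on the speaker or video on the TV?
```

In the claimed flow, the question text would then be synthesized and played on the first speaker, and the transcribed answer would settle which domain (and therefore which device) handles the request.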

4. A method performed by at least one backend system, comprising:
- receiving, from a first electronic device, first audio data representing a first utterance;
  
  determining that a first user account is associated with the first electronic device;
  
  generating first text data representing the first audio data;
  
  determining, using the first text data, a first intent of the first utterance;
  
  determining that a second electronic device is also associated with the user account;
  
  determining that a first response to the first utterance is capable of being sent to the second electronic device;
  
  generating second text data representing a second response to the first utterance;
  
  generating second audio data representing the second text data;
  
  sending the second audio data to the first electronic device, such that the second response is output by a speaker associated with the first electronic device;
  
  generating first image data representing the first response; and
  
  sending the first image data to the second electronic device such that the first response is output on a display screen associated with the second electronic device.
- Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 22)
- - 5. The method of claim 4, wherein determining that the first user account is associated with the first electronic device further comprises:
    - receiving a customer identifier associated with the first electronic device; and
      
      determining that the customer identifier is associated with the first user account.
  - 6. The method of claim 4, wherein:
    - determining the first intent further comprises determining the first intent is for content to be played on a target device; and
      
      determining that the second electronic device is also associated with the user account further comprises determining the target device is the second electronic device.
  - 7. The method of claim 4, further comprising:
    - receiving, from the first electronic device, third audio data representing a second utterance;
      
      generating third text data representing the third audio data;
      
      determining a second intent of the third text data by:
      
        receiving a confidence score exceeding a predetermined threshold,
        
        generating fourth text data representing a query message,
        
        generating fourth audio data representing the fourth text data,
        
        generating first instructions for the first electronic device,
        
        sending, to the first electronic device, the fourth audio data such that the query message is output by the speaker associated with the first electronic device,
        
        sending, to the first electronic device, the first instructions such that the first electronic device sends fifth audio data representing a response to the query message,
        
        receiving, from the first electronic device, the fifth audio data,
        
        generating fifth text data representing the fifth audio data, and
        
        determining, using the fifth text data, a third intent of the response to the query message;
      
      receiving first content responsive to the second utterance; and
      
      sending the first content to the second electronic device such that the first content is output by the second electronic device.
  - 8. The method of claim 7, further comprising:
    - generating sixth text data;
      
      generating sixth audio data representing the sixth text data; and
      
      based at least in part on determining the third intent, sending the sixth audio data to the first electronic device such that audio corresponding to the sixth audio data is output by the speaker associated with the first electronic device.
  - 9. The method of claim 7, wherein determining the first intent further comprises:
    - determining, based on the first text data, that at least two domains are capable of responding to the first utterance.
  - 10. The method of claim 9, wherein the at least two domains comprise:
    - a first domain indicating the first utterance is a request to play a song having a title on the first electronic device; and
      
      a second domain indicating the first utterance is a request to play a movie having the title on the second electronic device.
  - 11. The method of claim 4, further comprising:
    - receiving, from the first electronic device, third audio data representing a second utterance;
      
      generating third text data representing the third audio data;
      
      determining a second intent of the third text data;
      
      receiving first content responsive to the second utterance;
      
      sending the first content to the first electronic device such that the first content is output by the first electronic device;
      
      receiving, from the first electronic device, fourth audio data representing a third utterance;
      
      generating fourth text data representing the fourth audio data;
      
      determining a third intent of the fourth text data is to play second content on the second electronic device;
      
      determining the second content and the first content are the same;
      
      generating first instructions for the first electronic device;
      
      sending the first instructions to the first electronic device such that the first content is no longer output by the first electronic device;
      
      receiving the second content; and
      
      sending the second content to the second electronic device such that the second content is output by the second electronic device.
  - 12. The method of claim 11, further comprising:
    - generating second image data; and
      
      sending the second image data to the second electronic device.
  - 22. The method of claim 4, further comprising:
    - generating third text data representing a third response to the first utterance;
      
      generating third audio data representing the third text data; and
      
      sending the third audio data to the second electronic device, such that the third response is output by the speaker associated with the second electronic device.
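
Independent claim 4, together with claim 22, amounts to fanning one response out over several devices on the same user account: spoken audio back to the device that captured the utterance, image data to a device with a display screen, and optionally further audio to that display device's own speaker. A minimal Python sketch of that planning step follows. The types, the `plan_responses` function, the `synthesize` placeholder standing in for text-to-speech, and the rule for picking the display device (the first other device on the account that has a display) are hypothetical assumptions, not details from the claims.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OutputDevice:
    device_id: str
    has_speaker: bool
    has_display: bool


@dataclass
class UserAccount:
    account_id: str
    devices: list[OutputDevice] = field(default_factory=list)


@dataclass
class Message:
    device_id: str
    kind: str  # "audio" or "image"
    payload: bytes


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for text-to-speech: just encodes the text as UTF-8."""
    return text.encode("utf-8")


def plan_responses(account: UserAccount, source_id: str, spoken_text: str,
                   image: bytes, display_text: Optional[str] = None) -> list[Message]:
    """Build the outbound messages for one utterance (claims 4 and 22, sketched)."""
    if source_id not in {d.device_id for d in account.devices}:
        raise ValueError(f"{source_id} is not associated with {account.account_id}")
    # Spoken response goes back to the device that captured the utterance.
    messages = [Message(source_id, "audio", synthesize(spoken_text))]
    # Assumed rule: use the first other device on the account that has a display.
    display = next((d for d in account.devices
                    if d.device_id != source_id and d.has_display), None)
    if display is None:
        return messages  # assumed fallback: audio-only answer
    messages.append(Message(display.device_id, "image", image))
    # Optional second spoken response on the display device's speaker (claim 22).
    if display_text is not None and display.has_speaker:
        messages.append(Message(display.device_id, "audio", synthesize(display_text)))
    return messages


if __name__ == "__main__":
    account = UserAccount("acct-1", [OutputDevice("speaker-1", True, False),
                                     OutputDevice("tv-1", True, True)])
    for m in plan_responses(account, "speaker-1", "Here is the forecast.",
                            b"<image bytes>", "Showing the forecast."):
        print(m.device_id, m.kind)
    # speaker-1 audio
    # tv-1 image
    # tv-1 audio
```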

13. At least one backend system, comprising:
- at least one processor; and
  
  at least one computer-readable medium encoded with instructions which, when executed by the at least one processor, cause the backend system to:
  
    receive first audio data representing a first utterance from a first electronic device,
    
    determine that the first electronic device is associated with a first user account,
    
    generate first text data representing the first audio data,
    
    determine, using the first text data, a first intent of the first utterance,
    
    determine that a second electronic device is associated with the user account,
    
    determine that a first response to the first utterance is capable of being sent to the second electronic device,
    
    generate second text data representing a second response to the first utterance,
    
    generate second audio data representing the second text data,
    
    send the second audio data to the first electronic device, such that the second response is output by a speaker associated with the first electronic device,
    
    generate first image data representing the first response, and
    
    send the first image data to the second electronic device such that the first response is output on a display screen associated with the second electronic device.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21)
- - 14. The at least one backend system of claim 13, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - receive a customer identifier from the first electronic device; and
      
      determine that the customer identifier is associated with the first user account.
  - 15. The at least one backend system of claim 13, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - determine the first intent by determining that the first intent is for content to be played on a target device; and
      
      determine that the second electronic device is associated with the user account by determining the target device is the second electronic device.
  - 16. The at least one backend system of claim 13, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - receive third audio data representing a second utterance;
      
      generate third text data representing the third audio data;
      
      receive a confidence score exceeding a predetermined threshold;
      
      generate fourth text data representing a query message;
      
      generate fourth audio data representing the fourth text data;
      
      generate listening instructions for the first electronic device;
      
      send the fourth audio data to the first electronic device, such that the query message is output by the speaker associated with the first electronic device;
      
      send the listening instructions to the first electronic device such that the first electronic device sends fifth audio data representing a response to the query message;
      
      determine that the fifth audio data has been received from the first electronic device;
      
      generate fifth text data representing the fifth audio data;
      
      determine a second intent of the fifth text data;
      
      receive first content responsive to the second utterance; and
      
      send the first content to the second electronic device such that the first content is output by the second electronic device.
  - 17. The at least one backend system of claim 16, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - receive sixth text data;
      
      generate sixth audio data representing the sixth text data; and
      
      based at least in part on determining the second intent, send the sixth audio data to the first electronic device such that audio corresponding to the sixth audio data is output by the speaker associated with the first electronic device.
  - 18. The at least one backend system of claim 16, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - determine the first intent by determining, based on the first text data, that at least two domains are capable of responding to the first utterance.
  - 19. The at least one backend system of claim 13, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - receive, from the first electronic device, third audio data representing a second utterance;
      
      generate third text data representing the third audio data;
      
      determine a second intent of the third text data;
      
      receive first content responsive to the second utterance;
      
      send the first content to the first electronic device such that the first content is output by the first electronic device;
      
      determine that fourth audio data representing a third utterance has been received from the first electronic device;
      
      generate fourth text data representing the fourth audio data;
      
      determine that a third intent of the fourth text data is to output the first content on the second electronic device;
      
      generate first instructions for the first electronic device;
      
      send first instructions to the first electronic device such that the first content is no longer output by the first electronic device;
      
      receive second content such that the second content is the same as the first content; and
      
      send the second content to the second electronic device such that the second content is output by the second electronic device.
  - 20. The at least one backend system of claim 19, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, further cause the at least one backend system to:
    - generate second image data; and
      
      based at least in part on determining the third intent, send the second image data to the second electronic device.
  - 21. The at least one backend system of claim 13, wherein the at least one computer-readable medium is encoded with additional instructions which, when executed by the at least one processor, cause the at least one backend system to:
    - generate third text data representing a third response to the first utterance;
      
      generate third audio data representing the third text data; and
      
      send the third audio data to the second electronic device, such that the third response is output by the speaker associated with the second electronic device.
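
Claims 11 and 19 (and claim 3, for a song) describe handing playback off: when content already playing on the first device is requested on the second device, the backend instructs the first device to stop and sends the content to the second device. The Python sketch below models only that bookkeeping. `PlaybackRouter`, the `(device, action, content)` instruction tuples and the device and content IDs are hypothetical; the URL generation, song video data and spoken confirmation recited in claim 3 are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlaybackRouter:
    # Hypothetical bookkeeping: device ID -> content ID currently playing there.
    now_playing: dict[str, str] = field(default_factory=dict)

    def play(self, device_id: str, content_id: str) -> list[tuple[str, str, str]]:
        """Start content on one device and return the instruction to send."""
        self.now_playing[device_id] = content_id
        return [(device_id, "play", content_id)]

    def move(self, from_id: str, to_id: str, content_id: str) -> list[tuple[str, str, str]]:
        """Hand content off from one device to another."""
        instructions: list[tuple[str, str, str]] = []
        # Stop the first device only if the requested content is what it is playing
        # ("determining the second content and the first content are the same").
        if self.now_playing.get(from_id) == content_id:
            del self.now_playing[from_id]
            instructions.append((from_id, "stop", content_id))
        instructions.extend(self.play(to_id, content_id))
        return instructions


if __name__ == "__main__":
    router = PlaybackRouter()
    router.play("speaker-1", "song-42")
    print(router.move("speaker-1", "tv-1", "song-42"))
    # [('speaker-1', 'stop', 'song-42'), ('tv-1', 'play', 'song-42')]
```

Keeping the "same content" check explicit mirrors the claims: a request for different content on the second device starts it there without interrupting the first device.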

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Jobanputra, Soniya; Typrin, Marcello; Trudell, Mallory
Primary Examiner(s)
Alam, Mushfikh I

Application Number

US15/194,064
Time in Patent Office

1,030 Days
Field of Search

725 74-104
CPC Class Codes

G06F 16/3329   Natural language query form...

G06F 3/165   Management of the audio str...

G06F 3/167   Audio in a user interface, ...

G06F 40/35   Discourse or dialogue repre...

G10L 15/26   Speech to text systems G10L...

G10L 25/51   for comparison or discrimin...

H04N 21/25875   involving end-user authenti...

H04N 21/439   Processing of audio element...

H04N 21/4415   using biometric characteris...

H04N 21/4516   involving client characteri...

H04N 21/4518   involving characteristics o...

H04N 21/4751   for defining user accounts,...

H04N 21/835   Generation of protective da...

H04N 21/8586   by using a URL processing c...
