Navigating content utilizing speech-based user-selectable elements
First Claim
1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
receiving a designation of content having user-selectable elements;
analyzing the content to identify an audio command corresponding to one or more user-selectable elements of the user-selectable elements, the audio command being identified based at least in part on an acoustic differentiation between the audio command and a different audio command meeting or exceeding a threshold;
receiving a signal associated with an utterance of a user, the signal generated by a microphone associated with a device;
analyzing the signal associated with the utterance to determine the audio command; and
responding to the utterance in accordance with a user-selectable element of the one or more user-selectable elements corresponding to the audio command, the responding including causing information associated with the audio command to be visually output via a projector associated with the device.
Abstract
In a content browsing environment, a system analyzes content to identify audio commands to be made available to users. The audio commands may be chosen so that they are easily differentiable from each other when using machine-based speech recognition techniques. When the content is displayed, the system monitors a user's speech to detect user utterances corresponding to the audio commands and performs content navigation in response to the user utterances.
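The selection criterion described in the abstract, keeping only commands that are easily differentiable from every other available command, can be sketched as a greedy filter. The distance metric below (character-level similarity via `difflib`) and the threshold value are illustrative stand-ins; the patent does not specify either, and a real system would use an acoustic or phonetic measure.

```python
from difflib import SequenceMatcher

def differentiation(a: str, b: str) -> float:
    """Stand-in acoustic-differentiation score in [0, 1]:
    1.0 means the two commands share no similarity at all."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def select_commands(candidates, threshold=0.5):
    """Greedily keep candidates whose differentiation from every
    already-selected command meets or exceeds the threshold."""
    selected = []
    for cand in candidates:
        if all(differentiation(cand, s) >= threshold for s in selected):
            selected.append(cand)
    return selected

# "place" is too close to "play" under this metric and is dropped.
commands = select_commands(["play", "pause", "place", "next", "back"])
```

Under this sketch, commands that collide acoustically (here, "place" against "play") are simply not offered, which is one way to satisfy the claim's threshold condition.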
29 Claims
1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
receiving a designation of content having user-selectable elements;

analyzing the content to identify an audio command corresponding to one or more user-selectable elements of the user-selectable elements, the audio command being identified based at least in part on an acoustic differentiation between the audio command and a different audio command meeting or exceeding a threshold;

receiving a signal associated with an utterance of a user, the signal generated by a microphone associated with a device;

analyzing the signal associated with the utterance to determine the audio command; and

responding to the utterance in accordance with a user-selectable element of the one or more user-selectable elements corresponding to the audio command, the responding including causing information associated with the audio command to be visually output via a projector associated with the device.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 27, 28, 29
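The matching step recited in claim 1, analyzing the utterance signal to determine the audio command, might reduce, once a speech recognizer has produced a transcript, to a nearest-command lookup. The similarity metric and the confidence floor below are hypothetical stand-ins, not taken from the claims; the recognizer itself is outside this sketch.

```python
from difflib import SequenceMatcher

def best_command(transcript: str, commands: list, min_score: float = 0.6):
    """Map a recognized transcript to the closest registered audio
    command; return None when no command is a confident match."""
    scored = [(SequenceMatcher(None, transcript.lower(), c.lower()).ratio(), c)
              for c in commands]
    score, command = max(scored)
    return command if score >= min_score else None

best_command("nex", ["play", "pause", "next", "back"])  # → "next"
best_command("zzz", ["play", "pause", "next", "back"])  # → None
```

Returning None for low-confidence matches models the claim structure in which the system responds only when an utterance corresponds to a registered audio command.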
9. A method, comprising:
receiving, by one or more computing devices, a request designating content, wherein the content has a user-selectable element;

determining, by at least one computing device of the one or more computing devices, an audio command corresponding to the user-selectable element, the audio command being determined based at least in part on an acoustic differentiation between the audio command and a different audio command meeting or exceeding a threshold;

associating, by at least one computing device of the one or more computing devices, the audio command with the user-selectable element; and

causing information associated with the audio command to be visually output by a projector associated with the second device.

Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. A system comprising:
one or more processors;

one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:

receiving a request that specifies a user utterance with regard to content having user-selectable elements;

analyzing the content to identify an audio command corresponding to one or more user-selectable elements of the user-selectable elements, the audio command being identified based at least in part on an acoustic differentiation between the audio command and a different audio command meeting or exceeding a threshold;

selecting the audio command based at least in part on the user utterance; and

responding to the request in accordance with a user-selectable element of the one or more user-selectable elements corresponding to the audio command, the responding including causing information associated with the audio command to be visually output via a projector associated with a device.

Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
Specification