Identification of utterance subjects

US 9,240,187 B2
Filed: 03/09/2015
Issued: 01/19/2016
Est. Priority Date: 12/20/2012
Status: Active Grant

First Claim

1. A system comprising:

a computer-readable memory storing executable instructions; and

one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:

receive, from a client device, audio data corresponding to an utterance;

receive, from the client device, marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented by the client device contemporaneously with capture of the utterance by the client device;

generate a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;

determine an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and

perform the action.

Abstract

Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
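The flow described in the abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified form of hint generation: the marker identifies the portion being presented, and that portion is substituted for a pronoun with no explicit antecedent. The names `Marker`, `PRESENTATIONS`, `PRONOUNS`, and `apply_marker_hint` are hypothetical and do not come from the patent, and the transcription is taken as already produced by ASR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker sent by the client alongside the utterance audio."""
    presentation_id: str  # which audio presentation was playing
    portion_index: int    # which portion of it the marker refers to


# Hypothetical store of presentations, each split into ordered portions.
PRESENTATIONS = {
    "playlist-1": ["Song A", "Song B", "Song C"],
}

# Pronouns treated as lacking an explicit antecedent (illustrative only).
PRONOUNS = {"it", "that", "this"}


def apply_marker_hint(transcription: str, marker: Marker) -> str:
    """Replace bare pronouns in the transcription with the marked portion."""
    portion = PRESENTATIONS[marker.presentation_id][marker.portion_index]
    words = transcription.split()
    return " ".join(portion if w.lower() in PRONOUNS else w for w in words)


if __name__ == "__main__":
    # The user says "play that" while the second list item is being presented.
    print(apply_marker_hint("play that", Marker("playlist-1", 1)))
```

A production system would more likely pass the hint to its NLU component than rewrite the text; the substitution here only makes the association between utterance and content portion visible.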

20 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
  
  one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
  
  receive, from a client device, audio data corresponding to an utterance;
  
  receive, from the client device, marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented by the client device contemporaneously with capture of the utterance by the client device;
  
  generate a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
  
  determine an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
  
  perform the action.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The system of claim 1, wherein presentation by the client device of the audio presentation contemporaneously with capture of the utterance comprises capture of at least a portion of the utterance within a threshold period of time of presentation of the first portion.
  - 3. The system of claim 1, wherein presentation by the client device of the audio presentation contemporaneously with capture of the utterance comprises presentation by the client device of at least a portion of the audio presentation simultaneously with capture of at least a portion of the utterance.
  - 4. The system of claim 1, wherein presentation by the client device of the audio presentation contemporaneously with capture of the utterance comprises capture of at least a portion of the utterance after presentation of the first portion and prior to presentation of a second portion of the plurality of portions of the audio presentation.
  - 5. The system of claim 1, wherein the marker data further comprises a first presentation identifier corresponding to the audio presentation and a second presentation identifier corresponding to a second audio presentation also being presented on the client device contemporaneously with capture of the utterance by the client device.
  - 6. The system of claim 5, wherein the one or more processors are further programmed by the executable instructions to determine, based at least partly on the marker data, that the utterance relates to the audio presentation and not the second audio presentation.
  - 7. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - receive second marker data corresponding to a second portion of the plurality of portions of the audio presentation, wherein the second portion is presented after the first portion and prior to capture of the utterance;
      
      determine that an amount of time between a time that presentation of the second portion was initiated and a time that the user utterance was initiated exceeds a threshold; and
      
      determine that the utterance relates to the second portion, wherein the action to be taken responsive to the utterance relates to the second portion.
  - 8. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - receive second marker data corresponding to a second portion of the plurality of portions of the audio presentation, wherein the second portion is presented after the first portion and prior to capture of the utterance;
      
      determine that an amount of time between a time that presentation of the second portion was initiated and a time that the user utterance was initiated fails to exceed a threshold; and
      
      determine that the utterance relates to the first portion, wherein the action to be taken responsive to the utterance relates to the first portion.
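The threshold rule recited in claims 7 and 8 can be sketched as a single comparison. This is an illustration under stated assumptions, not the patented implementation: the function name `select_portion`, its parameters, and the use of seconds as the time unit are all hypothetical.

```python
def select_portion(
    first_portion: str,
    second_portion: str,
    second_started_at: float,
    utterance_started_at: float,
    threshold_s: float,
) -> str:
    """Pick the portion an utterance relates to, following claims 7 and 8.

    All times are in seconds on a shared clock (an assumption of this sketch).
    """
    elapsed = utterance_started_at - second_started_at
    # Claim 7: the elapsed time exceeds the threshold -> second portion.
    # Claim 8: the elapsed time fails to exceed it -> first portion.
    return second_portion if elapsed > threshold_s else first_portion


if __name__ == "__main__":
    # The second portion began at t=10.0 s; the threshold is 1.0 s.
    print(select_portion("Item 1", "Item 2", 10.0, 12.0, 1.0))
    print(select_portion("Item 1", "Item 2", 10.0, 10.4, 1.0))
```

An elapsed time exactly equal to the threshold is treated as failing to exceed it, so the sketch resolves that case to the first portion.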

9. A computer-implemented method comprising:
- under control of one or more computing devices configured with specific computer-executable instructions,
  
  receiving, from a client device, audio data corresponding to an utterance;
  
  receiving, from the client device, marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented by the client device contemporaneously with capture of the utterance by the client device;
  
  generating a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
  
  determining an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
  
  performing the action.
- Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17)
  - 10. The computer-implemented method of claim 9, wherein at least a portion of the audio presentation is presented simultaneously with capture of at least a portion of the utterance by the client device.
  - 11. The computer-implemented method of claim 9, wherein the marker data indicates that the first portion of the audio presentation was being presented on the client device when the utterance was initiated.
  - 12. The computer-implemented method of claim 9, wherein the marker data further comprises a first presentation identifier corresponding to the audio presentation and a second presentation identifier corresponding to a second audio presentation also being presented on the client device contemporaneously with capture of the utterance by the client device.
  - 13. The computer-implemented method of claim 12, further comprising determining, based at least partly on the marker data, that the utterance relates to the audio presentation and not the second audio presentation.
  - 14. The computer-implemented method of claim 9, wherein the audio presentation comprises presentation of a list of items, presentation of an audiobook, presentation of news information, or presentation of a text-to-speech audio.
  - 15. The computer-implemented method of claim 9, wherein the action comprises selecting an item from a list of items, modifying an item from a list of items, deleting an item from a list of items, or obtaining additional information regarding a portion of the plurality of portions of the audio presentation.
  - 16. The computer-implemented method of claim 9, further comprising:
    - receiving second marker data corresponding to a second portion of the plurality of portions of the audio presentation, wherein the second portion is presented after the first portion and prior to capture of the utterance;
      
      determining that an amount of time between a time that presentation of the second portion was initiated and a time that the user utterance was initiated exceeds a threshold; and
      
      determining that the utterance relates to the second portion, wherein the action to be taken responsive to the utterance relates to the second portion.
  - 17. The computer-implemented method of claim 9, further comprising:
    - receiving second marker data corresponding to a second portion of the plurality of portions of the audio presentation, wherein the second portion is presented after the first portion and prior to capture of the utterance;
      
      determining that an amount of time between a time that presentation of the second portion was initiated and a time that the user utterance was initiated fails to exceed a threshold; and
      
      determining that the utterance relates to the first portion, wherein the action to be taken responsive to the utterance relates to the first portion.
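Claim 11 recites marker data indicating which portion was being presented when the utterance was initiated. One way a client might derive that indication is sketched below, assuming it records the start time of each portion in presentation order. The function name `portion_at` and the list-of-start-times representation are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def portion_at(start_times: Sequence[float], t: float) -> Optional[int]:
    """Return the index of the portion being presented at time t.

    start_times must be sorted ascending, one entry per portion.
    Returns None when t falls before the first portion starts.
    """
    # bisect_right counts the portions whose start time is <= t.
    i = bisect_right(start_times, t) - 1
    return i if i >= 0 else None


if __name__ == "__main__":
    starts = [0.0, 3.5, 7.0]  # hypothetical start times, in seconds
    print(portion_at(starts, 4.2))
```

The sketch does not track when the final portion ends, so any time after the last start maps to the last portion.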

18. Non-transitory computer-readable storage storing executable code that, when executed by one or more processors, causes the one or more processors to perform a process comprising:
- obtaining audio data corresponding to an utterance;
  
  obtaining marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented contemporaneously with capture of the utterance;
  
  generating a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
  
  determining an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
  
  performing the action.
- Dependent Claims (19, 20)
  - 19. The non-transitory computer-readable storage of claim 18, wherein the executable code causes the one or more processors to obtain the audio data and the marker data via a network connection with a client computing device separate from the one or more processors.
  - 20. The non-transitory computer-readable storage of claim 18, wherein the executable code causes the one or more processors to obtain the audio data from a microphone in communication with the one or more processors.

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Torok, Fred; Deramat, Frédéric Johan Georges; Gundeti, Vikram Kumar
Primary Examiner(s)
GUERRA-ERAZO, EDGAR X

Application Number

US14/642,365
Publication Number

US 20150179175A1
Time in Patent Office

316 Days
Field of Search

704/233, 704/235, 704/246, 704/255, 704/257, 704/260, 704/261, 704/270, 704/270.1, 704/275
US Class Current

1/1
CPC Class Codes

G06F 16/3344   using natural language anal...

G06F 16/60   of audio data

G06F 16/61   Indexing; Data structures t...

G06F 16/685   using automatically derived...

G10L 15/08   Speech classification or se...

G10L 15/222   Barge in, i.e. overridable ...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....
