Identification of utterance subjects

US 8,977,555 B2
Filed: 12/20/2012
Issued: 03/10/2015
Est. Priority Date: 12/20/2012
Status: Active Grant

Abstract

Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
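
The hint mechanism described in the abstract can be illustrated with a minimal Python sketch. It assumes a marker carries the text of the content portion it stands for, and that a hint is simply the antecedent to use for a bare pronoun. The names `Marker`, `generate_hint`, `interpret`, and `PRONOUNS` are hypothetical and do not come from the patent; a real ASR/NLU module would be far more elaborate.

```python
from dataclasses import dataclass

# Hypothetical pronoun list, used only for this illustration.
PRONOUNS = {"it", "that", "this"}


@dataclass(frozen=True)
class Marker:
    marker_id: str  # opaque identifier sent to the client with the TTS audio
    item_text: str  # the content portion the marker stands for


def generate_hint(marker: Marker) -> dict:
    """Turn a marker returned by the client into a hint for ASR/NLU."""
    return {"antecedent": marker.item_text}


def interpret(transcript: str, hint: dict) -> dict:
    """Read the transcript as '<verb> <target>'; when the target is a bare
    pronoun, replace it with the antecedent supplied by the hint."""
    words = transcript.lower().split()
    if not words:
        return {"action": None, "target": None}
    verb, target = words[0], " ".join(words[1:])
    if target in PRONOUNS:
        target = hint["antecedent"]
    return {"action": verb, "target": target}


hint = generate_hint(Marker("m2", "pick up dry cleaning"))
print(interpret("Delete that", hint))
```

In this sketch an utterance that names its target explicitly is left untouched, so the hint only matters when the utterance has no explicit antecedent.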

32 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
- one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
  - generate text to be presented to a user, wherein the text comprises a sequence of items;
  - generate an audio presentation using the text;
  - associate a plurality of identifiers with the sequence of items, wherein each item of the sequence of items is associated with at least one identifier of the plurality of identifiers;
  - transmit, to a client device, the audio presentation and the plurality of identifiers;
  - receive, from the client device:
    - audio data comprising a user utterance; and
    - a first identifier of the plurality of identifiers;
  - perform speech recognition on the user utterance to obtain speech recognition results;
  - identify a first item, of the sequence of items, based at least partly on the first identifier and the speech recognition results; and
  - perform an action based at least partly on the first item.
- Dependent claims (2, 3, 4, 5):
  - 2. The system of claim 1, wherein the sequence of items comprises a sequence of reminders, a sequence of items on a task list, or a sequence of items available for purchase.
  - 3. The system of claim 1, wherein the first identifier and the audio data are received in a single data transmission.
  - 4. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - generate a hint using the sequence of items; and
    - identify the first item using the hint.
  - 5. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to at least:
    - receive, from the client device, information about a second audio presentation being presented on the client device; and
    - identify the first item using the information.
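
The server-side steps recited in claim 1 can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: `synthesize` and `recognize` are trivial stand-ins for real TTS and ASR components, the `item-N` identifier scheme is invented for the example, and the rule "prefer an item named in the transcript, otherwise fall back to the identifier" is only one possible way to combine the two inputs.

```python
def synthesize(text: str) -> bytes:
    """Placeholder for a TTS engine (assumption: returns encoded text)."""
    return text.encode("utf-8")


def recognize(audio_data: bytes) -> str:
    """Placeholder for an ASR engine (assumption: audio is encoded text)."""
    return audio_data.decode("utf-8")


def associate_identifiers(items):
    """Associate each item of the sequence with its own identifier."""
    return {f"item-{index}": item for index, item in enumerate(items)}


def build_response(items):
    """Generate the text, the audio presentation, and the identifiers."""
    table = associate_identifiers(items)
    payload = {
        "audio": synthesize(". ".join(items)),
        "identifiers": list(table),
    }
    return payload, table


def identify_item(table, first_identifier, transcript):
    """Prefer an item named in the transcript; otherwise use the identifier."""
    lowered = transcript.lower()
    for item in table.values():
        if item.lower() in lowered:
            return item
    return table.get(first_identifier)


def handle_request(table, audio_data, first_identifier):
    """Recognize the utterance, identify the item, and describe the action."""
    transcript = recognize(audio_data)
    words = transcript.split()
    return {
        "action": words[0].lower() if words else None,
        "item": identify_item(table, first_identifier, transcript),
    }


payload, table = build_response(["red kettle", "blue teapot", "green mug"])
print(handle_request(table, b"buy that", "item-1"))
```

Here the identifier resolves "that" to the item being presented, while an utterance such as "buy the green mug" would override it because the speech recognition results name a different item.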

6. A computer-implemented method comprising:
- under control of one or more computing devices configured with specific computer-executable instructions,
  - transmitting, to a client device:
    - an audio presentation comprising a first portion and a second portion, wherein the first portion corresponds to a first item and the second portion corresponds to a second item;
    - a first marker corresponding to the first item; and
    - a second marker corresponding to the second item;
  - receiving, from the client device:
    - audio data comprising a user utterance; and
    - marker data comprising the first marker or the second marker; and
  - selecting an item based at least on the marker data or the audio data, wherein the selected item comprises the first item or the second item.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 7. The computer-implemented method of claim 6, wherein the marker data comprises the second marker, the computer-implemented method further comprising:
    - determining an amount of time between a first time that presentation of the second portion was initiated, and a second time that the user utterance was initiated.
  - 8. The computer-implemented method of claim 7, where the amount of time is less than a predetermined threshold, and wherein identifying the related portion comprises identifying the first portion based at least partly on the amount of time.
  - 9. The computer-implemented method of claim 7, where the amount of time exceeds a predetermined threshold, and wherein identifying the related portion comprises identifying the second portion based at least partly on the amount of time.
  - 10. The computer-implemented method of claim 6, wherein the marker data further comprises a first presentation identifier corresponding to the audio presentation and a second presentation identifier corresponding to a second audio presentation being presented on the client device when the user utterance was initiated.
  - 11. The computer-implemented method of claim 10, further comprising:
    - determining that the user utterance relates to the audio presentation based at least partly on the marker data and the user utterance.
  - 12. The computer-implemented method of claim 6, wherein the audio presentation is transmitted in a first data stream, and the first marker and second marker are transmitted in one of the first data stream or a second data stream.
  - 13. The computer-implemented method of claim 6, wherein the first portion and the second portion are transmitted in separate transmissions.
  - 14. The computer-implemented method of claim 6, wherein the marker data indicates which portion of the audio presentation was being presented on the client device when the utterance was initiated.
  - 15. The computer-implemented method of claim 6, further comprising:
    - performing an action based at least partly on the selected item.
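
The timing test in claims 7 through 9 can be sketched as a small function. The idea is that a user who starts speaking very soon after the second portion begins is probably still reacting to the first portion. The function name, the timestamps in seconds, and the 1.0 second default threshold are assumptions made for this example; the claims do not say what happens when the elapsed time equals the threshold, and the sketch arbitrarily selects the second portion in that case.

```python
def select_portion(second_portion_started_at: float,
                   utterance_started_at: float,
                   threshold_seconds: float = 1.0) -> str:
    """Choose the portion an utterance most likely refers to.

    The marker data names the second portion (claim 7). If the utterance
    began less than the threshold after that portion started, select the
    first portion (claim 8); otherwise select the second portion (claim 9).
    """
    elapsed = utterance_started_at - second_portion_started_at
    if elapsed < 0:
        raise ValueError("utterance started before the second portion")
    # Equality is treated as 'second' here; that choice is an assumption.
    return "first" if elapsed < threshold_seconds else "second"


print(select_portion(10.0, 10.4))  # spoke 0.4 s into the second portion
print(select_portion(10.0, 12.5))  # spoke 2.5 s into the second portion
```

A deployed system would presumably tune the threshold to typical reaction times, but any specific value is outside what the claims state.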

16. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
- transmitting, to a client device:
  - an audio presentation comprising a first portion and a second portion, wherein the first portion corresponds to a first item and the second portion corresponds to a second item;
  - a first marker corresponding to the first item; and
  - a second marker corresponding to the second item;
- receiving, from the client device:
  - audio data comprising a user utterance; and
  - marker data comprising the first marker or the second marker; and
- selecting an item based at least on the marker data or the audio data, wherein the selected item comprises the first item or the second item.
- Dependent claims (17, 18, 19, 20, 21, 22, 23, 24):
  - 17. The non-transitory computer readable medium of claim 16, wherein the marker data and the audio data are received in a single data stream.
  - 18. The non-transitory computer readable medium of claim 16, wherein the marker data and the audio data are received in separate data transmissions.
  - 19. The non-transitory computer readable medium of claim 16, the process further comprising:
    - generating a hint using the first item and the second item; and
    - selecting the applicable item using the hint.
  - 20. The non-transitory computer readable medium of claim 16, the process further comprising:
    - generating a hint using the marker data; and
    - selecting the applicable item using the hint.
  - 21. The non-transitory computer readable medium of claim 16, the process further comprising:
    - receiving, from the client device, information about a second audio presentation being presented on the client device; and
    - selecting the applicable item using the information.
  - 22. The non-transitory computer readable medium of claim 16, wherein the audio presentation is transmitted in a first data stream, and the first marker and second marker are transmitted in one of the first data stream or a second data stream.
  - 23. The non-transitory computer readable medium of claim 16, wherein the marker data indicates which portion of the audio presentation was being presented on the client device when the utterance was initiated.
  - 24. The non-transitory computer readable medium of claim 16, the process further comprising:
    - performing an action based at least partly on the selected item.

25. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
- receiving, from a speech processing system:
  - an audio presentation comprising a first portion corresponding to a first item and a second portion corresponding to a second item;
  - a first marker corresponding to the first item; and
  - a second marker corresponding to the second item;
- presenting the audio presentation; and
- transmitting, to the speech processing system:
  - audio data received via an audio input component of the computing device; and
  - marker data comprising at least one of the first marker or the second marker.
- Dependent claims (26, 27, 28, 29, 30, 31, 32):
  - 26. The non-transitory computer readable medium of claim 25, wherein the audio presentation is received in a first data stream, and the first marker and second marker are received in one of the first data stream or a second data stream.
  - 27. The non-transitory computer readable medium of claim 25, wherein the marker data and the audio data are transmitted in a single data stream.
  - 28. The non-transitory computer readable medium of claim 25, wherein the marker data and the audio data are transmitted in separate data transmissions.
  - 29. The non-transitory computer readable medium of claim 25, the process further comprising:
    - initiating, substantially concurrently with presenting the audio presentation, a data stream to the speech processing system, the data stream comprising the audio data.
  - 30. The non-transitory computer readable medium of claim 25, wherein the marker data comprises the first marker, and wherein the marker data is transmitted substantially concurrently with presentation of the first item.
  - 31. The non-transitory computer readable medium of claim 25, the process further comprising:
    - presenting a second audio presentation substantially concurrently with presentation of the audio presentation; and
    - transmitting, to the speech processing system, a first presentation identifier corresponding to the audio presentation and a second presentation identifier corresponding to a second audio presentation.
  - 32. The non-transitory computer readable medium of claim 25, wherein the first marker comprises a first identifier and the second marker comprises a second identifier.
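
The client-side process of claim 25 can be sketched as a small playback tracker. The sketch assumes the client receives each portion paired with its marker, remembers which portion it presented most recently, and attaches that marker to the captured audio when it uploads an utterance. `Portion`, `PresentationClient`, and the dictionary layout of the upload are hypothetical names chosen for the example; returning empty marker data before playback starts is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Portion:
    marker: str   # marker received from the speech processing system
    audio: bytes  # audio for the corresponding item


class PresentationClient:
    """Tracks which portion of the audio presentation is being presented."""

    def __init__(self, portions: List[Portion]):
        self._portions = portions
        self._position = -1  # nothing has been presented yet

    def play_next(self) -> Optional[bytes]:
        """Advance to the next portion and return its audio, or None at the end."""
        if self._position + 1 >= len(self._portions):
            return None
        self._position += 1
        return self._portions[self._position].audio

    def build_upload(self, utterance_audio: bytes) -> dict:
        """Bundle captured audio with the marker of the current portion."""
        if self._position < 0:
            marker_data = []  # assumption: no marker before playback starts
        else:
            marker_data = [self._portions[self._position].marker]
        return {"audio_data": utterance_audio, "marker_data": marker_data}


client = PresentationClient([Portion("m1", b"first"), Portion("m2", b"second")])
client.play_next()
client.play_next()
print(client.build_upload(b"buy that"))
```

Sending both fields in one dictionary corresponds to the single-transmission variant of claim 27; the separate-transmission variant of claim 28 would send them independently.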

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Torok, Fred; Deramat, Frédéric Johan Georges; Gundeti, Vikram Kumar
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/723,026
Publication Number: US 20140180697A1
Time in Patent Office: 810 Days
Field of Search: 704/260, 704/261, 704/270, 704/270.1, 704/275, 704/233, 704/235, 704/246, 704/255, 704/257
US Class Current: 704/275
CPC Class Codes:
- G06F 16/3344   using natural language anal...
- G06F 16/60   of audio data
- G06F 16/61   Indexing; Data structures t...
- G06F 16/685   using automatically derived...
- G10L 15/08   Speech classification or se...
- G10L 15/222   Barge in, i.e. overridable ...
- G10L 15/26   Speech to text systems G10L...
- G10L 15/30   Distributed recognition, e....
