Identification of utterance subjects
First Claim
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
generate text to be presented to a user, wherein the text comprises a sequence of items;
generate an audio presentation using the text;
associate a plurality of identifiers with the sequence of items, wherein each item of the sequence of items is associated with at least one identifier of the plurality of identifiers;
transmit, to a client device, the audio presentation and the plurality of identifiers;
receive, from the client device:
audio data comprising a user utterance; and
a first identifier of the plurality of identifiers;
perform speech recognition on the user utterance to obtain speech recognition results;
identify a first item, of the sequence of items, based at least partly on the first identifier and the speech recognition results; and
perform an action based at least partly on the first item.
Abstract
Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
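The marker flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `MarkedItem`, `attach_markers`, and `resolve_item` are hypothetical, the marker format (`m0`, `m1`, ...) is an assumption, and plain string matching stands in for the ASR and NLU modules, which in practice would consume the marker as a hint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedItem:
    marker: str  # hypothetical identifier sent to the client with the TTS audio
    text: str    # the item as it appears in the generated text


def attach_markers(items):
    """Associate one marker with each item of the sequence."""
    return [MarkedItem(marker=f"m{i}", text=t) for i, t in enumerate(items)]


# Assumption: a tiny pronoun list stands in for real NLU reference detection.
PRONOUNS = {"it", "that", "this", "one"}


def resolve_item(marked_items, marker, recognized_text):
    """Pick the item an utterance refers to from recognized text plus a marker."""
    lowered = recognized_text.lower()
    # An explicit mention of an item in the recognized text takes priority.
    for item in marked_items:
        if item.text.lower() in lowered:
            return item
    # A pronoun with no explicit antecedent is resolved through the marker.
    if PRONOUNS & set(lowered.split()):
        for item in marked_items:
            if item.marker == marker:
                return item
    return None


items = attach_markers(["Song A", "Song B", "Song C"])
chosen = resolve_item(items, "m1", "play that one")
print(chosen.text if chosen else None)
```

In this sketch the marker only matters when the utterance itself does not name an item, which mirrors the pronoun example in the abstract.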
32 Claims
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
generate text to be presented to a user, wherein the text comprises a sequence of items;
generate an audio presentation using the text;
associate a plurality of identifiers with the sequence of items, wherein each item of the sequence of items is associated with at least one identifier of the plurality of identifiers;
transmit, to a client device, the audio presentation and the plurality of identifiers;
receive, from the client device:
audio data comprising a user utterance; and
a first identifier of the plurality of identifiers;
perform speech recognition on the user utterance to obtain speech recognition results;
identify a first item, of the sequence of items, based at least partly on the first identifier and the speech recognition results; and
perform an action based at least partly on the first item.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
transmitting, to a client device:
an audio presentation comprising a first portion and a second portion, wherein the first portion corresponds to a first item and the second portion corresponds to a second item;
a first marker corresponding to the first item; and
a second marker corresponding to the second item;
receiving, from the client device:
audio data comprising a user utterance; and
marker data comprising the first marker or the second marker; and
selecting an item based at least on the marker data or the audio data, wherein the selected item comprises the first item or the second item.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
transmitting, to a client device:
an audio presentation comprising a first portion and a second portion, wherein the first portion corresponds to a first item and the second portion corresponds to a second item;
a first marker corresponding to the first item; and
a second marker corresponding to the second item;
receiving, from the client device:
audio data comprising a user utterance; and
marker data comprising the first marker or the second marker; and
selecting an item based at least on the marker data or the audio data, wherein the selected item comprises the first item or the second item.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24
25. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
receiving, from a speech processing system:
an audio presentation comprising a first portion corresponding to a first item and a second portion corresponding to a second item;
a first marker corresponding to the first item; and
a second marker corresponding to the second item;
presenting the audio presentation; and
transmitting, to the speech processing system:
audio data received via an audio input component of the computing device; and
marker data comprising at least one of the first marker or the second marker.
Dependent claims: 26, 27, 28, 29, 30, 31, 32
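The client-side process of claim 25 can be sketched as follows. The claim does not state how the client decides which marker to send, so this sketch assumes each marker arrives with the playback offset at which its portion begins; the `Marker` type, the `start_ms` field, and the request layout are all hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    marker_id: str
    start_ms: int  # assumed: playback offset where the item's portion begins


def marker_at(markers, position_ms):
    """Return the marker whose portion is playing at position_ms.

    Expects markers sorted by start_ms; returns None before the first portion.
    """
    starts = [m.start_ms for m in markers]
    idx = bisect.bisect_right(starts, position_ms) - 1
    return markers[idx] if idx >= 0 else None


def build_request(markers, position_ms, audio_bytes):
    """Bundle captured audio with marker data for the speech processing system."""
    current = marker_at(markers, position_ms)
    return {
        "audio": audio_bytes,
        "marker": current.marker_id if current else None,
    }


markers = [Marker("m0", 0), Marker("m1", 4000), Marker("m2", 9000)]
# The user speaks 4.5 seconds into playback, while the second item is playing.
print(build_request(markers, 4500, b"<captured audio>"))
```

Sending the marker that was active when the utterance began is one possible policy; a client could equally send the most recently completed marker, or several markers, and leave the choice to the speech processing system.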
Specification