Identification of utterance subjects
First Claim
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
receive, from a client device, audio data corresponding to an utterance;
receive, from the client device, marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented by the client device contemporaneously with capture of the utterance by the client device;
generate a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
determine an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
perform the action.
Abstract
Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
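The following minimal Python sketch illustrates how marker data might help resolve a pronoun that has no explicit antecedent. It is an illustration under stated assumptions, not the disclosed implementation: the `Marker` class, the `PRONOUNS` and `ARTICLES` sets, the `determine_action` function, and the rule that the first word of the transcription is the intent are all hypothetical, and real ASR and NLU processing is replaced by simple word matching.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker identifying one portion of an audio presentation."""
    marker_id: str
    content: str  # the item that was being presented, e.g. a song title


# Illustrative word lists only; a real NLU module would not work this way.
PRONOUNS = {"it", "that", "this", "one"}
ARTICLES = {"the", "a", "an"}


def determine_action(
    transcription: str, marker_id: str, markers: Dict[str, Marker]
) -> Dict[str, Optional[str]]:
    """Choose an action from the transcription and the marker data."""
    words = transcription.lower().split()
    if not words:
        return {"intent": "unknown", "subject": None}

    intent = words[0]  # assumption: the first word names the intent
    explicit = [w for w in words[1:] if w not in PRONOUNS and w not in ARTICLES]

    if explicit:
        # The utterance names its own subject, so the marker is not needed.
        subject: Optional[str] = " ".join(explicit)
    else:
        # Only pronouns remain: fall back to the marked content portion.
        marker = markers.get(marker_id)
        subject = marker.content if marker else None

    return {"intent": intent, "subject": subject}


markers = {"m2": Marker("m2", "Hey Jude")}
resolved = determine_action("play that", "m2", markers)
```

Here `resolved` pairs the intent `play` with the content of marker `m2`, because the utterance itself supplies only the pronoun "that".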
20 Claims
1. (Independent; full text appears under First Claim above.)
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
receiving, from a client device, audio data corresponding to an utterance;
receiving, from the client device, marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented by the client device contemporaneously with capture of the utterance by the client device;
generating a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
determining an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
performing the action.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. Non-transitory computer-readable storage storing executable code that, when executed by one or more processors, causes the one or more processors to perform a process comprising:
obtaining audio data corresponding to an utterance;
obtaining marker data corresponding to a first portion of a plurality of portions of an audio presentation, the audio presentation presented contemporaneously with capture of the utterance;
generating a transcription of the utterance by performing automatic speech recognition on at least a portion of the audio data;
determining an action to be taken responsive to the utterance based at least partly on the transcription and the marker data; and
performing the action.
Dependent claims: 19, 20
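The claims above recite marker data for the portion of the audio presentation that is presented contemporaneously with capture of the utterance. One way a client could select that marker is sketched below in Python. This is a hypothetical illustration only: the timeline of `(start_offset_seconds, marker_id)` pairs, the `marker_at` and `build_request` functions, and the request field names are assumptions and are not taken from the claims.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Assumed layout: (start_offset_seconds, marker_id) per portion,
# sorted by start offset.
Timeline = List[Tuple[float, str]]


def marker_at(timeline: Timeline, playback_offset: float) -> Optional[str]:
    """Return the marker of the portion playing at playback_offset."""
    starts = [start for start, _ in timeline]
    # Index of the last portion that starts at or before the offset.
    index = bisect_right(starts, playback_offset) - 1
    return timeline[index][1] if index >= 0 else None


def build_request(
    audio: bytes, timeline: Timeline, playback_offset: float
) -> Dict[str, object]:
    """Bundle the captured audio with the marker for the current portion."""
    return {"audio": audio, "marker": marker_at(timeline, playback_offset)}


timeline = [(0.0, "m1"), (4.5, "m2"), (9.0, "m3")]
request = build_request(b"\x00\x01", timeline, playback_offset=5.2)
```

With this timeline, an utterance captured 5.2 seconds into playback falls inside the second portion, so the request carries marker `m2` alongside the audio data.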
Specification