Determining dialog states for language models
First Claim
1. A computer-implemented method, comprising:
receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
2 Assignments
0 Petitions
Abstract
Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
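The abstract describes biasing a language model by raising the probability scores of n-grams associated with the dialog state determined for the voice input. A minimal sketch of that biasing step is below; the toy model, the state-to-n-gram mapping, and the boost factor are all invented for illustration and are not taken from the patent.

```python
import math

# Hypothetical toy language model: n-grams mapped to log-probabilities.
base_log_probs = {
    ("cheese",): math.log(0.01),
    ("pepperoni",): math.log(0.005),
    ("large",): math.log(0.02),
    ("please",): math.log(0.03),
}

# Invented association between dialog states and their characteristic n-grams.
state_ngrams = {
    "pizza_toppings": {("cheese",), ("pepperoni",)},
    "pizza_size": {("large",)},
}

def bias_language_model(log_probs, dialog_state, boost=math.log(5.0)):
    """Return a copy of the model with scores raised for n-grams
    associated with the given dialog state; other scores are unchanged."""
    ngrams = state_ngrams.get(dialog_state, set())
    return {ng: lp + boost if ng in ngrams else lp for ng, lp in log_probs.items()}

biased = bias_language_model(base_log_probs, "pizza_toppings")
# n-grams tied to the detected state now outscore their base-model values
assert biased[("cheese",)] > base_log_probs[("cheese",)]
assert biased[("please",)] == base_log_probs[("please",)]
```

A decoder using the biased scores would then favor transcriptions containing the state's n-grams, which is the effect the abstract attributes to the adjusted language model.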
189 Citations
18 Claims
1. A computer-implemented method, comprising:
receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
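The steps of claim 1 amount to an exchange: the device sends audio plus an initial guess at the dialog stage, and the dialog system refines that guess using context that is independent of what was said before decoding. A toy sketch of that flow follows; the function names, the refinement rule, and the canned transcripts are all hypothetical, and the decode step is faked rather than performed on real audio.

```python
def refine_prediction(initial_prediction, context):
    """Combine the device's initial stage guess with content-independent
    context (here, which app screen is open) to pick a refined stage."""
    if context.get("screen") == "payment":
        return "collect_payment"
    return initial_prediction

def voice_dialog_system(audio_data, initial_prediction, context):
    """Stand-in for the claimed voice dialog system."""
    stage = refine_prediction(initial_prediction, context)
    # A real system would bias its speech model toward n-grams for `stage`
    # before decoding the audio; this sketch substitutes canned outputs.
    fake_transcripts = {
        "collect_payment": "pay with my saved card",
        "choose_item": "a large cheese pizza",
    }
    return fake_transcripts.get(stage, "")

transcription = voice_dialog_system(
    audio_data=b"...",                 # raw audio bytes (elided)
    initial_prediction="choose_item",  # device's initial stage guess
    context={"screen": "payment"},     # independent of the utterance content
)
assert transcription == "pay with my saved card"
```

Note how the context overrides the device's initial prediction: the refined stage, not the initial one, selects the biasing parameters, which mirrors the claim's two-step prediction.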
10. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
receiving, by the computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 11, 12, 13, 14, 15
16. A computing device comprising:
one or more processors; and
one or more computer-readable media having instructions stored thereon that, when executed by the one or more processors, cause performance of operations comprising:
receiving, by the computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 17, 18
Specification