Determining Dialog States for Language Models
Abstract
Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input. A set of n-grams associated with the particular dialog state can then be identified. In response to identifying that set of n-grams, a language model can be biased by adjusting the probability scores that the language model indicates for n-grams in the set. The voice input can then be transcribed using the adjusted language model.
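The biasing step described in the abstract can be illustrated with a minimal Python sketch. It assumes a toy unigram model stored as a dictionary, a multiplicative boost followed by renormalization, and transcription by rescoring a small list of candidate hypotheses. The names `bias_language_model`, `lm_log_prob`, and `transcribe`, the boost value, and all probabilities and scores are hypothetical illustrations, not details taken from the patent.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy unigram model (word -> probability); a real recognizer
# would use a much larger n-gram model.
BASE_LM: Dict[str, float] = {
    "send": 0.20,
    "the": 0.30,
    "male": 0.25,
    "mail": 0.05,
    "message": 0.20,
}


def bias_language_model(
    lm: Dict[str, float], biasing_ngrams: Set[str], boost: float
) -> Dict[str, float]:
    """Multiply the scores of biasing n-grams by `boost`, then renormalize."""
    adjusted = {w: p * boost if w in biasing_ngrams else p for w, p in lm.items()}
    total = sum(adjusted.values())
    return {w: p / total for w, p in adjusted.items()}


def lm_log_prob(text: str, lm: Dict[str, float], floor: float = 1e-6) -> float:
    """Sum of unigram log-probabilities; unseen words get a small floor value."""
    return sum(math.log(lm.get(w, floor)) for w in text.split())


def transcribe(candidates: List[Tuple[str, float]], lm: Dict[str, float]) -> str:
    """Pick the candidate with the best acoustic + language-model log score."""
    return max(candidates, key=lambda c: c[1] + lm_log_prob(c[0], lm))[0]


if __name__ == "__main__":
    # Two acoustically identical hypotheses (hypothetical acoustic scores).
    candidates = [("send the male", -1.0), ("send the mail", -1.0)]
    print(transcribe(candidates, BASE_LM))  # unbiased model prefers "send the male"
    biased = bias_language_model(BASE_LM, {"mail", "message"}, boost=10.0)
    print(transcribe(candidates, biased))  # biased model prefers "send the mail"
```

Renormalizing after the boost keeps the adjusted scores a valid distribution; only the relative weight of the dialog-state n-grams changes.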
23 Claims
1. A computer-implemented method, comprising:
receiving, at a computing system, audio data that indicates a voice input that was provided to a computing device;
receiving, at the computing system, display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
determining, by the computing system and based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, biasing a language model by adjusting probability scores that the language model indicates for n-grams in the set of n-grams; and
transcribing the voice input using the adjusted language model.
Dependent claims: 2, 3, 4, 5, 6, 7, 9, 21, 22.
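Claim 1 ties the dialog state to display data and ties each state's n-gram set to how often n-grams occur in historical voice inputs for that state. The sketch below illustrates both steps in Python, assuming the display data carries a simple screen identifier and that "occurring frequently" means meeting a fixed count threshold. `SCREEN_TO_STATE`, `determine_dialog_state`, `build_state_ngram_sets`, `min_count`, and the sample history are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

NGram = Tuple[str, ...]

# Hypothetical mapping from a screen identifier carried in the display data
# to a dialog state; real display data could be much richer.
SCREEN_TO_STATE: Dict[str, str] = {
    "alarm_settings": "set_alarm",
    "compose_email": "email_body",
}


def determine_dialog_state(display_data: Dict[str, str], default: str = "general") -> str:
    """Map the displayed interface to one of a fixed set of dialog states."""
    return SCREEN_TO_STATE.get(display_data.get("screen_id", ""), default)


def extract_ngrams(text: str, n: int) -> List[NGram]:
    """Return all word n-grams of length n in a lowercased transcript."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_state_ngram_sets(
    history: Iterable[Tuple[str, str]], max_n: int = 2, min_count: int = 2
) -> Dict[str, Set[NGram]]:
    """For each dialog state, keep n-grams seen at least `min_count` times
    in the historical transcriptions labeled with that state."""
    counts: Dict[str, Counter] = {}
    for state, transcript in history:
        counter = counts.setdefault(state, Counter())
        for n in range(1, max_n + 1):
            counter.update(extract_ngrams(transcript, n))
    return {
        state: {gram for gram, k in counter.items() if k >= min_count}
        for state, counter in counts.items()
    }


if __name__ == "__main__":
    history = [
        ("set_alarm", "wake me up at seven"),
        ("set_alarm", "set an alarm for seven"),
        ("set_alarm", "alarm at seven thirty"),
        ("email_body", "send the report"),
    ]
    sets = build_state_ngram_sets(history)
    state = determine_dialog_state({"screen_id": "alarm_settings"})
    print(state, sorted(sets.get(state, set())))
```

The resulting per-state set is what a biasing step would boost once the dialog state for a new voice input has been determined.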
8. (canceled)
10-17. (canceled)
18. A computing system comprising:
one or more processors; and
one or more computer-readable media having instructions stored thereon that, when executed, cause performance of operations comprising:
receiving audio data that indicates a voice input that was provided to a computing device;
receiving display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
determining, based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, adjusting a language model by increasing probability scores indicated by the language model of n-grams in the set of n-grams; and
transcribing the voice input using the adjusted language model.
Dependent claims: 19, 20.
23. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform operations comprising:
receiving audio data that indicates a voice input that was provided to a computing device;
receiving display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
determining, based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, adjusting a language model by increasing probability scores indicated by the language model of n-grams in the set of n-grams; and
transcribing the voice input using the adjusted language model.
Specification