Determining dialog states for language models
First Claim
1. A computer-implemented method, comprising:
receiving, at a computing system, audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving, at the computing system, first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting, by the computing system, a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
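The selecting step above matches what was on the device's screen against the display data mapped to each pre-defined dialog state. A minimal sketch of that idea, assuming a simple set-overlap match; all class and function names here are hypothetical and not from the patent:

```python
# Hypothetical sketch: selecting a pre-defined dialog state by matching
# the display data captured from the device screen against the display
# data mapped to each state. Overlap counting is an assumption; the
# patent does not specify how the match is computed.
from dataclasses import dataclass

@dataclass(frozen=True)
class DialogState:
    name: str
    display_data: frozenset  # content designated for display in this state
    ngrams: frozenset        # n-grams mapped to this state

def select_dialog_state(states, first_display_data):
    """Return the state whose mapped display data best matches the content
    that was displayed when the voice input was provided, or None if no
    state's display data overlaps at all."""
    def overlap(state):
        return len(state.display_data & first_display_data)
    best = max(states, key=overlap)
    return best if overlap(best) > 0 else None

# Example: a pizza-ordering voice dialog with two states.
states = [
    DialogState("pizza_size", frozenset({"Small", "Medium", "Large"}),
                frozenset({"small", "medium", "large"})),
    DialogState("toppings", frozenset({"Pepperoni", "Mushrooms"}),
                frozenset({"pepperoni", "mushrooms", "extra cheese"})),
]
state = select_dialog_state(states, frozenset({"Pepperoni", "Mushrooms"}))
```

Once a state is selected, its mapped n-gram set is what drives the biasing step recited in the claim.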
Abstract
Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
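The biasing described in the abstract adjusts the probability scores the language model indicates for the n-grams associated with the determined dialog state. A minimal sketch of one way such an adjustment could work, assuming a unigram score table and a multiplicative boost with renormalization (the function name and boost scheme are assumptions, not the patent's method):

```python
# Hypothetical sketch: biasing a language model toward the n-grams
# mapped to the determined dialog state. A multiplicative boost with
# renormalization is one simple realization of "adjusting probability
# scores"; the patent does not prescribe a specific adjustment.
def bias_language_model(base_scores, state_ngrams, boost=2.0):
    """Multiply the scores of state-mapped n-grams by `boost`, then
    renormalize so the scores still sum to 1."""
    adjusted = {ng: p * (boost if ng in state_ngrams else 1.0)
                for ng, p in base_scores.items()}
    total = sum(adjusted.values())
    return {ng: p / total for ng, p in adjusted.items()}

# Example: acoustically similar hypotheses; the dialog state maps the
# n-grams {"large", "large pizza"}, so those scores are boosted.
base = {"large": 0.2, "lodge": 0.2, "barge": 0.2, "large pizza": 0.4}
biased = bias_language_model(base, {"large", "large pizza"})
```

After biasing, "large" and "large pizza" outscore the acoustically similar alternatives, which is the effect the abstract describes: the transcription step then favors words the current dialog state makes likely.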
12 Claims
1. A computer-implemented method, comprising:
receiving, at a computing system, audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving, at the computing system, first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting, by the computing system, a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
10. A computing system comprising:
one or more processors; and
one or more computer-readable media having instructions stored thereon that, when executed, cause performance of operations comprising:
receiving audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
(Dependent claim: 11)
12. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform operations comprising:
receiving audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
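Each independent claim ends with transcribing the voice input using the biased language model. A hypothetical sketch of that final step, assuming the recognizer rescores candidate transcriptions by combining acoustic scores with the biased language-model scores (the weighting scheme and all names are assumptions, not the patent's method):

```python
# Hypothetical sketch: transcribing with the biased language model by
# rescoring candidate hypotheses. Log-linear combination of acoustic
# and LM scores is a common recognizer pattern assumed here for
# illustration; the patent does not specify a decoding algorithm.
import math

def transcribe(candidates, biased_lm_scores, lm_weight=0.5):
    """Combine each candidate's acoustic log-probability with the biased
    LM's log-probability and return the highest-scoring hypothesis."""
    def score(hyp):
        acoustic = candidates[hyp]                       # log P(audio | hyp)
        lm = math.log(biased_lm_scores.get(hyp, 1e-9))   # log P(hyp), biased
        return acoustic + lm_weight * lm
    return max(candidates, key=score)

# Acoustically, "lodge" narrowly beats "large"; the biased LM, which
# boosted the dialog state's n-grams, flips the decision to "large".
candidates = {"large": math.log(0.48), "lodge": math.log(0.52)}
biased_lm = {"large": 0.25, "lodge": 0.125}
best = transcribe(candidates, biased_lm)
```

This illustrates why the claims bias the model before transcription: the adjusted n-gram scores let dialog-state context override small acoustic differences.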
Specification