Two-pass decoding for speech recognition of search and action requests
First Claim
1. A method, comprising:
receiving input speech at a computing system;
during a first pass of speech recognition, determining a plurality of outputs from a plurality of language models by:
providing the input speech as an input to each of a plurality of language models, wherein the plurality of language models comprises a query language model and an action language model, and receiving an output from each language model;
determining a selected language model of the plurality of language models using a classifier operating on the plurality of outputs from the plurality of language models, wherein the classifier is configured to utilize a support vector machine (SVM) to select the selected language model, and wherein the SVM is configured to determine a plane or hyperplane related to the plurality of language-model outputs and to select the selected language model based on the plane or hyperplane;
during a second pass of speech recognition, determining a revised output by:
providing the input speech and the output from the selected language model as inputs to the selected language model, and receiving the revised output from the selected language model; and
generating a result based on the revised output using the computing system.
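The selection step in the claim, in which an SVM picks a language model based on a plane or hyperplane, can be illustrated with a linear decision rule. The following is only a minimal sketch under stated assumptions, not the patented implementation: the two-element feature vector (one confidence score per language model), the hand-set weights and bias, and the names `Hyperplane`, `select_language_model`, and `EXAMPLE_PLANE` are all hypothetical. In practice the hyperplane would come from training an SVM on labeled examples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hyperplane:
    """Hyperplane w.x + b = 0, assumed to come from an already-trained linear SVM."""
    weights: Sequence[float]
    bias: float

    def signed_score(self, features: Sequence[float]) -> float:
        # The sign of w.x + b tells which side of the hyperplane the point lies on.
        if len(features) != len(self.weights):
            raise ValueError("feature vector and weight vector differ in length")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def select_language_model(plane: Hyperplane, features: Sequence[float]) -> str:
    """Pick 'query' on the non-negative side of the hyperplane, 'action' otherwise."""
    return "query" if plane.signed_score(features) >= 0.0 else "action"


# Hypothetical hyperplane over (query_lm_score, action_lm_score); values are illustrative only.
EXAMPLE_PLANE = Hyperplane(weights=(1.0, -1.0), bias=0.0)

if __name__ == "__main__":
    print(select_language_model(EXAMPLE_PLANE, (0.9, 0.2)))
    print(select_language_model(EXAMPLE_PLANE, (0.1, 0.7)))
```

With these illustrative weights the rule reduces to comparing the two scores. A trained SVM would generally learn different weights and could use more features per language-model output.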
Abstract
Disclosed are apparatus and methods for processing spoken speech. Input speech can be received at a computing system. During a first pass of speech recognition, a plurality of language model outputs can be determined by: providing the input speech to each of a plurality of language models and responsively receiving a language model output from each language model. A language model of the plurality of language models can be selected using a classifier operating on the plurality of language model outputs. During a second pass of speech recognition, a revised language model output can be determined by: providing the input speech and the language model output from the selected language model to the selected language model and responsively receiving the revised language model output from the selected language model. The computing system can generate a result based on the revised language model output.
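The two-pass flow described in the abstract can be sketched as follows. This is a toy illustration under loud assumptions, not the disclosed system: input speech is represented by a plain string, the language models are keyword counters built by the hypothetical helper `make_toy_lm`, and the classifier `pick_highest_score` simply takes the highest first-pass score in place of the SVM classifier recited in the claims. All names here are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class LmOutput:
    text: str
    score: float


# A language model takes the input speech plus an optional earlier output.
LanguageModel = Callable[[str, Optional[LmOutput]], LmOutput]


def make_toy_lm(keywords: Set[str], label: str) -> LanguageModel:
    """Build a stand-in language model that scores by keyword overlap (hypothetical)."""
    def lm(speech: str, prior: Optional[LmOutput] = None) -> LmOutput:
        words = speech.lower().split()
        hits = sum(word in keywords for word in words)
        score = hits / len(words) if words else 0.0
        if prior is None:
            # First pass: produce an initial output from the speech alone.
            return LmOutput(text=" ".join(words), score=score)
        # Second pass: revise using both the speech and the first-pass output.
        return LmOutput(text=f"[{label}] {prior.text}", score=score)
    return lm


def pick_highest_score(outputs: Dict[str, LmOutput]) -> str:
    """Stand-in classifier: choose the model whose first-pass score is highest."""
    return max(outputs, key=lambda name: outputs[name].score)


def two_pass_decode(
    speech: str,
    models: Dict[str, LanguageModel],
    classify: Callable[[Dict[str, LmOutput]], str],
) -> Tuple[str, LmOutput]:
    # First pass: every language model receives the input speech.
    first_pass = {name: lm(speech, None) for name, lm in models.items()}
    # The classifier operates on all first-pass outputs to select one model.
    selected = classify(first_pass)
    # Second pass: only the selected model runs, given speech and its own output.
    revised = models[selected](speech, first_pass[selected])
    return selected, revised


MODELS: Dict[str, LanguageModel] = {
    "query": make_toy_lm({"what", "who", "weather"}, "query"),
    "action": make_toy_lm({"call", "play", "text"}, "action"),
}

if __name__ == "__main__":
    print(two_pass_decode("Call mom", MODELS, pick_highest_score))
```

The point of the structure is that every model runs once in the first pass, but only the selected model runs in the second pass, with its own first-pass output fed back in alongside the speech.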
20 Claims
1. A method, comprising:
receiving input speech at a computing system;
during a first pass of speech recognition, determining a plurality of outputs from a plurality of language models by:
providing the input speech as an input to each of a plurality of language models, wherein the plurality of language models comprises a query language model and an action language model, and receiving an output from each language model;
determining a selected language model of the plurality of language models using a classifier operating on the plurality of outputs from the plurality of language models, wherein the classifier is configured to utilize a support vector machine (SVM) to select the selected language model, and wherein the SVM is configured to determine a plane or hyperplane related to the plurality of language-model outputs and to select the selected language model based on the plane or hyperplane;
during a second pass of speech recognition, determining a revised output by:
providing the input speech and the output from the selected language model as inputs to the selected language model, and receiving the revised output from the selected language model; and
generating a result based on the revised output using the computing system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computing system, comprising:
a processor; and
a non-transitory computer-readable storage medium having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform operations comprising:
receiving input speech,
during a first pass of speech recognition, determining a plurality of outputs from a plurality of language models by:
providing the input speech as an input to each of a plurality of language models, wherein the plurality of language models comprises a query language model and an action language model, and receiving an output from each language model,
determining a selected language model of the plurality of language models based on a classifier operating on the plurality of outputs from the plurality of language models, wherein the classifier is configured to utilize a support vector machine (SVM) to select the selected language model, and wherein the SVM is configured to determine a plane or hyperplane related to the plurality of language-model outputs and to select the selected language model based on the plane or hyperplane,
during a second pass of speech recognition, determining a revised output by:
providing the input speech and the output from the selected language model as inputs to the selected language model, receiving a revised output from the selected language model, and
generating a result based on the revised output.
Dependent claims: 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations comprising:
receiving input speech;
during a first pass of speech recognition, determining a plurality of outputs from a plurality of language models by:
providing the input speech as an input to each of a plurality of language models, wherein the plurality of language models comprises a query language model and an action language model, and receiving an output from each language model;
determining a selected language model of the plurality of language models based on a classifier operating on the plurality of outputs from the plurality of language models, wherein the classifier is configured to utilize a support vector machine (SVM) to select the selected language model, and wherein the SVM is configured to determine a plane or hyperplane related to the plurality of language-model outputs and to select the selected language model based on the plane or hyperplane;
during a second pass of speech recognition, determining a revised output by:
providing the input speech and the output from the selected language model as inputs to the selected language model, and receiving a revised output from the selected language model; and
generating a result based on the revised output.
Dependent claims: 16, 17, 18, 19, 20
Specification