Disambiguation in speech recognition
First Claim
1. A method for performing automatic speech recognition (ASR) processing on an utterance using trained classifiers, the method comprising:
determining a first classifier, wherein the first classifier is trained to determine, using confidence scores corresponding to a plurality of ASR hypotheses, whether to select a single ASR hypothesis or whether to perform further selection from among the plurality of ASR hypotheses;
determining a second classifier, wherein the second classifier is trained to determine, using the confidence scores, which of the plurality of ASR hypotheses to select for disambiguation;
receiving, from a mobile device, audio data corresponding to an utterance;
performing ASR processing on the audio data to generate a first ASR hypothesis corresponding to a first confidence score, a second ASR hypothesis corresponding to a second confidence score and a third ASR hypothesis corresponding to a third confidence score;
processing, using a processor, the first confidence score, the second confidence score and the third confidence score using the first classifier to output an indication indicating the first ASR hypothesis, the second ASR hypothesis and the third ASR hypothesis for further processing;
processing, using a processor, the first confidence score, the second confidence score and the third confidence score using the second classifier to output an indication that the first ASR hypothesis and second ASR hypothesis are selected for disambiguation;
sending the first ASR hypothesis and the second ASR hypothesis to the mobile device for disambiguation; and
receiving a selection from the mobile device, the selection indicating the first ASR hypothesis.
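The steps of the first claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two "classifiers" are hand-written margin rules standing in for trained models, and the names `Hypothesis`, `MARGIN`, `first_classifier`, `second_classifier`, `resolve` and `fake_device`, as well as the sample utterances and scores, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

MARGIN = 0.2  # hypothetical cutoff; a trained classifier would learn its own boundary


@dataclass(frozen=True)
class Hypothesis:
    text: str
    confidence: float  # ASR confidence score, assumed to lie in [0, 1]


def first_classifier(scores: Sequence[float]) -> bool:
    """Stand-in for the first trained classifier (needs at least two scores).

    Returns True when further selection among the hypotheses is needed,
    False when a single hypothesis can be selected directly.
    """
    ranked = sorted(scores, reverse=True)
    return ranked[0] - ranked[1] < MARGIN


def second_classifier(scores: Sequence[float]) -> List[int]:
    """Stand-in for the second trained classifier.

    Returns the indices of the hypotheses to offer for disambiguation.
    """
    best = max(scores)
    return [i for i, s in enumerate(scores) if best - s < MARGIN]


def resolve(hyps: List[Hypothesis],
            ask_device: Callable[[List[str]], int]) -> Hypothesis:
    """Select one hypothesis, asking the device only when stage one says so."""
    scores = [h.confidence for h in hyps]
    if not first_classifier(scores):
        return max(hyps, key=lambda h: h.confidence)
    choices = [hyps[i] for i in second_classifier(scores)]
    picked = ask_device([c.text for c in choices])  # send choices, receive selection
    return choices[picked]


# Three hypotheses with their confidence scores, as in the claim.
hyps = [
    Hypothesis("play the police", 0.48),
    Hypothesis("play the polish", 0.41),
    Hypothesis("pay the police", 0.11),
]

sent = []  # records what was sent to the (simulated) mobile device


def fake_device(texts: List[str]) -> int:
    sent.append(texts)
    return 0  # the user picks the first choice shown


result = resolve(hyps, fake_device)
print(sent[0])      # the two hypotheses offered for disambiguation
print(result.text)  # the hypothesis the user selected
```

With these sample scores the first two hypotheses are close together, so the first rule asks for further selection, the second rule keeps only those two, and the simulated device returns the first one.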
Abstract
Automatic speech recognition (ASR) processing uses a two-stage configuration. After ASR processing of an incoming utterance, where the ASR outputs an N-best list of multiple hypotheses, a first stage determines whether to execute a command associated with one of the hypotheses or to output some of the hypotheses of the N-best list for disambiguation. A second stage determines which hypotheses to include in the disambiguation choices. A first machine learning model is used at the first stage and a second machine learning model at the second stage. The multi-stage configuration allows for fewer speech processing errors and fewer utterances sent for disambiguation, which improves the user experience.
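The two outcomes described in the abstract (execute a command, or present choices for disambiguation) can be illustrated with pluggable stage functions. This is a sketch under assumptions: `Decision`, `two_stage`, `needs_disambiguation` and `pick_choices` are hypothetical names, the thresholds 0.6 and 0.25 are arbitrary, and the simple rules stand in for the two machine learning models, whose actual form the abstract does not specify.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

# An N-best list: (hypothesis text, confidence score), best hypothesis first.
NBest = Sequence[Tuple[str, float]]


class Decision(NamedTuple):
    action: str            # "execute" or "disambiguate"
    hypotheses: List[str]  # the command to run, or the choices to present


def two_stage(nbest: NBest,
              stage_one: Callable[[NBest], bool],
              stage_two: Callable[[NBest], List[int]]) -> Decision:
    """Run stage one; run stage two only when disambiguation is needed."""
    if not stage_one(nbest):
        return Decision("execute", [nbest[0][0]])
    keep = stage_two(nbest)
    return Decision("disambiguate", [nbest[i][0] for i in keep])


# Hypothetical stand-ins for the first and second models.
def needs_disambiguation(nbest: NBest) -> bool:
    return len(nbest) > 1 and nbest[0][1] < 0.6


def pick_choices(nbest: NBest) -> List[int]:
    return [i for i, (_, score) in enumerate(nbest) if score >= 0.25]


clear = [("set a timer for ten minutes", 0.91),
         ("set a timer for two minutes", 0.06)]
unclear = [("call Ann", 0.44), ("call Dan", 0.39), ("call and", 0.08)]

print(two_stage(clear, needs_disambiguation, pick_choices))
print(two_stage(unclear, needs_disambiguation, pick_choices))
```

Passing the stages in as callables keeps the control flow separate from the models, so trained classifiers could replace the rules without changing `two_stage`.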
191 Citations
20 Claims
1. Reproduced in full under First Claim above. Dependent claims: 2, 3, 4
5. A method comprising:
receiving audio data corresponding to an utterance;
performing speech processing on the audio data to generate: a first hypothesis corresponding to a first feature, a second hypothesis corresponding to a second feature, and a third hypothesis corresponding to a third feature;
processing, using a processor, the first feature, the second feature and the third feature using a first model to identify the first hypothesis, the second hypothesis, and the third hypothesis for further selection; and
processing, using a processor, the first feature, the second feature and the third feature using a second model to select the first hypothesis and second hypothesis for disambiguation.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
13. A computing system comprising:
at least one processor;
a memory including instructions operable to be executed by the at least one processor to cause the system to perform a set of actions comprising:
receiving audio data corresponding to an utterance;
performing speech processing on the audio data to generate: a first hypothesis corresponding to a first feature, a second hypothesis corresponding to a second feature, and a third hypothesis corresponding to a third feature;
processing the first feature, the second feature and the third feature using a first model to identify the first hypothesis, the second hypothesis, and the third hypothesis for further selection; and
processing the first feature, the second feature and the third feature using a second model to select the first hypothesis and second hypothesis for disambiguation.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification