Disambiguation in speech recognition
First Claim
1. A method for performing automatic speech recognition (ASR) processing on an utterance including a search request, the method comprising:
receiving, from a mobile device, audio data corresponding to an utterance, the utterance comprising a search request;
performing ASR processing on the audio data to determine ASR results, the ASR results comprising a first ASR hypothesis, a second ASR hypothesis and a third ASR hypothesis;
determining a disambiguation group, wherein the disambiguation group comprises the first ASR hypothesis, the second ASR hypothesis and the third ASR hypothesis;
processing, by a search engine:
the first ASR hypothesis to determine first search results comprising a first plurality of entities,
the second ASR hypothesis to determine second search results comprising a second plurality of entities, and
the third ASR hypothesis to determine third search results comprising a third plurality of entities;
determining that the second ASR hypothesis is similar to the third ASR hypothesis as a result of overlap between the second plurality of entities and the third plurality of entities;
determining a revised disambiguation group, wherein the revised disambiguation group comprises the first ASR hypothesis and the second ASR hypothesis; and
sending, to the mobile device, data corresponding to the revised disambiguation group for disambiguation.
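The overlap test in the claim above can be illustrated with a minimal sketch. Everything named here is hypothetical: the claim does not say how overlap is measured, so Jaccard overlap, the 0.5 threshold, the `search` callable, and the toy index are assumptions chosen only for illustration.

```python
from typing import Callable, List, Set


def entity_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def revise_disambiguation_group(
    hypotheses: List[str],
    search: Callable[[str], Set[str]],  # hypothetical search-engine hook
    threshold: float = 0.5,             # assumed cutoff, not from the claim
) -> List[str]:
    """Keep a hypothesis only if its entities do not overlap a kept one."""
    kept: List[str] = []
    kept_entities: List[Set[str]] = []
    for hyp in hypotheses:
        entities = search(hyp)
        if any(entity_overlap(entities, seen) >= threshold
               for seen in kept_entities):
            continue  # similar to an earlier hypothesis, so drop it
        kept.append(hyp)
        kept_entities.append(entities)
    return kept


# Toy stand-in for a search engine: hypothesis text -> returned entities.
FAKE_INDEX = {
    "show me pictures of pairs": {"pair_of_shoes", "pair_of_socks"},
    "show me pictures of pears": {"bartlett_pear", "bosc_pear", "asian_pear"},
    "show me pictures of pear": {"bartlett_pear", "bosc_pear"},
}

revised = revise_disambiguation_group(list(FAKE_INDEX), FAKE_INDEX.__getitem__)
print(revised)
```

With this toy data the second and third hypotheses share two of three entities, so only the first and second remain in the revised group, mirroring the claim's example.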
1 Assignment
0 Petitions
Abstract
Automatic speech recognition (ASR) processing including a feedback configuration to allow for improved disambiguation between ASR hypotheses. After ASR processing of an incoming utterance where the ASR outputs an N-best list including multiple hypotheses, the multiple hypotheses are passed downstream for further processing. The downstream further processing may include natural language understanding (NLU) or other processing to determine a command result for each hypothesis. The command results are compared to determine if any hypotheses of the N-best list would yield similar command results. If so, the hypothesis(es) with similar results are removed from the N-best list so that only one hypothesis of the similar results remains in the N-best list. The remaining non-similar hypotheses are sent for disambiguation, or, if only one hypothesis remains, it is sent for execution.
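The feedback loop described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Hypothesis` type, the `resolve` and `similar` callables, the toy NLU table, and the choice to keep the highest-scoring member of each similar group are all hypothetical (the abstract says only that one hypothesis of the similar results remains).

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Hypothesis:
    text: str
    score: float  # ASR confidence, higher is better (assumed convention)


def prune_n_best(
    n_best: List[Hypothesis],
    resolve: Callable[[str], Any],        # hypothetical NLU/command step
    similar: Callable[[Any, Any], bool],  # hypothetical result comparison
) -> List[Tuple[Hypothesis, Any]]:
    """Drop hypotheses whose command result is similar to a better-scored one."""
    ranked = sorted(n_best, key=lambda h: h.score, reverse=True)
    kept: List[Tuple[Hypothesis, Any]] = []
    for hyp in ranked:
        result = resolve(hyp.text)
        if not any(similar(result, prior) for _, prior in kept):
            kept.append((hyp, result))
    return kept


def route(kept: List[Tuple[Hypothesis, Any]]) -> Tuple[str, List[str]]:
    """Execute when one hypothesis remains, otherwise ask the user to choose."""
    if len(kept) == 1:
        return ("execute", [kept[0][0].text])
    return ("disambiguate", [hyp.text for hyp, _ in kept])


# Toy NLU table: two hypotheses resolve to the same command result.
TOY_NLU = {
    "play the stones": ("play_artist", "the rolling stones"),
    "play the stone": ("play_artist", "the rolling stones"),
    "play this tone": ("play_sound", "tone"),
}
n_best = [
    Hypothesis("play the stones", 0.61),
    Hypothesis("play the stone", 0.22),
    Hypothesis("play this tone", 0.17),
]

action, texts = route(prune_n_best(n_best, TOY_NLU.__getitem__, operator.eq))
print(action, texts)
```

Here equality stands in for "similar"; a real system would presumably use a softer comparison, which is why `similar` is passed in as a parameter.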
45 Citations
21 Claims
4. A method comprising:
receiving, from a device, audio data corresponding to an utterance;
performing speech processing on the audio data to determine at least a first hypothesis, a second hypothesis and a third hypothesis;
executing a first command associated with the first hypothesis to determine first results;
executing a second command associated with the second hypothesis to determine second results;
executing a third command associated with the third hypothesis to determine third results;
comparing the second results to the third results;
selecting the first hypothesis and the second hypothesis based at least in part on the comparing;
sending, to the device, at least a portion of the first hypothesis;
sending, to the device, at least a portion of the second hypothesis;
receiving, from the device, an indication of selection of the first hypothesis; and
sending, to the device, data associated with the first results.
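The round trip recited in claim 4 (execute, compare, offer, receive a selection, return results) could look roughly like the sketch below. The `DisambiguationSession` class, its method names, and the toy command table are hypothetical. Caching the results so that the selection step does not re-run the command is a design assumption, not something the claim requires.

```python
from typing import Any, Callable, Dict, List


class DisambiguationSession:
    """Hypothetical holder for one utterance's executed hypotheses."""

    def __init__(self,
                 execute: Callable[[str], Any],
                 similar: Callable[[Any, Any], bool]) -> None:
        self._execute = execute
        self._similar = similar
        self._results: Dict[str, Any] = {}

    def offer(self, hypotheses: List[str]) -> List[str]:
        """Execute every hypothesis and return the ones worth showing."""
        self._results.clear()
        for hyp in hypotheses:
            result = self._execute(hyp)
            # Skip a hypothesis whose results match an earlier kept one.
            if not any(self._similar(result, prior)
                       for prior in self._results.values()):
                self._results[hyp] = result
        return list(self._results)

    def select(self, chosen: str) -> Any:
        """Return the results already computed for the user's choice."""
        return self._results[chosen]


# Toy command executor that records each call.
TOY_RESULTS = {
    "weather in boston": ["Boston, MA: 41F"],
    "weather in austin": ["Austin, TX: 78F"],
    "whether in austin": ["Austin, TX: 78F"],
}
calls: List[str] = []


def toy_execute(hyp: str) -> List[str]:
    calls.append(hyp)
    return TOY_RESULTS[hyp]


session = DisambiguationSession(toy_execute, lambda a, b: a == b)
choices = session.offer(list(TOY_RESULTS))
picked = session.select("weather in boston")
print(choices, picked)
```

All three commands run once during `offer`, and `select` only looks up a stored result, so the device receives the chosen hypothesis's data without a second execution.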
14. A computing system comprising:
at least one processor;
a memory including instructions operable to be executed by the at least one processor to cause the system to perform a set of actions comprising:
receiving, from a device, audio data corresponding to an utterance;
performing speech processing on the audio data to determine at least a first hypothesis, a second hypothesis and a third hypothesis;
executing a first command associated with the first hypothesis to determine first results;
executing a second command associated with the second hypothesis to determine second results;
executing a third command associated with the third hypothesis to determine third results;
comparing the second results to the third results;
selecting the first hypothesis and the second hypothesis based at least in part on the comparing;
sending, to the device, at least a portion of the first hypothesis;
sending, to the device, at least a portion of the second hypothesis;
receiving, from the device, an indication of selection of the first hypothesis; and
sending, to the device, data associated with the first results.
Specification