Targeted clarification questions in speech recognition with concept presence score and concept correctness score
First Claim
1. A method comprising:
processing, via a speech recognizer, an utterance from a speaker to produce speech recognition output;
identifying speech segments in the speech recognition output;
generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog;
generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with the utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and
presenting the targeted clarification question to the speaker in response to the utterance.
Abstract
A system, method, and computer-readable storage devices are disclosed for using targeted clarification (TC) questions in dialog systems in a multimodal virtual agent system (MVA) providing access to information about movies, restaurants, and musical events. In contrast with open-domain spoken systems, the MVA application covers a domain with a fixed set of concepts and uses a natural language understanding (NLU) component to mark concepts in automatically recognized speech. Instead of identifying an error segment, localized error detection (LED) identifies which of the concepts are likely to be present and correct using domain knowledge, automatic speech recognition (ASR), and NLU tags and scores. If at least one concept is identified as present but not correct, the TC component uses this information to generate a targeted clarification question. This approach computes probability distributions of concept presence and correctness for each user utterance, which can be used for automatic learning of clarification policies.
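The decision described in the abstract (ask a targeted question when a concept appears present but not correct) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `ConceptScores` type, the reading of each "pair" as a two-outcome probability distribution, the 0.5 threshold, the ranking rule, and the question template. How LED actually derives the scores from domain knowledge, ASR, and NLU features is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConceptScores:
    """Scores for one concept-tagged speech segment (hypothetical structure)."""
    concept: str                      # NLU concept tag, e.g. "cuisine"
    text: str                         # recognized words in the segment
    presence: Tuple[float, float]     # assumed to be (P(present), P(absent))
    correctness: Tuple[float, float]  # assumed to be (P(correct), P(incorrect))


def pick_segment(segments: List[ConceptScores],
                 threshold: float = 0.5) -> Optional[ConceptScores]:
    """Return the segment most likely to be present but incorrect, or None.

    The threshold and the product-based ranking are illustrative choices.
    """
    candidates = [
        s for s in segments
        if s.presence[0] >= threshold and s.correctness[1] >= threshold
    ]
    # Rank by joint likelihood of "present" and "incorrect".
    return max(candidates,
               key=lambda s: s.presence[0] * s.correctness[1],
               default=None)


def clarification_question(segment: ConceptScores) -> str:
    """Build a question that targets only the doubtful concept."""
    return f"Which {segment.concept} did you mean?"


if __name__ == "__main__":
    # Made-up scores for a two-concept utterance.
    segments = [
        ConceptScores("cuisine", "Italian", (0.9, 0.1), (0.3, 0.7)),
        ConceptScores("location", "Boston", (0.95, 0.05), (0.9, 0.1)),
    ]
    chosen = pick_segment(segments)
    if chosen is not None:
        print(clarification_question(chosen))
```

When no segment passes both thresholds, `pick_segment` returns `None` and the caller would simply proceed without a clarification turn. A learned clarification policy, as the abstract suggests, could replace the fixed threshold.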
20 Claims
1. A method comprising:
processing, via a speech recognizer, an utterance from a speaker to produce speech recognition output;
identifying speech segments in the speech recognition output;
generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog;
generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with the utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and
presenting the targeted clarification question to the speaker in response to the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a processor;
a speech recognizer; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog;
generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and
presenting the targeted clarification question to a speaker in response to the utterance.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
generating two pairs of values for each speech segment including a first pair indicating a concept presence score for a corresponding speech segment and a second pair indicating a concept correctness score for the corresponding speech segment using a context that is unavailable to the speech recognizer throughout a dialog;
generating, for a chosen speech segment from the speech segments and based on the concept presence score and the concept correctness score, a targeted clarification question associated with an utterance, wherein the chosen speech segment is a recognizable speech segment that has a high recognition certainty in which the context indicates that a word in the chosen speech segment is unsuitable for the context; and
presenting the targeted clarification question to a speaker in response to the utterance.
Dependent claims: 16, 17, 18, 19, 20
Specification