System and method for disambiguating multiple intents in a natural language dialog system
Abstract
The present invention addresses deficiencies in the prior art by providing an improved dialog for disambiguating a user utterance that contains more than one intent. The invention comprises methods, computer-readable media, and systems for engaging in a dialog. The method embodiment disambiguates a user utterance containing at least two user intents. The method comprises establishing a confidence threshold for spoken language understanding that encourages multiple intents to be returned; determining whether a received utterance comprises a first intent and a second intent; and, if the received utterance contains both, disambiguating the first intent and the second intent by presenting a disambiguation sub-dialog in which the user is offered a choice of which intent to process first. The user is first presented with whichever of the first and second intents has the lowest confidence score.
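The ordering rule in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Intent` class, the function names, the intent labels, and the prompt wording are all hypothetical, and the confidence scores are assumed to arrive already computed by a spoken language understanding module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str          # hypothetical intent label, e.g. "pay_bill"
    confidence: float  # assumed to be supplied by the SLU module


def disambiguation_order(intents, threshold):
    """Return (first, second) to offer in the sub-dialog, or None.

    Only intents scoring above the threshold are considered. If fewer
    than two qualify, no disambiguation sub-dialog is needed.
    """
    above = [i for i in intents if i.confidence > threshold]
    if len(above) < 2:
        return None
    # Keep the two highest-scoring intents ...
    top_two = sorted(above, key=lambda i: i.confidence, reverse=True)[:2]
    # ... and present the less confident of the pair first.
    first, second = sorted(top_two, key=lambda i: i.confidence)
    return first, second


def build_prompt(first, second):
    """Hypothetical prompt text for the disambiguation sub-dialog."""
    return f"Would you like to handle {first.name} first, or {second.name}?"
```

With scores of 0.9 for `pay_bill` and 0.7 for `check_balance` and a threshold of 0.5, the sketch offers `check_balance` first because it is the lower-scoring of the top two.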
18 Claims
1. A method comprising:
receiving, via an interactive voice recognition system, a user utterance and converting the user utterance to text;
generating multiple intents based on the text;
establishing, via the interactive voice recognition system, a confidence score for each intent in the multiple intents, wherein the confidence score for each intent is based on how much training data corresponding to the each intent was used to train a spoken language understanding module, where more training data corresponds to a higher confidence;
when only a single intent in the multiple intents has a confidence score above a threshold:
  identifying a plurality of call types associated with the multiple intents; and
  applying predefined precedence rules to respond to only a single call type in the plurality of call types, the single call type associated with the single intent; and
when multiple intents have confidence scores above the threshold:
  identifying a first intent and a second intent based on the confidence scores for the multiple intents, wherein the first intent and the second intent have a highest two confidence scores in the multiple intents; and
  disambiguating the first intent and the second intent by presenting a disambiguation sub-dialog, via the interactive voice recognition system, wherein a user is offered a choice of which intent to process first, wherein the user is first presented with one of the first intent and the second intent having a lowest confidence score between the first intent and the second intent.
(Dependent claims: 2, 3, 4, 5, 6)
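The two branches recited in claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the claim only says that more training data corresponds to a higher confidence, so normalising by the largest count is one hypothetical choice; `call_types`, `precedence`, and the returned tuples are likewise invented for the example, and the case where no intent clears the threshold is not addressed by the claim.

```python
def confidence_from_training_counts(counts):
    """Map per-intent training-example counts to scores in [0, 1].

    Assumption: divide by the largest count, so that more training
    data always yields a higher score.
    """
    largest = max(counts.values(), default=0)
    if largest == 0:
        return {intent: 0.0 for intent in counts}
    return {intent: n / largest for intent, n in counts.items()}


def respond(candidates, scores, threshold, call_types, precedence):
    """Pick an action for the candidate intents of one utterance.

    call_types: hypothetical map from intent to its call types.
    precedence: hypothetical list of call types, highest priority first;
                every call type is assumed to appear in it.
    """
    above = [i for i in candidates if scores[i] > threshold]
    if len(above) == 1:
        # Single confident intent: precedence rules select one call type.
        chosen = min(call_types[above[0]], key=precedence.index)
        return ("respond", chosen)
    if len(above) >= 2:
        # Several confident intents: take the top two by score and
        # offer the lower-scoring one first.
        top_two = sorted(above, key=scores.get, reverse=True)[:2]
        first, second = sorted(top_two, key=scores.get)
        return ("disambiguate", first, second)
    # Not covered by the claim; included so the function is total.
    return ("no_confident_intent",)
```

For example, training counts of 800, 400, and 50 for three hypothetical intents give scores of 1.0, 0.5, and 0.0625. A threshold of 0.6 then leaves a single intent and triggers the precedence branch, while a threshold of 0.3 leaves two and triggers the disambiguation branch.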
7. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving, via an interactive voice recognition system, a user utterance and converting the user utterance to text;
generating multiple intents based on the text;
establishing, via the interactive voice recognition system, a confidence score for each intent in the multiple intents, wherein the confidence score for each intent is based on how much training data corresponding to the each intent was used to train a spoken language understanding module, where more training data corresponds to a higher confidence;
when only a single intent in the multiple intents has a confidence score above a threshold:
  identifying a plurality of call types associated with the multiple intents; and
  applying predefined precedence rules to respond to only a single call type in the plurality of call types, the single call type associated with the single intent; and
when multiple intents have confidence scores above the threshold:
  identifying a first intent and a second intent based on the confidence scores for the multiple intents, wherein the first intent and the second intent have highest two confidence scores in the multiple intents; and
  disambiguating the first intent and the second intent by presenting a disambiguation sub-dialog, via the interactive voice recognition system, wherein a user is offered a choice of which intent to process first, wherein the user is first presented with one of the first intent and the second intent having a lowest confidence score between the first intent and the second intent.
(Dependent claims: 8, 9, 10, 11, 12)
13. A system comprising:
a processor; and
a computer-readable storage device having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving, via an interactive voice recognition system, a user utterance and converting the user utterance to text;
generating multiple intents based on the text;
establishing, via the interactive voice recognition system, a confidence score for each intent in the multiple intents, wherein the confidence score for each intent is based on how much training data corresponding to the each intent was used to train a spoken language understanding module, where more training data corresponds to a higher confidence;
when only a single intent in the multiple intents has a confidence score above a threshold:
  identifying a plurality of call types associated with the multiple intents; and
  applying predefined precedence rules to respond to only a single call type in the plurality of call types, the single call type associated with the single intent; and
when multiple intents have confidence scores above the threshold:
  identifying a first intent and a second intent based on the confidence scores for the multiple intents, wherein the first intent and the second intent have highest two confidence scores in the multiple intents; and
  disambiguating the first intent and the second intent by presenting a disambiguation sub-dialog, via the interactive voice recognition system, wherein a user is offered a choice of which intent to process first, wherein the user is first presented with one of the first intent and the second intent having a lowest confidence score.
(Dependent claims: 14, 15, 16, 17, 18)
Specification