Dialog state tracking using web-style ranking and multiple language understanding engines
Abstract
A dialog state tracking system. One aspect of the system is the use of multiple utterance decoders and/or multiple spoken language understanding (SLU) engines that generate competing results, which increases the likelihood that the correct dialog state is available to the system and provides additional features for scoring dialog state hypotheses. An additional aspect is training an SLU engine and a dialog state scorer/ranker (DSR) engine using different subsets of a single annotated training data set. A further aspect is training multiple SLU/DSR engine pairs from inverted subsets of the annotated training data set. Another aspect is web-style dialog state ranking based on dialog state features, using discriminative models with automatically generated feature conjunctions. Yet another aspect is using multiple parameter sets with each ranking engine and averaging the rankings. Each aspect independently improves dialog state tracking accuracy, and the aspects may be combined in various ways for greater improvement.
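The idea of pooling competing results from several ASR/SLU combinations into one hypothesis list, with agreement between engines becoming an extra scoring feature, can be sketched in Python. This is a minimal illustration under assumptions not found in the source: meaning representations are modeled as flat slot-value dicts, and the function `pool_hypotheses` and the feature names `max_conf`, `mean_conf` and `num_votes` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a meaning is a flat dict of slot -> value,
# and each SLU engine returns an n-best list of (meaning, confidence) pairs.
Meaning = Dict[str, str]
NBest = List[Tuple[Meaning, float]]


def pool_hypotheses(engine_outputs: List[NBest]) -> Dict[tuple, Dict[str, float]]:
    """Merge n-best lists from several engines into one set of candidate
    dialog states, each with simple confidence-based features."""
    pooled = defaultdict(list)
    for nbest in engine_outputs:
        for meaning, confidence in nbest:
            # Order-insensitive key so identical meanings from different engines merge.
            key = tuple(sorted(meaning.items()))
            pooled[key].append(confidence)

    features = {}
    for key, confs in pooled.items():
        features[key] = {
            "max_conf": max(confs),
            "mean_conf": sum(confs) / len(confs),
            "num_votes": float(len(confs)),  # how many results proposed this state
        }
    return features


# Two engines that partly agree on the user's request.
slu_a = [({"food": "thai", "area": "north"}, 0.7), ({"food": "thai"}, 0.2)]
slu_b = [({"area": "north", "food": "thai"}, 0.5)]
hyps = pool_hypotheses([slu_a, slu_b])
for state, feats in hyps.items():
    print(state, feats)
```

A state proposed by both engines gets a `num_votes` of 2, which a ranker can use as evidence alongside the raw confidences.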
20 Claims
1. A method for improving dialog state tracking accuracy in a dialog system, the method comprising:
dividing a training data set into a plurality of parts, including a first part and a second part;
training a first spoken language understanding processor with the first part of the training data set;
training a first ranking processor with the second part of the training data set and a first training parameter set;
training a second spoken language understanding processor with the second part of the training data set;
training a second ranking processor with the first part of the training data set and a second training parameter set;
determining conversational inputs from spoken utterances received from a user, wherein the conversational inputs are determined by a plurality of automatic speech recognizers, wherein different recognition models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
determining meaning representations from the conversational inputs, wherein the meaning representations are determined by a plurality of spoken language understanding processors including the first spoken language understanding processor and the second spoken language understanding processor, wherein each of the spoken language understanding processors is operable to provide a meaning representation based on an individual model associated with each spoken language understanding processor;
enumerating dialog state hypotheses from the meaning representations;
extracting the features associated with each dialog state hypothesis using spoken language processing, wherein the features include confidence scores associated with each dialog state hypothesis;
ranking the dialog state hypotheses according to differences in the dialog state hypotheses and the confidence scores via the first ranking processor and the second ranking processor; and
using at least one member of the ranked set of dialog state hypotheses to determine what action the dialog system should take next.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
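The four training steps of claim 1 can be sketched as a schedule in which each SLU processor and its ranking processor see disjoint parts of one annotated set, with the parts swapped for the second pair. This is a minimal sketch assuming hypothetical `train_slu` and `train_ranker` callables (stubbed here) and a random 50/50 split; the claim does not say how the parts are chosen. One plausible motivation, stated here as an assumption rather than taken from the claim, is that a ranker trained on data its SLU processor never saw learns from realistic rather than over-confident SLU scores.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def split_in_two(data: Sequence[Any], seed: int = 0) -> Tuple[list, list]:
    """Shuffle once and cut the annotated data into two disjoint parts."""
    items = list(data)
    random.Random(seed).shuffle(items)
    mid = len(items) // 2
    return items[:mid], items[mid:]


def cross_train(
    data: Sequence[Any],
    train_slu: Callable[[list], Any],
    train_ranker: Callable[[list, dict], Any],
    param_sets: List[dict],
) -> List[Tuple[Any, Any]]:
    """Train two SLU/ranker pairs on inverted parts of one data set.

    Pair 1: SLU on part A, ranker on part B (first parameter set).
    Pair 2: SLU on part B, ranker on part A (second parameter set).
    """
    part_a, part_b = split_in_two(data)
    schedule = [(part_a, part_b, param_sets[0]), (part_b, part_a, param_sets[1])]
    pairs = []
    for slu_part, ranker_part, params in schedule:
        slu = train_slu(slu_part)
        ranker = train_ranker(ranker_part, params)
        pairs.append((slu, ranker))
    return pairs


# Stub trainers (hypothetical) that just record which examples they saw.
pairs = cross_train(
    data=range(10),
    train_slu=lambda part: frozenset(part),
    train_ranker=lambda part, params: (frozenset(part), params),
    param_sets=[{"learning_rate": 0.1}, {"learning_rate": 0.05}],
)
(slu_1, ranker_1), (slu_2, ranker_2) = pairs
```

With the stubs, each SLU "model" is the set of examples it was trained on, so the inverted pairing can be checked directly: the first SLU processor and the second ranker share one part, and vice versa.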
11. A dialog state tracking system comprising:
a processing unit; and
a memory including computer executable instructions which, when executed by the processing unit, cause the system to provide:
an input device operable to collect conversational inputs from a user;
an input decoder in communication with the input device, the input decoder operable to convert the conversational inputs into computer readable text;
a plurality of automatic speech recognizers determining conversational inputs from spoken utterances received from a user, wherein different recognition models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
a plurality of dialog state rankers, equal in number and paired with the plurality of automatic speech recognizers, where each pair is trained using a single training data set, wherein a spoken language understanding processor and a dialog state ranker in each pair are trained with different portions of the single training data set;
wherein the plurality of dialog state rankers and the plurality of automatic speech recognizers include a first pair and a second pair, wherein the spoken language understanding processor of the first pair and the dialog state ranker of the second pair are trained with one portion of the training data set, and the spoken language understanding processor of the second pair and the dialog state ranker of the first pair are trained with a different portion of the training data set;
a plurality of spoken language understanding processors in communication with the input decoder, each of the spoken language understanding processors operable to translate the computer readable text into a dialog state hypothesis based on an individual model for each spoken language understanding processor, the plurality of spoken language understanding processors outputting the features associated with each dialog state hypothesis using spoken language processing, wherein the features include confidence scores associated with each dialog state hypothesis; and
a dialog manager operable to score each dialog state hypothesis based on differences in the dialog state hypotheses and the features associated with the dialog state hypothesis and select the highest scoring dialog state hypothesis as ranked via the first ranking processor and the second ranking processor as the correct dialog state.
Dependent claims: 12, 13, 14, 15.
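Claim 11's dialog manager scores each hypothesis from its features and selects the best one. The claim does not name a learning algorithm, so the sketch below substitutes a simple pairwise ranking perceptron, which learns from feature differences between the correct hypothesis and each competitor and then picks the highest-scoring hypothesis. The feature layout `[max_conf, num_votes]`, the toy training turns, and the functions `train_pairwise` and `select_best` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(
    turns: List[Tuple[List[Vector], int]],
    n_features: int,
    epochs: int = 10,
    lr: float = 0.1,
) -> List[float]:
    """Pairwise perceptron; each turn is (hypothesis feature vectors, index of the correct one)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for hyps, gold in turns:
            for i, other in enumerate(hyps):
                if i == gold:
                    continue
                # Difference vector: correct hypothesis minus a competitor.
                diff = [g - o for g, o in zip(hyps[gold], other)]
                if dot(w, diff) <= 0:  # pair is mis-ordered or tied
                    w = [wi + lr * d for wi, d in zip(w, diff)]
    return w


def select_best(w: Vector, hyps: List[Vector]) -> int:
    """Index of the highest-scoring dialog state hypothesis."""
    return max(range(len(hyps)), key=lambda i: dot(w, hyps[i]))


# Features per hypothesis: [max_conf, num_votes]; the int is the gold index.
turns = [
    ([[0.9, 1.0], [0.4, 2.0]], 1),  # agreement between engines wins here
    ([[0.8, 2.0], [0.7, 1.0]], 0),
]
w = train_pairwise(turns, n_features=2)
print([select_best(w, hyps) for hyps, _ in turns])
```

Learning from difference vectors is what makes this a ranking model rather than a classifier: only the relative order of hypotheses within a turn matters.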
16. A computer readable storage device containing computer executable instructions which, when executed by a computer, enable the computer to perform a method of improving dialog state tracking accuracy in a human-computer interaction system, comprising:
dividing a training data set into a plurality of parts, including a first part and a second part;
training a first spoken language understanding processor with the first part of the training data set;
training a first ranking processor with the second part of the training data set and a first training parameter set;
training a second spoken language understanding processor with the second part of the training data set;
training a second ranking processor with the first part of the training data set and a second training parameter set;
determining conversational inputs from spoken utterances received from a user, wherein the conversational inputs are determined by a plurality of automatic speech recognizers, wherein different recognition models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
determining meaning representations from the conversational inputs, wherein the meaning representations are determined by a plurality of spoken language understanding processors, wherein each of the spoken language understanding processors is operable to provide a meaning representation based on an individual model associated with each spoken language understanding processor;
enumerating dialog state hypotheses from the meaning representations;
determining scores for each dialog state hypothesis based on dialog state hypothesis features using multiple dialog state ranking processors, each dialog state ranking processor having a forest of decision trees including automatically built conjunctions;
averaging the scores from each dialog state ranking processor to produce a final score for each dialog state hypothesis;
ranking the dialog state hypotheses based on the final scores via the first ranking processor and the second ranking processor; and
updating a dialog session with the highest ranking dialog state hypothesis.
Dependent claims: 17, 18, 19, 20.
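Claim 16 scores hypotheses with several ranking processors, each a forest of decision trees whose root-to-leaf paths act as automatically built feature conjunctions, and averages their scores. This is a minimal sketch assuming hand-built trees in place of learned ones (in practice the forests would come from a tree-learning algorithm such as gradient-boosted trees, which is not implemented here); `Node`, `rank_hypotheses`, the feature names, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Features = Dict[str, float]


@dataclass
class Node:
    """Decision-tree node; a leaf when `feature` is None."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when value <= threshold
    right: Optional["Node"] = None  # followed when value > threshold
    value: float = 0.0              # leaf output


def tree_score(node: Node, feats: Features) -> float:
    # Each root-to-leaf path is a conjunction of feature tests.
    while node.feature is not None:
        node = node.right if feats[node.feature] > node.threshold else node.left
    return node.value


def forest_score(forest: List[Node], feats: Features) -> float:
    return sum(tree_score(tree, feats) for tree in forest)


def rank_hypotheses(
    rankers: List[List[Node]], hyps: Dict[str, Features]
) -> Tuple[List[str], Dict[str, float]]:
    """Average each ranker's forest score, then sort best-first."""
    final = {
        name: sum(forest_score(forest, feats) for forest in rankers) / len(rankers)
        for name, feats in hyps.items()
    }
    return sorted(final, key=final.get, reverse=True), final


# Ranker 1: its 1.0 leaf fires only when max_conf > 0.5 AND num_votes > 1.
ranker_1 = [Node("max_conf", 0.5,
                 left=Node(value=-0.5),
                 right=Node("num_votes", 1.0, left=Node(value=0.3), right=Node(value=1.0)))]
# Ranker 2 (a different, hypothetical parameter set): a single stump.
ranker_2 = [Node("mean_conf", 0.4, left=Node(value=-0.2), right=Node(value=0.5))]

hyps = {
    "food=thai, area=north": {"max_conf": 0.7, "mean_conf": 0.6, "num_votes": 2.0},
    "food=thai": {"max_conf": 0.2, "mean_conf": 0.2, "num_votes": 1.0},
}
order, final = rank_hypotheses([ranker_1, ranker_2], hyps)
best = order[0]  # state that would update the dialog session
```

Averaging across rankers built with different parameter sets smooths out the idiosyncrasies of any single forest; in this toy case the two rankers agree on the order but assign different magnitudes.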
Specification