Dialog state tracking using web-style ranking and multiple language understanding engines

US 10,108,608 B2
Filed: 06/12/2014
Issued: 10/23/2018
Est. Priority Date: 06/12/2014
Status: Active Grant

Abstract

A dialog state tracking system. One aspect of the system is the use of multiple utterance decoders and/or multiple spoken language understanding (SLU) engines generating competing results that improve the likelihood that the correct dialog state is available to the system and provide additional features for scoring dialog state hypotheses. An additional aspect is training a SLU engine and a dialog state scorer/ranker (DSR) engine using different subsets from a single annotated training data set. A further aspect is training multiple SLU/DSR engine pairs from inverted subsets of the annotated training data set. Another aspect is web-style dialog state ranking based on dialog state features using discriminative models with automatically generated feature conjunctions. Yet another aspect is using multiple parameter sets with each ranking engine and averaging the rankings. Each aspect independently improves dialog state tracking accuracy and may be combined in various combinations for greater improvement.
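
The "inverted subsets" idea in the abstract can be illustrated with a minimal sketch. It only shows how one annotated data set might be split and assigned to two SLU/DSR pairs. The function name `inverted_training_plan`, the pair labels, and the toy dialog IDs are hypothetical and do not come from the patent; actual model training is left out.

```python
from typing import Dict, List, Sequence, Tuple


def inverted_training_plan(
    examples: Sequence[str],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Split one annotated data set in two and assign inverted halves.

    Returns {pair_name: (slu_training_part, dsr_training_part)}.
    Hypothetical helper: the names and the half/half split are assumptions.
    """
    mid = len(examples) // 2
    part_a, part_b = list(examples[:mid]), list(examples[mid:])
    return {
        "pair_1": (part_a, part_b),  # SLU trained on A, ranker trained on B
        "pair_2": (part_b, part_a),  # SLU trained on B, ranker trained on A
    }


# Toy dialog IDs standing in for annotated training dialogs.
plan = inverted_training_plan(["d1", "d2", "d3", "d4"])
for name, (slu_part, dsr_part) in plan.items():
    print(name, "SLU:", slu_part, "DSR:", dsr_part)
```

Within each pair, the ranker is trained on data its SLU engine has not seen, and the second pair swaps the roles of the two parts so the whole data set is used.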

20 Claims

1. A method for improving dialog state tracking accuracy in a dialog system, the method comprising:
- dividing a training data set into a plurality of parts, including a first part and a second part;
  
  training a first spoken language understanding processor with the first part of the training data set;
  
  training a first ranking processor with the second part of the training data set and a first training parameter set;
  
  training a second spoken language understanding processor with the second part of the training data set;
  
  training a second ranking processor with the first part of the training data set and a second training parameter set;
  
  determining conversational inputs from spoken utterances received from a user, wherein the conversational inputs are determined by a plurality of automatic speech recognizers, wherein different recognition models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
  
  determining meaning representations from the conversational inputs, wherein the meaning representations are determined by a plurality of spoken language understanding processors including the first spoken language understanding processor and the second spoken language understanding processor, wherein each of the spoken language understanding processors is operable to provide a meaning representation based on an individual model associated with each spoken language understanding processor;
  
  enumerating dialog state hypotheses from the meaning representations;
  
  extracting the features associated with each dialog state hypothesis using spoken language processing, wherein the features include confidence scores associated with each dialog state hypothesis;
  
  ranking the dialog state hypotheses according to differences in the dialog state hypotheses and the confidence scores via the first ranking processor and the second ranking processor; and
  
  using at least one member of the ranked set of dialog states hypotheses to determine what action the dialog system should take next.
- - 2. The method of claim 1 characterized in that the act of ranking the dialog state hypotheses further comprises the acts of:
    - determining component scores for each dialog state hypothesis based on features associated with the dialog state hypothesis using a forest of decision trees;
      
      computing a final score for each dialog state hypothesis from a weighted sum of the component scores; and
      
      ranking the dialog state hypotheses based on the final score.
  - 3. The method of claim 1 characterized in that the act of ranking the dialog state hypotheses further comprises the acts of:
    - enumerating a plurality of scores for each dialog state hypothesis using a plurality of dialog state ranking processors;
      
      averaging the scores for each dialog hypothesis to produce a final score for each dialog hypothesis;
      
      ranking the dialog state hypotheses based on the final score.
  - 4. The method of claim 1 characterized in that the decision trees include cascading binary branches for leading to leaf nodes, each leaf node having a real value that is added to the score for a dialog state hypothesis when decisions of the binary branches based on features associated with the dialog state hypothesis lead to that leaf node.
  - 5. The method of claim 4 characterized in that each binary branch applies a threshold to feature scoring features associated with the dialog state hypothesis.
  - 6. The method of claim 1 further comprising the act of using a web-style ranking algorithm to automatically build ranking models having conjunctions.
  - 7. The method of claim 6 characterized in that the web-style ranking algorithm is lambdaMart.
  - 8. The method of claim 1, further comprising the acts of:
    - training a third ranking processor with the second part of the training data set and a third training parameter set, the third training parameter set being different than the first training parameter set; and
      
      training a fourth ranking processor with the first part of the training data set and a fourth training parameter set, the fourth training parameter set being different than the second training parameter set.
  - 9. The method of claim 1 characterized in that the features associated with each dialog state hypothesis include additional features derived from competitive outputs obtained by processing dialog state hypotheses using a plurality of spoken language processing processors.
  - 10. The method of claim 1 characterized in that the conversational inputs are spoken utterances decoded using automatic speech recognition.
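
The scoring described in claims 2, 4, and 5 (binary branches that apply thresholds to features, real-valued leaves, and a weighted sum over a forest) can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the class names, feature names (`slu_confidence`, `asr_confidence`), thresholds, leaf values, and weights are all assumptions, and a real system would learn the trees with an algorithm such as LambdaMART rather than write them by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    value: float  # real value contributed when a hypothesis reaches this leaf


@dataclass
class Branch:
    feature: str      # feature tested at this node
    threshold: float  # go left if feature value <= threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def tree_score(node: Node, features: Dict[str, float]) -> float:
    """Follow binary branches until a leaf is reached; return its value."""
    while isinstance(node, Branch):
        value = features.get(node.feature, 0.0)
        node = node.left if value <= node.threshold else node.right
    return node.value


def forest_score(trees: List[Node], weights: List[float],
                 features: Dict[str, float]) -> float:
    """Final score: weighted sum of the per-tree component scores."""
    return sum(w * tree_score(t, features) for t, w in zip(trees, weights))


def rank(hypotheses: Dict[str, Dict[str, float]],
         trees: List[Node], weights: List[float]) -> List[str]:
    """Return hypothesis names ordered from highest to lowest final score."""
    scores = {h: forest_score(trees, weights, f) for h, f in hypotheses.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hand-written toy forest (assumed values). The nested branch in the first
# tree acts as a conjunction: high SLU confidence AND high ASR confidence.
trees = [
    Branch("slu_confidence", 0.5, Leaf(-1.0),
           Branch("asr_confidence", 0.6, Leaf(0.2), Leaf(1.0))),
    Branch("asr_confidence", 0.4, Leaf(-0.5), Leaf(0.5)),
]
weights = [1.0, 0.5]
hypotheses = {
    "city=Boston": {"slu_confidence": 0.9, "asr_confidence": 0.8},
    "city=Austin": {"slu_confidence": 0.6, "asr_confidence": 0.3},
    "none": {"slu_confidence": 0.2, "asr_confidence": 0.5},
}
ranking = rank(hypotheses, trees, weights)
print(ranking)
```

A path from the root to a leaf tests several features in sequence, which is how a tree can express a feature conjunction without anyone specifying it manually.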

11. A dialog state tracking system comprising:
- a processing unit; and
  
  a memory including computer executable instructions which, when executed by a processing unit cause the system to provide;
  
  an input device operable to collect conversational inputs from a user;
  
  an input decoder in communication with the input device, the input decoder operable to convert the conversational inputs into computer readable text;
  
  a plurality of automatic speech recognizers determining conversational inputs from spoken utterances received from a user, wherein different recognitions models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
  
  a plurality of dialog state rankers, equal in number and paired with the plurality of automatic speech recognizers, where each pair is trained using a single training data set, wherein a spoken language understanding processor and a dialog state ranker in each pair are trained with different portions of the single training data set;
  
  wherein the plurality of dialog state rankers and the plurality of automatic speech recognizers include a first pair and a second pair, wherein the spoken language understanding processor of the first pair and the dialog state ranker of the second pair are trained with one portion of training data set, and the spoken language understanding processor of the second pair and the dialog state ranker of the first pair are trained with a different portion of training data set;
  
  a plurality of spoken language understanding processors in communication with the input decoder, each of the spoken language understanding processors operable to translate the computer readable text into a dialog state hypothesis based on an individual model for each spoken language understanding processor, the plurality of spoken language understanding processors outputting the features associated with each dialog state hypothesis using spoken language processing, wherein the features include confidence scores associated with each dialog state hypothesis; and
  
  a dialog manager operable to score each dialog state hypothesis based on differences in the dialog state hypotheses and the features associated with the dialog state hypothesis and select the highest scoring dialog state hypothesis as ranked via the first ranking processor and the second ranking processor as the correct dialog state.
- - 12. The dialog state tracking system of claim 11 characterized in that the dialog manager uses web-style ranking to score each dialog state hypothesis.
  - 13. The dialog state tracking system of claim 11 characterized in that dialog manager includes a plurality of dialog state rankers operable to score each dialog state hypothesis using a forest of decision trees including automatically built conjunctions.
  - 14. The dialog state tracking system of claim 11 characterized in that the conversational inputs are spoken utterances and the input decoder includes an automatic speech recognizer operable to decode the spoken utterance into computer readable text.
  - 15. The dialog state tracking system of claim 11 characterized in that the input decoder comprises a plurality of automatic speech recognizers, each automatic speech recognizer in communication with one or more of the spoken language understanding processors.

16. A computer readable storage device containing computer executable instructions which, when executed by a computer, enable the computer to perform a method of improving dialog state tracking accuracy in a human-computer interaction system, comprising:
- dividing a training data set into a plurality of parts, including a first part and a second part;
  
  training a first spoken language understanding processor with the first part of the training data set;
  
  training a first ranking processor with the second part of the training data set and a first training parameter set;
  
  training a second spoken language understanding processor with the second part of the training data set;
  
  training a second ranking processor with the first part of the training data set and a second training parameter set;
  
  determining conversational inputs from spoken utterances received from a user, wherein the conversational inputs are determined by a plurality of automatic speech recognizers, wherein different recognition models are utilized by the automatic speech recognizers to produce conversational inputs that include alternative results;
  
  determining meaning representations from the conversational inputs, wherein the meaning representations are determined by a plurality of spoken language understanding processors, wherein each of the spoken language understanding processors is operable to provide a meaning representation based on an individual model associated with each spoken language understanding processor;
  
  enumerating dialog state hypotheses from the meaning representations;
  
  determining scores for each dialog state hypothesis based on dialog state hypothesis features using multiple dialog state ranking processors, each dialog state ranking processor having a forest of decision trees including automatically built conjunctions;
  
  averaging the scores from each dialog state ranking processor to produce a final score for each dialog state hypothesis;
  
  ranking the dialog state hypotheses based on the final scores via the first ranking processor and the second ranking processor; and
  
  updating a dialog session with the highest ranking dialog state hypothesis.
- - 17. The computer readable storage device of claim 16, wherein the computer is further enabled by the instructions for:
    - training a third ranking processor with the second part of the training data set and a third training parameter set, the third training parameter set being different than the first training parameter set; and
      
      training a fourth ranking processor with the first part of the training data set and a fourth training parameter set, the fourth training parameter set being different than the second training parameter set.
  - 18. The computer readable storage device of claim 16, wherein the computer uses a web-style ranking algorithm to automatically build ranking models having conjunctions.
  - 19. The computer readable storage device of claim 18, wherein the web-style ranking algorithm is lambdaMART.
  - 20. The computer readable storage device of claim 16, wherein the first part of the training data set and the second part of the training data set divide the training data set into substantially equal parts.
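
The final steps of claim 16 (averaging the scores from several ranking processors and updating the dialog session with the top hypothesis) might look like the sketch below. The two ranker functions are simple stand-ins for trained models, and the session dictionary layout, feature names, and hypothesis labels are assumptions made for illustration only.

```python
from statistics import mean
from typing import Callable, Dict, List

Features = Dict[str, float]
Ranker = Callable[[Features], float]


def ranker_a(f: Features) -> float:
    # Stand-in for a trained ranking model (assumed formula).
    return 2.0 * f["slu_confidence"] - 1.0


def ranker_b(f: Features) -> float:
    # Second stand-in, e.g. the same learner with another parameter set.
    return f["slu_confidence"] + f["asr_confidence"] - 1.0


def average_scores(hypotheses: Dict[str, Features],
                   rankers: List[Ranker]) -> Dict[str, float]:
    """Average each hypothesis's scores across all rankers."""
    return {h: mean([r(f) for r in rankers]) for h, f in hypotheses.items()}


def update_session(session: dict, hypotheses: Dict[str, Features],
                   rankers: List[Ranker]) -> dict:
    """Store the highest-ranking hypothesis as the current dialog state."""
    final = average_scores(hypotheses, rankers)
    session["dialog_state"] = max(final, key=final.get)
    session["scores"] = final
    return session


hypotheses = {
    "time=7pm": {"slu_confidence": 0.75, "asr_confidence": 0.5},
    "time=7am": {"slu_confidence": 0.25, "asr_confidence": 0.5},
}
session = update_session({}, hypotheses, [ranker_a, ranker_b])
print(session["dialog_state"], session["scores"])
```

Averaging treats every ranker equally, so a single ranker with an unusual score has limited influence on which hypothesis is selected.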

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Williams, Jason D.; Zweig, Geoffrey G.
Primary Examiner(s): Ky, Kevin
Application Number: US14/303,395
Publication Number: US 20150363393A1
Time in Patent Office: 1,594 Days
Field of Search: 704/8-9
CPC Class Codes:
G06F 40/58   Use of machine translation,...
G10L 15/063   Training
G10L 15/22   Procedures used during a sp...
G10L 2015/0635   updating or merging of old ...
