Dialog agent for conducting task-oriented computer-based communications

US 10,387,463 B2
Filed: 07/06/2017
Issued: 08/20/2019
Est. Priority Date: 07/06/2017
Status: Active Grant

Abstract

Embodiments of the present invention provide a system for implementing multi-turn dialogs. The system performs a method that includes receiving a series of user utterances, generating a series of responsive system utterances, and labeling the series of responsive system utterances to generate training data for training a dialog management policy. The labeling includes executing a reward function at each turn of a dialog, in which for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed.
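
The per-turn labeling described in the abstract can be illustrated with a minimal sketch. The text here gives no formula for the reward, so the linear form below (accuracy minus a penalty that grows with the number of turns elapsed) is only an assumption, and the names `Turn`, `turn_reward`, `label_dialog`, and `turn_penalty` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    user_utterance: str
    system_utterance: str
    accuracy: float  # assumed score in [0, 1] for the system utterance


def turn_reward(accuracy: float, turns_elapsed: int, turn_penalty: float = 0.1) -> float:
    """Hypothetical reward: accuracy minus a penalty that grows with turns elapsed."""
    return accuracy - turn_penalty * turns_elapsed


def label_dialog(dialog: List[Turn], turn_penalty: float = 0.1) -> List[Tuple[Turn, float]]:
    """Run the reward function at every turn and pair each turn with its reward."""
    return [
        (turn, turn_reward(turn.accuracy, index + 1, turn_penalty))
        for index, turn in enumerate(dialog)
    ]


# Illustrative two-turn dialog; the utterances and accuracy scores are made up.
dialog = [
    Turn("Find me a flight to Boston", "Which date?", accuracy=0.5),
    Turn("Next Friday", "Flight 12 leaves at 9am.", accuracy=1.0),
]
labeled = label_dialog(dialog)
rewards = [reward for _, reward in labeled]
```

Each `(turn, reward)` pair is one labeled training example. Under this assumed form, an equally accurate answer earns less the later it arrives, which favors shorter dialogs.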

13 Claims

1. A system for implementing multi-turn dialogs, wherein the system is configured to perform a method comprising:
- receiving, by a dialog handler of the system, a series of user utterances;
  
  generating, by the dialog handler, based at least in part on a predetermined dialog management policy and on information retrieved from multiple tables of a database for a domain, a series of responsive system utterances;
  
  wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
  
  wherein a dialog comprises a number of dialog turns, wherein each dialog turn comprises a respective pair of user and responsive system utterances; and
  
  labeling, by the dialog handler, the series of responsive system utterances to generate training data for training a subsequent dialog management policy;
  
  wherein labeling the series of responsive system utterances includes executing a reward function at each turn of the dialog;
  
  wherein for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed;
  
  wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
  
  wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
  
  wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
  - 2. The system of claim 1, wherein the predetermined dialog management policy is defined over the augmented state space.
  - 3. The system of claim 1, wherein:
    - the reward function incorporates the dialog complexity estimator further by at least:
      
      modifying the reward function by specifying the reward function over the augmented state space and subtracting the output of the reward function by a function of the return value of the dialog complexity estimator.
  - 4. The system of claim 3, wherein generating the series of responsive system utterances includes:
    - for each given user utterance in the series of user utterances:
      
      executing, by the dialog handler, a feature extractor configured to extract a feature vector from the given user utterance;
      
      executing, by the dialog handler, a belief tracker configured to receive the feature vector as an input, concatenate the feature vector with encoded dialog history, and output a probability distribution vector over columns of the multiple tables of the database, wherein the output is based at least in part on the received feature vector; and
      
      executing, by the dialog handler, a dialog manager that is configured to receive the output probability distribution vector, select an action from an action space based at least in part on the predetermined dialog management policy, and provide the selected action to a dialog generator that is configured to generate a natural language response based at least in part on the selected action and the information retrieved from the database.
  - 5. The system of claim 4, wherein, based at least in part on the returned value of the dialog complexity estimator being greater than a threshold, the system does not query the database but does generate a system utterance that generates a clarification question, and wherein based at least in part on the returned value of the dialog complexity estimator being less than or equal to the threshold, the system queries the database and generates a responsive system utterance based at least in part on a result of the query.
  - 6. The system of claim 1, wherein the method further includes terminating the dialog upon a predetermined upper turn limit being reached.
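
The complexity-related limitations of claims 1, 3, and 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the normalization used by `clamp_complexity`, the linear penalty `weight`, the `threshold` value, and all names (`AugmentedState`, `augmented_reward`, `choose_action`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AugmentedState:
    belief: Tuple[float, ...]  # e.g. belief tracker output over table columns
    complexity: float          # additional dimension: estimator value in [0, 1]


def clamp_complexity(raw_cost: float, max_cost: float) -> float:
    """Hypothetical estimator: normalize a raw query cost into [0, 1]."""
    if max_cost <= 0:
        raise ValueError("max_cost must be positive")
    return min(1.0, max(0.0, raw_cost / max_cost))


def augmented_reward(base_reward: float, state: AugmentedState, weight: float = 0.5) -> float:
    """Claim 3 style: subtract a function of the complexity value from the reward."""
    return base_reward - weight * state.complexity


def choose_action(state: AugmentedState, threshold: float = 0.6) -> str:
    """Claim 5 style: clarify when complexity exceeds the threshold, else query."""
    return "ask_clarification" if state.complexity > threshold else "query_database"


# Made-up costs: one cheap query and one that exceeds the assumed maximum.
cheap = AugmentedState(belief=(0.7, 0.3), complexity=clamp_complexity(20.0, 100.0))
costly = AugmentedState(belief=(0.4, 0.6), complexity=clamp_complexity(150.0, 100.0))

cheap_action = choose_action(cheap)
costly_action = choose_action(costly)
costly_reward = augmented_reward(1.0, costly)
```

Because the complexity value is part of the state, a policy defined over this augmented state space (claim 2) can condition its choice between querying and clarifying on the estimated query cost.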

7. A computer-implemented method for implementing multi-turn dialogs via retrieval of information from multiple tables of a database comprising:
- receiving as inputs, by a dialog handler of a computing system, an upper limit of dialog turns and training data comprising a series of user utterances and a series of responsive system utterances;
  
  wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
  
  wherein a dialog comprises a number of turns, wherein each turn comprises a respective pair of user and responsive system utterances; and
  
  training, by the dialog handler, a dialog management policy of a dialog manager based at least in part on the received inputs and on a reward function measured at each turn of the dialog;
  
  wherein for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed;
  
  wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
  
  wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
  
  wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
  - 8. The computer-implemented method of claim 7, wherein:
    - training the dialog management policy of the dialog manager includes providing the training data and the upper limit of dialog turns to a supervised learning neural network; and
      
      the training data includes labeled dialog data and the reward values of the series of responsive system utterances.
  - 9. The computer-implemented method of claim 7, wherein:
    - training the dialog management policy of the dialog manager includes providing the training data and the upper limit of dialog turns to a reinforcement learning neural network; and
      
      the training data includes unlabeled dialog data and the reward function.
  - 10. The computer-implemented method of claim 7, wherein the dialog management policy is defined over the augmented state space.
  - 11. The computer-implemented method of claim 7, wherein:
    - the reward function incorporates the dialog complexity estimator further by at least:
      
      modifying the reward function by specifying the reward function over the augmented state space and subtracting the output of the reward function by a function of the return value of the dialog complexity estimator.
  - 12. The computer-implemented method of claim 7 further comprising training a belief tracker based at least in part on the received inputs and on the reward function.
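
Claims 8 and 9 differ in what the learner receives: labeled dialog data with precomputed reward values, versus unlabeled dialog data together with the reward function itself. The sketch below only prepares those two input bundles. It assumes a placeholder reward and hypothetical names (`truncate`, `supervised_inputs`, `reinforcement_inputs`, `example_reward`), and it does not implement a neural network.

```python
from typing import Callable, Dict, List, Tuple

DialogTurn = Tuple[str, str]  # (user utterance, system utterance)
RewardFn = Callable[[DialogTurn, int], float]


def truncate(dialog: List[DialogTurn], max_turns: int) -> List[DialogTurn]:
    """Apply the upper limit of dialog turns."""
    return dialog[:max_turns]


def supervised_inputs(dialog: List[DialogTurn], labels: List[str],
                      reward_fn: RewardFn, max_turns: int) -> List[Dict]:
    """Claim 8 style: labeled turns carrying precomputed reward values."""
    turns = truncate(dialog, max_turns)
    return [
        {"turn": turn, "label": label, "reward": reward_fn(turn, i + 1)}
        for i, (turn, label) in enumerate(zip(turns, labels))
    ]


def reinforcement_inputs(dialog: List[DialogTurn], reward_fn: RewardFn,
                         max_turns: int) -> Dict:
    """Claim 9 style: unlabeled turns plus the reward function itself."""
    return {"turns": truncate(dialog, max_turns), "reward_fn": reward_fn,
            "max_turns": max_turns}


def example_reward(turn: DialogTurn, turns_elapsed: int) -> float:
    # Placeholder: a real reward would also score the accuracy of the system utterance.
    return 1.0 - 0.25 * turns_elapsed


# Made-up dialog and action labels for illustration.
dialog = [("hi", "How can I help?"), ("book a room", "Which city?"), ("Paris", "Done.")]
labels = ["greet", "clarify", "confirm"]

supervised = supervised_inputs(dialog, labels, example_reward, max_turns=2)
reinforcement = reinforcement_inputs(dialog, example_reward, max_turns=2)
```

In the supervised bundle the rewards are fixed numbers attached to each labeled turn, while in the reinforcement bundle the learner would call `reward_fn` itself as it explores.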

13. A computer program product implementing multi-turn dialogs, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a system operatively coupled to one or more processors to cause the system to perform a method comprising:
- receiving, by a dialog handler of the system, a series of user utterances; and
  
  generating, by the dialog handler, based at least in part on a trained dialog management policy and information retrieved from multiple tables of a database, a series of responsive system utterances;
  
  wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
  
  wherein a dialog comprises a number of turns, wherein each turn comprises a respective pair of user and responsive system utterances;
  
  wherein the trained dialog management policy was trained based at least in part on executing a reward function at each given turn of a prior dialog;
  
  wherein for each given turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the given turn and on the number of dialog turns elapsed;
  
  wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
  
  wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
  
  wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Campbell, Murray S.; Liu, Miao; Srivastava, Biplav
Primary Examiner(s): Blankenagel, Bryan S
Application Number: US15/643,049
Publication Number: US 20190012371A1
Time in Patent Office: 775 Days

CPC Class Codes

G06F 16/243   Natural language query form...
G06F 16/3329   Natural language query form...
G06F 16/90332   Natural language query form...
G06F 3/167   Audio in a user interface, ...
G06F 40/30   Semantic analysis
G06F 40/35   Discourse or dialogue repre...
G10L 15/063   Training
G10L 15/1822   Parsing for meaning underst...
G10L 15/22   Procedures used during a sp...
