Dialog agent for conducting task-oriented computer-based communications
First Claim
1. A system for implementing multi-turn dialogs, wherein the system is configured to perform a method comprising:
receiving, by a dialog handler of the system, a series of user utterances;
generating, by the dialog handler, based at least in part on a predetermined dialog management policy and on information retrieved from multiple tables of a database for a domain, a series of responsive system utterances;
wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
wherein a dialog comprises a number of dialog turns, wherein each dialog turn comprises a respective pair of user and responsive system utterances; and
labeling, by the dialog handler, the series of responsive system utterances to generate training data for training a subsequent dialog management policy;
wherein labeling the series of responsive system utterances includes executing a reward function at each turn of the dialog;
wherein for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed;
wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
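The last three limitations can be read as follows: a query cost is mapped to a value in [0, 1], that value becomes an extra dimension of the policy's state, and the per-turn reward depends on accuracy and on turns elapsed. The sketch below is illustrative only. The claim does not give a formula, so the normalization by `max_cost`, the linear reward, the weights `turn_penalty` and `cost_weight`, and every name here are assumptions and are not taken from the patent.

```python
from dataclasses import dataclass


def complexity_estimate(query_cost: float, max_cost: float) -> float:
    """Map a raw query cost to [0, 1] (hypothetical normalization)."""
    if max_cost <= 0:
        return 0.0
    # Clamp so the result is zero, one, or a value between zero and one.
    return min(max(query_cost / max_cost, 0.0), 1.0)


@dataclass(frozen=True)
class AugmentedState:
    """Policy state extended with one additional dimension."""
    base_state: tuple   # whatever the policy already tracks (assumed)
    complexity: float   # the estimator's returned value, in [0, 1]


def turn_reward(accuracy: float, turns_elapsed: int, complexity: float,
                turn_penalty: float = 0.05, cost_weight: float = 0.5) -> float:
    """Assumed linear form: accuracy minus turn and query-cost penalties."""
    return accuracy - turn_penalty * turns_elapsed - cost_weight * complexity


# Example: a query costing 3 units out of an assumed maximum of 4.
c = complexity_estimate(query_cost=3.0, max_cost=4.0)
state = AugmentedState(base_state=("find_restaurant",), complexity=c)
r = turn_reward(accuracy=1.0, turns_elapsed=2, complexity=c)
```

Under these assumptions, a correct answer loses value the later it arrives and the more expensive the database query behind it was.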
Abstract
Embodiments of the present invention provide a system for implementing multi-turn dialogs. The system performs a method that includes receiving a series of user utterances, generating a series of responsive system utterances, and labeling the series of responsive system utterances to generate training data for training a dialog management policy. The labeling includes executing a reward function at each turn of a dialog, in which for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed.
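The labeling step in the abstract can be pictured as a loop that runs a reward function once per turn and attaches the result to that turn. The following is a minimal sketch, not the patented implementation. `LabeledTurn`, `label_dialog`, the caller-supplied `accuracy_fn`, and the linear `turn_penalty` are all hypothetical, and counting the current turn as "elapsed" is an assumption.

```python
from typing import Callable, List, NamedTuple, Sequence


class LabeledTurn(NamedTuple):
    turn_index: int          # 1-based position of the turn in the dialog
    user_utterance: str
    system_utterance: str
    reward: float            # label produced by the reward function


def label_dialog(user_utts: Sequence[str],
                 system_utts: Sequence[str],
                 accuracy_fn: Callable[[str, str], float],
                 turn_penalty: float = 0.1) -> List[LabeledTurn]:
    """Pair each user utterance with its system response and label the turn."""
    if len(user_utts) != len(system_utts):
        raise ValueError("each user utterance needs exactly one system response")
    labeled = []
    for i, (user, system) in enumerate(zip(user_utts, system_utts), start=1):
        # Reward depends on the accuracy of this response and on turns elapsed.
        reward = accuracy_fn(user, system) - turn_penalty * i
        labeled.append(LabeledTurn(i, user, system, reward))
    return labeled


# Toy accuracy function (assumed): 1.0 if the response mentions a flight.
toy_accuracy = lambda user, system: 1.0 if "flight" in system else 0.0
training_data = label_dialog(
    ["Book me a trip to Boston", "Tomorrow morning"],
    ["Which day would you like to travel?", "I found a flight at 8 am."],
    toy_accuracy,
)
```

The resulting list of labeled turns is the kind of record a later policy-training step could consume.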
13 Claims
1. A system for implementing multi-turn dialogs, wherein the system is configured to perform a method comprising:
receiving, by a dialog handler of the system, a series of user utterances;
generating, by the dialog handler, based at least in part on a predetermined dialog management policy and on information retrieved from multiple tables of a database for a domain, a series of responsive system utterances;
wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
wherein a dialog comprises a number of dialog turns, wherein each dialog turn comprises a respective pair of user and responsive system utterances; and
labeling, by the dialog handler, the series of responsive system utterances to generate training data for training a subsequent dialog management policy;
wherein labeling the series of responsive system utterances includes executing a reward function at each turn of the dialog;
wherein for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed;
wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
Dependent claims: 2, 3, 4, 5, 6
7. A computer-implemented method for implementing multi-turn dialogs via retrieval of information from multiple tables of a database comprising:
receiving as inputs, by a dialog handler of a computing system, an upper limit of dialog turns and training data comprising a series of user utterances and a series of responsive system utterances;
wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
wherein a dialog comprises a number of turns, wherein each turn comprises a respective pair of user and responsive system utterances; and
training, by the dialog handler, a dialog management policy of a dialog manager based at least in part on the received inputs and on a reward function measured at each turn of the dialog;
wherein for each turn of the dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of the responsive system utterance of the turn and on the number of dialog turns elapsed;
wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
Dependent claims: 8, 9, 10, 11, 12
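Claim 7 takes an upper limit of dialog turns and per-turn rewards as training inputs but does not name a learning algorithm. As a stand-in only, the sketch below uses tabular Q-learning over logged dialogs. The episode format of `(state, action, reward)` tuples, the hyperparameters `alpha`, `gamma` and `sweeps`, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def train_policy(episodes, max_turns, alpha=0.5, gamma=0.9, sweeps=50):
    """Tabular Q-learning over logged dialogs (assumed stand-in algorithm).

    episodes: list of dialogs, each a list of (state, action, reward) per turn.
    max_turns: upper limit of dialog turns; later turns are ignored.
    """
    q = defaultdict(float)
    actions = {action for ep in episodes for (_, action, _) in ep}
    for _ in range(sweeps):
        for ep in episodes:
            turns = ep[:max_turns]  # enforce the upper limit of dialog turns
            for t, (state, action, reward) in enumerate(turns):
                if t + 1 < len(turns):
                    next_state = turns[t + 1][0]
                    future = max(q[(next_state, a)] for a in actions)
                else:
                    future = 0.0  # last turn considered: no future value
                target = reward + gamma * future
                q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_action(q, state, actions):
    """Pick the action with the highest learned value in this state."""
    return max(actions, key=lambda a: q[(state, a)])


# Two toy dialogs: asking first and then answering pays off; guessing does not.
logged = [
    [("s0", "ask", 0.0), ("s1", "answer", 1.0)],
    [("s0", "answer", -1.0)],
]
q_table = train_policy(logged, max_turns=2)
best = greedy_action(q_table, "s0", ["ask", "answer"])
```

With `max_turns=1` the second turn of the first dialog is never seen, so the value of asking first collapses to zero; that is one way a turn limit can shape what the policy learns.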
13. A computer program product implementing multi-turn dialogs, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a system operatively coupled to one or more processors to cause the system to perform a method comprising:
receiving, by a dialog handler of the system, a series of user utterances; and
generating, by the dialog handler, based at least in part on a trained dialog management policy and information retrieved from multiple tables of a database, a series of responsive system utterances;
wherein each system utterance of the series of responsive system utterances is responsive to a different user utterance of the series of user utterances;
wherein a dialog comprises a number of turns, wherein each turn comprises a respective pair of user and responsive system utterances;
wherein the trained dialog management policy was trained based at least in part on executing a reward function at each given turn of a prior dialog;
wherein for each given turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the given turn and on the number of dialog turns elapsed;
wherein the reward function incorporates a dialog complexity estimator that is configured to calculate a query cost associated with querying the database;
wherein the dialog complexity estimator is configured to return a value of zero, one, or between zero and one; and
wherein the reward function incorporates the dialog complexity estimator by at least augmenting a state space of the predetermined dialog management policy to include an additional dimension that corresponds to the returned value.
Specification