System and method for automated testing of complicated dialog systems
First Claim
1. A method of predicting user satisfaction of a dialog system, comprising:
defining an understanding ability measure of a set of measures, corresponding to the dialog system understanding of a user input compared to the user understanding;
defining an efficiency measure of the set of measures, corresponding to the number of dialog turns required to perform an action defined by a dialog between the user and the dialog system;
defining an action appropriateness measure of the set of measures, corresponding to an appropriateness of one or more responses of the dialog system during each dialog turn in the dialog;
applying the set of measures on a test dialog corpus generated between the dialog system and a group of human users;
assigning weights to each measure of the set of measures to generate weighted measures, wherein the weight values are based on a defined regression model which is generated by a validation of the set of measures using user satisfaction scores obtained through user satisfaction surveys for the test dialog corpus;
combining the weighted measures in a defined combinatorial equation to compute a user satisfaction score;
building a simulated user that maintains a list of goals and agenda items to complete the goals by generating a simulated dialog corpus trained from the human-user generated test dialog corpus;
applying the regression model to the simulated dialog corpus to generate an evaluation set of measures; and
using the evaluation set of measures to validate the user satisfaction score.
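The weighting and combining steps recited above can be pictured as an ordinary least-squares fit. The following is a minimal sketch, assuming that the "defined regression model" is a linear regression with an intercept; the claim does not state the model form. The function names (`fit_weights`, `predict_satisfaction`) and every number below are hypothetical illustrations, not data or code from the patent.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("measures are collinear; weights are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(measures: Sequence[Sequence[float]],
                survey_scores: Sequence[float]) -> List[float]:
    """Least-squares weights [intercept, w_understanding, w_efficiency, w_appropriateness].

    Assumption: one row of measures and one survey score per test dialog.
    """
    rows = [[1.0, *m] for m in measures]  # prepend the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, survey_scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations


def predict_satisfaction(weights: Sequence[float],
                         measure: Sequence[float]) -> float:
    """Combine the weighted measures into one predicted satisfaction score."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], measure))


# Invented example: (understanding, dialog turns, appropriateness) per dialog.
MEASURES = [(0.9, 5, 0.2), (0.6, 9, 0.9), (0.8, 6, 0.5),
            (0.4, 12, 0.7), (0.7, 4, 0.3), (0.5, 10, 0.1)]
SURVEY_SCORES = [4.1, 3.2, 4.0, 2.5, 4.4, 2.9]  # invented survey ratings

weights = fit_weights(MEASURES, SURVEY_SCORES)
print(predict_satisfaction(weights, (0.75, 6, 0.7)))
```

In this sketch, the same `weights` would then be applied to measures computed on the simulated dialog corpus, and the resulting scores compared with those obtained from the human-user corpus.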
Abstract
Embodiments of an automated dialog system testing method and component are described. This automated testing method and system supplements real human-based testing with simulated user input and incorporates a set of evaluation measures that focus on three basic aspects of task-oriented dialog systems, namely, understanding ability, efficiency, and the appropriateness of system actions. These measures are first applied to a corpus generated between a dialog system and a group of human users to validate the measures against the human users' satisfaction levels. Results generally show that these measures are significantly correlated with these satisfaction levels. A regression model is then built to predict the user satisfaction scores using these evaluation measures. The regression model is applied to a simulated dialog corpus trained from the above real user corpus, and the results show that the user satisfaction scores estimated from the simulated dialogs do not differ significantly from the real users' satisfaction scores. These evaluation measures can then be used to assess the system performance based on the estimated user satisfaction.
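The simulated user input mentioned in the abstract can be pictured with a toy user that keeps a list of goals and an agenda of pending items. This is only a sketch under stated assumptions: the class `SimulatedUser`, the function `run_dialog`, and the goal names are hypothetical, and the user here is hand-scripted rather than trained from a human-user corpus as the record describes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SimulatedUser:
    """Toy simulated user: goals to achieve and an agenda of pending items."""
    goals: List[str]
    agenda: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.agenda:
            # Assumption: one agenda item per goal, handled in order.
            self.agenda = list(self.goals)

    def next_action(self) -> Optional[str]:
        """Return the agenda item the user raises next, or None if finished."""
        return self.agenda[0] if self.agenda else None

    def observe(self, understood: bool) -> None:
        """Complete the current item only if the system understood it."""
        if understood and self.agenda:
            self.completed.append(self.agenda.pop(0))

    def done(self) -> bool:
        return not self.agenda


def run_dialog(user: SimulatedUser, system_understood: Iterable[bool]) -> int:
    """Play one dialog and return the number of turns used.

    `system_understood` is a scripted stand-in for the dialog system: one
    boolean per turn saying whether the system understood the user.
    """
    turns = 0
    for understood in system_understood:
        if user.done():
            break
        turns += 1
        user.observe(understood)
    return turns


user = SimulatedUser(goals=["find_restaurant", "get_address"])
turns_used = run_dialog(user, [True, False, True])
print(turns_used, user.done())
```

The turn count returned by `run_dialog` corresponds loosely to the efficiency measure: a misunderstood turn leaves the item on the agenda, so the user repeats it and the dialog grows longer.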
10 Claims
Specification