Data collection for a new conversational dialogue system
First Claim
1. A method for training an annotated dialogue system, comprising:
- generating, by a data generation application executing on a machine, a first list of canonical utterances in a dialogue tree of possible dialogues, the first list of canonical utterances generated in response to an input received from an annotating user;
- receiving from the annotating user a first selection of a first canonical utterance from the first list, the selection indicating a first step in a multi-step dialogue;
- generating a second list of canonical utterances, the second list of canonical utterances generated in response to the first selection from the first list;
- receiving from the user a second selection of a second canonical utterance from the second list, the second selection indicating a second step in the multi-step dialogue that also includes the first step;
- presenting a user interface indicating a dialogue path including both the first step and the second step;
- receiving from the annotating user via the user interface a compound paraphrase for the dialogue path including both the first step and the second step; and
- training the annotated dialogue system on annotated data including the compound paraphrase, the first step, and the second step.
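The claimed steps can be sketched as a small annotation loop. This is only an illustrative sketch, not the patented implementation: the toy `DIALOGUE_TREE`, the `candidate_utterances` and `collect_example` helpers, and the `AnnotatedExample` record are all hypothetical names, and the `select` and `paraphrase` callbacks stand in for the annotating user's choices in the user interface.

```python
from dataclasses import dataclass

# Hypothetical toy dialogue tree: each canonical utterance maps to the
# canonical utterances that may follow it. The empty string is the root.
DIALOGUE_TREE = {
    "": ["create meeting", "find flight"],
    "create meeting": ["set meeting time to 3pm", "set meeting title to lunch"],
    "find flight": ["set destination to Boston"],
}


def candidate_utterances(path):
    """Return the canonical utterances offered after the steps chosen so far."""
    last = path[-1] if path else ""
    return DIALOGUE_TREE.get(last, [])


@dataclass
class AnnotatedExample:
    steps: list               # the selected canonical utterances, in order
    compound_paraphrase: str  # one natural-language rendering of the whole path


def collect_example(select, paraphrase, num_steps=2):
    """Walk the tree for num_steps selections, then ask for one paraphrase."""
    path = []
    for _ in range(num_steps):
        options = candidate_utterances(path)
        choice = select(options)  # stands in for the annotator's selection
        if choice not in options:
            raise ValueError(f"{choice!r} is not one of the offered utterances")
        path.append(choice)
    # The annotator sees the full dialogue path and paraphrases it as a whole.
    return AnnotatedExample(steps=path, compound_paraphrase=paraphrase(path))


example = collect_example(
    select=lambda options: options[0],
    paraphrase=lambda path: "Schedule a meeting at 3pm",
)
print(example.steps, "->", example.compound_paraphrase)
```

A record like `example` pairs the compound paraphrase with the first and second steps, which is the shape of annotated data the final training step refers to.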
Abstract
A data collection system is based on a general set of dialogue acts derived from a database schema. Crowd workers perform two types of tasks: (i) identifying sensical dialogue paths and (ii) paraphrasing these dialogue paths, in context, into real dialogues. The output of the system is a set of training examples of real dialogues annotated with their logical forms. This data can be used to train all three components of the dialogue system: (i) the semantic parser, which understands context-dependent utterances, (ii) the dialogue policy, which generates new dialogue acts given the current state, and (iii) the generation system, which decides both what to say and how to render it in natural language.
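One way to picture how a single annotated dialogue could feed all three components is sketched below. This is an assumption-laden illustration, not the system's actual data format: the `Turn` record, the string logical forms, and the `training_views` helper are hypothetical, and the split into (input, target) pairs is just one plausible reading of the abstract.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str       # "user" or "agent"
    logical_form: str  # the dialogue act behind the turn (hypothetical syntax)
    utterance: str     # the crowd worker's natural-language paraphrase


def training_views(dialogue):
    """Split one annotated dialogue into (input, target) pairs per component."""
    parser, policy, generator = [], [], []
    history = []  # logical forms of the preceding turns, used as context
    for turn in dialogue:
        context = tuple(history)
        if turn.speaker == "user":
            # Semantic parser: context + utterance -> logical form.
            parser.append(((context, turn.utterance), turn.logical_form))
        else:
            # Dialogue policy: current state -> next dialogue act.
            policy.append((context, turn.logical_form))
            # Generation: context + dialogue act -> natural-language rendering.
            generator.append(((context, turn.logical_form), turn.utterance))
        history.append(turn.logical_form)
    return parser, policy, generator


dialogue = [
    Turn("user", "create_meeting(time=3pm)", "Set up a meeting at 3"),
    Turn("agent", "confirm(create_meeting)", "Okay, booking it for 3pm."),
]
parser_data, policy_data, generator_data = training_views(dialogue)
print(len(parser_data), len(policy_data), len(generator_data))
```

The point of the sketch is that the same annotated turns are reused from three angles, so one round of crowd-sourced paraphrasing yields supervision for understanding, policy, and generation alike.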
20 Claims
1. A method for training an annotated dialogue system, comprising:
- generating, by a data generation application executing on a machine, a first list of canonical utterances in a dialogue tree of possible dialogues, the first list of canonical utterances generated in response to an input received from an annotating user;
- receiving from the annotating user a first selection of a first canonical utterance from the first list, the selection indicating a first step in a multi-step dialogue;
- generating a second list of canonical utterances, the second list of canonical utterances generated in response to the first selection from the first list;
- receiving from the user a second selection of a second canonical utterance from the second list, the second selection indicating a second step in the multi-step dialogue that also includes the first step;
- presenting a user interface indicating a dialogue path including both the first step and the second step;
- receiving from the annotating user via the user interface a compound paraphrase for the dialogue path including both the first step and the second step; and
- training the annotated dialogue system on annotated data including the compound paraphrase, the first step, and the second step.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system for training an annotated dialogue system, comprising:
- a processor;
- memory;
- one or more modules stored in memory and executable by the processor to:
- generate, by a data generation application, a first list of canonical utterances in a dialogue tree of possible dialogues, the first list of canonical utterances generated in response to an input received from an annotating user,
- receive from the annotating user a first selection of a first canonical utterance from the first list, the first selection indicating a first step in a multi-step dialogue,
- generate a second list of canonical utterances, the second list of canonical utterances generated in response to the first selection from the first list,
- receive from the annotating user a second selection of a second canonical utterance from the second list, the second selection indicating a second step in the multi-step dialogue that also includes the first step,
- present a user interface indicating a dialogue path including both the first step and the second step,
- receive from the annotating user via the user interface a compound paraphrase for the dialogue path including both the first step and the second step, and
- train the annotated dialogue system on annotated data including the compound paraphrase, the first step, and the second step.
Dependent claims: 15, 16, 17, 18, 19
20. A computer system including a processor and memory holding instructions executable by the processor to perform a method for training an annotated dialogue system, the method comprising:
- generating a first list of canonical utterances in a dialogue tree of possible dialogues, the first list of canonical utterances generated in response to an input received from an annotating user;
- receiving from the annotating user a first selection of a first canonical utterance from the first list, the first selection indicating a first step in a multi-step dialogue;
- generating a second list of canonical utterances, the second list of canonical utterances generated in response to the first selection from the first list;
- receiving from the annotating user a second selection of a second canonical utterance from the second list, the second selection indicating a second step in the multi-step dialogue that also includes the first step;
- presenting a user interface indicating a dialogue path including both the first step and the second step and an input box configured for receiving a compound paraphrase for the dialogue path including both the first step and the second step;
- receiving from the annotating user via the input box of the user interface a compound paraphrase for the dialogue path including both the first step and the second step; and
- training the annotated dialogue system on annotated data including the compound paraphrase, the first step, and the second step.
Specification