Goal segmentation in speech dialogs
Abstract
A speech-based system is configured to interact with a user through speech to determine intents and goals of the user. The system may analyze multiple dialog turns in order to determine and fully define a goal that the user is trying to express. Each dialog turn comprises a user utterance. Each dialog turn may also comprise a system speech response. In order to evaluate the performance of the system, logged data is analyzed to identify goal segments within the logged data, where a goal segment is a sequence of dialog turns that relate to a corresponding user goal. A subset of the dialog turns is annotated manually to delineate goal segments. A predictive model is then constructed based on the manually annotated goal segments. The predictive model is then used to identify goal segments formed by additional dialog turns.
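As a rough illustration of what delineating goal segments means, the Python sketch below groups a sequence of logged dialog turns into segments and reports each segment's start and end turn. It assumes each turn already carries a yes/no flag saying whether it begins a new goal. The function name and the flag representation are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def segments_from_boundaries(starts_new_goal: List[bool]) -> List[Tuple[int, int]]:
    """Group turn indices into (start_index, end_index) goal segments.

    starts_new_goal[i] is True when turn i begins a new goal segment.
    The first turn always opens a segment, whatever its flag says.
    """
    segments: List[Tuple[int, int]] = []
    if not starts_new_goal:
        return segments

    begin = 0
    for i in range(1, len(starts_new_goal)):
        if starts_new_goal[i]:
            # Turn i opens a new goal, so the previous segment ends at i - 1.
            segments.append((begin, i - 1))
            begin = i
    # Close the final segment at the last logged turn.
    segments.append((begin, len(starts_new_goal) - 1))
    return segments


# Five logged turns: turns 0 and 3 each begin a new goal.
flags = [True, False, False, True, False]
for start, end in segments_from_boundaries(flags):
    print(f"goal segment: start turn {start}, end turn {end}")
```

In this sketch the flags would come either from manual annotation or from a predictive model; the grouping step is the same in both cases.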
51 Citations
20 Claims
1. A method comprising:
processing a first dialog turn comprising a user utterance, wherein processing the first dialog turn comprises:
performing automated speech recognition (ASR) on the user utterance to produce an ASR hypothesis corresponding to the first dialog turn; and
performing natural language understanding (NLU) on the ASR hypothesis to produce an NLU hypothesis corresponding to the first dialog turn;
determining a first feature set corresponding to the first dialog turn, wherein the first feature set corresponding to the first dialog turn indicates:
a time relationship between the first dialog turn and a second dialog turn;
a confidence level of the ASR hypothesis corresponding to the first dialog turn; and
a confidence level of the NLU hypothesis corresponding to the first dialog turn;
determining a second feature set corresponding to the second dialog turn; and
receiving a first goal boundary identifier, wherein the first goal boundary identifier delineates a first goal segment comprising a sequence of one or more dialog turns that relate to a corresponding goal of the user utterance;
analyzing one or more dependencies between the goal boundary identifier and the first feature set;
generating a predictive model based at least in part on the analyzing, wherein the predictive model is responsive to a third feature set corresponding to a third dialog turn to delineate a second goal segment;
determining, based at least in part on the predictive model and the third feature set, the second goal segment;
associating, with the second goal segment, a start label that indicates a beginning of the second goal segment; and
associating, with the second goal segment, an end label that indicates an end of the second goal segment.
Dependent claims: 2, 3
4. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to perform actions comprising:
determining a first feature set corresponding to a first dialog turn that is stored in a set of logged data, the first dialog turn corresponding to a first user utterance;
receiving a first goal boundary identifier indicating that the first dialog turn represents a boundary of a first goal segment, the first goal segment comprising a first sequence of one or more dialog turns that relate to a first user goal;
generating a predictive model based at least in part on the first feature set and the first goal boundary identifier;
determining a second feature set of a second dialog turn corresponding to a second user utterance;
analyzing the second feature set using the predictive model;
determining, based at least in part on the analyzing, that the second dialog turn represents a boundary of a second goal segment, the second goal segment comprising a second sequence of one or more dialog turns that relate to a second user goal; and
storing a second goal boundary identifier indicating that the second dialog turn represents the boundary of the second goal segment.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A method comprising:
building a predictive model based at least in part on training data comprising at least (a) a first feature set corresponding to a first dialog turn and (b) a first goal boundary identifier indicating that the first dialog turn represents a boundary of a first goal segment, the first goal segment comprising a first sequence of one or more dialog turns that relate to a first user goal;
receiving a second feature set corresponding to a second dialog turn;
analyzing the second feature set using the predictive model; and
determining, based at least in part on the analyzing, that the second dialog turn represents a boundary of a second goal segment comprising a sequence of one or more dialog turns that relate to a corresponding second user goal.
Dependent claims: 14, 15, 16, 17, 18, 19
20. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to perform actions comprising:
generating a predictive model using a first feature set of a first dialog turn and an indication that the first dialog turn represents a first boundary of a first goal segment;
determining a second feature set of a second dialog turn corresponding to a user utterance;
analyzing the second feature set using the predictive model;
determining, based at least in part on the analyzing, that the second dialog turn represents a second boundary of a second goal segment, the second goal segment comprising a sequence of one or more dialog turns that relate to a user goal; and
storing an indication that the second dialog turn represents the second boundary of the second goal segment.
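The independent claims share one pattern: build a predictive model from annotated dialog turns, then use it to decide whether a new turn is a goal boundary. The Python sketch below illustrates that pattern only. The claims do not name a model type, so logistic regression trained by stochastic gradient descent is an assumption chosen for brevity. The three features follow claim 1 (time relationship, ASR confidence, NLU confidence), but their scaling, the function names, and every training value are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def _sigmoid(z: float) -> float:
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def _score(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_boundary_model(features: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         epochs: int = 1000,
                         lr: float = 0.5) -> Model:
    """Fit logistic regression; label 1 means the turn is a goal boundary."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = _score((weights, bias), x) - y  # gradient of the log loss
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def is_boundary(model: Model, x: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True when the model predicts that the turn is a goal boundary."""
    return _score(model, x) >= threshold


# Made-up annotated turns: [scaled gap since previous turn, ASR conf, NLU conf].
TRAIN_FEATURES = [
    [1.00, 0.9, 0.8], [0.05, 0.8, 0.9], [0.10, 0.6, 0.7],
    [0.90, 0.9, 0.9], [0.05, 0.9, 0.6], [0.80, 0.7, 0.8],
]
TRAIN_LABELS = [1, 0, 0, 1, 0, 1]  # hypothetical manual annotations

model = train_boundary_model(TRAIN_FEATURES, TRAIN_LABELS)
print(is_boundary(model, [0.95, 0.90, 0.85]))  # turn after a long pause
print(is_boundary(model, [0.05, 0.85, 0.75]))  # turn after a short pause
```

In this toy data a long gap before a turn is the only signal that separates boundaries from non-boundaries. A real system would presumably rely on a richer feature set and far more annotated turns.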
Specification