Dialogue management using scripts and combined confidence scores
Abstract
Representation-neutral dialogue systems and methods (“RNDS”) are described that include multi-application, multi-device spoken-language dialogue systems based on the information-state update approach. The RNDS includes representation-neutral core components of a dialogue system that provide scripted domain-specific extensions to routines such as dialogue move modeling and reference resolution, easy substitution of specific semantic representations and associated routines, and clean interfaces to external components for language understanding (i.e., speech recognition and parsing) and language generation, and to domain-specific knowledge sources. The RNDS also resolves multi-device dialogue by evaluating and selecting among candidate dialogue moves based on features at multiple levels. Multiple sources of information are combined, multiple speech recognition and parsing hypotheses are tested, and multiple devices and moves are considered in order to choose the highest-scoring hypothesis overall. Confirmation and clarification behavior can be governed by the overall score.
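As an illustration only, the hypothesis selection described in the abstract might be sketched as below. The feature names, the weight values, the device and move labels, and the use of a linear weighted sum are all assumptions made for this sketch; the abstract does not state how the scores are combined.

```python
from dataclasses import dataclass

# Hypothetical feature weights; the source gives no values.
WEIGHTS = {"asr": 0.5, "parser": 0.3, "context": 0.2}


@dataclass
class Candidate:
    device: str     # device whose activity model produced the move
    move: str       # candidate dialogue move
    features: dict  # feature name -> confidence in [0, 1]


def combined_score(c: Candidate) -> float:
    """Weighted sum of feature confidences (assumed combination rule)."""
    return sum(w * c.features.get(name, 0.0) for name, w in WEIGHTS.items())


def rerank(candidates):
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)


# Two hypothetical candidates for the same utterance.
n_best = [
    Candidate("navigation", "query:route",
              {"asr": 0.8, "parser": 0.4, "context": 0.2}),
    Candidate("mp3_player", "command:play",
              {"asr": 0.8, "parser": 0.9, "context": 0.7}),
]

best = rerank(n_best)[0]
print(best.device, best.move)
```

In this sketch the recognizer score is identical for both candidates, so the parser and context features decide which device is selected.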
18 Claims
1. A method of determining a dialogue move in a multi-device environment, comprising:
receiving an input utterance from a speaker through an input component, the input utterance directed to a plurality of devices in the multi-device environment, wherein each device of the plurality of devices is associated with a respective activity model encapsulating device-specific information for the respective device;
generating an input pattern from the input utterance that includes a structured description of a dialogue contribution by the speaker, the structured description including one of syntactic, semantic and phonological information;
performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device of the plurality of devices to produce a plurality of candidate dialogue moves, wherein the activity tree manages activities of the devices relevant to the input utterance;
identifying the description and at least one parameter of the description using a dialogue move script (DMS), wherein the DMS is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, and wherein the dialogue move is independent of the device;
mapping the description to a dialogue move of the candidate dialogue moves using the DMS, the dialogue move corresponding to the identified parameter;
receiving a confidence score from a speech recognizer component coupled to the input component, the confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and to produce an n-best list of dialogue moves;
translating the confidence score into a qualitative description of the likelihood of proper recognition;
incorporating into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance; and
formulating one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value;
combining weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list based on the combined confidence score to rate one or more of the dialogue move candidates as the interpretation of the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method of determining a dialogue move in a multi-device environment, comprising:
receiving an input utterance from a speaker through an input component, the input utterance directed to a plurality of devices in the multi-device environment, wherein each device of the plurality of devices is associated with a respective activity model encapsulating device-specific information for the respective device;
generating an input pattern from the input utterance that includes a structured description of a dialogue contribution by the speaker, the structured description including one of syntactic, semantic and phonological information;
performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device of the plurality of devices to produce a plurality of candidate dialogue moves, wherein the activity tree manages activities of the devices relevant to the input utterance;
identifying the description and at least one parameter of the description using a dialogue move script (DMS), wherein the DMS is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, and wherein the dialogue move is independent of the device and application;
mapping the description to a dialogue move of the candidate dialogue moves using the DMS, the dialogue move corresponding to the identified parameter;
receiving a confidence score from a speech recognizer component coupled to the input component, the confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and to produce an n-best list of dialogue moves;
translating the confidence score into a qualitative description of the likelihood of proper recognition;
incorporating into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance; and
formulating one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value;
receiving a confidence score for each of a plurality of features of the input utterance for each device of the plurality of devices, wherein the confidence score comprises a numerical value representing a probability of proper recognition of the input utterance, and wherein the features are selected from the group consisting of: confidence scores from a speech recognizer component, confidence scores from a parser coupled to the speech recognizer, semantic criteria, pragmatic criteria, and dialogue context;
assigning a weight to each feature to generate a weighted confidence score for each device based on the shallow processing by each device;
combining the weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list of dialogue move candidates based on input context to rate one or more of the dialogue move candidates as the interpretation of the input utterance; and
processing the input utterance in the appropriate device to produce an output most appropriate to the input utterance.
Dependent claims: 10, 11, 12, 13
14. A system comprising:
a plurality of devices in a multi-device environment, each device of the plurality of devices having a respective activity model encapsulating device-specific information for the device;
an input circuit of each device of the plurality of devices for receiving and syntactically labeling an input pattern generated from an input utterance by a user, the input component including one or more subunits configured to generate a confidence score representing a probability of proper recognition of the input utterance, the user utterance directed generally to the plurality of devices in the multi-device environment;
an input processor circuit of each device performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device to produce a plurality of candidate moves, wherein the activity tree manages activities of the devices relevant to the input utterance, the input processor further receiving a structured description of a dialogue contribution from the user within the input utterance, the structured description including one of syntactic, semantic and phonological information;
a dialogue manager coupled to the input component that includes a plurality of dialogue moves including the candidate moves, and a dialogue move script (DMS) that is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, the DMS mapping the description to a dialogue move of the candidate dialogue moves, wherein the dialogue move is independent of the device;
a speech recognizer component coupled to the input component and generating a confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and producing an n-best list of dialogue moves;
a confidence mapping circuit coupled to the dialogue manager and configured to receive a confidence score from the speech recognizer component, map the confidence score to a qualitative confidence measure, and incorporate into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance and formulate one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value, the confidence mapping circuit further combining weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list based on the combined confidence score to rate one or more of the dialogue move candidates as the interpretation of the utterance; and
a selection circuit causing the input utterance to be processed in the device of the plurality of devices with the highest confidence score to produce an output most appropriate to the input utterance.
Dependent claims: 15, 16, 17, 18
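The confidence-handling elements recited in the independent claims (translating a numeric score into a qualitative description, then formulating either a confirmation question or a help message around a threshold) might look roughly like the following. This is a minimal sketch, not the claimed implementation: the cut-off values, the default threshold, the function names, and the wording of both prompts are assumptions that do not appear in the claims.

```python
def qualitative(score: float) -> str:
    """Translate a numeric confidence into a coarse label.

    The 0.8 and 0.5 cut-offs are assumed for illustration.
    """
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def respond(move: str, score: float, threshold: float = 0.5) -> str:
    """Pick the system response for a recognized dialogue move.

    As recited in the claims: a confirmation question when the score is
    above the threshold, a help message when it is equal to or below it.
    The threshold default and the prompt texts are hypothetical.
    """
    if score > threshold:
        return f"Did you mean: {move}?"
    return "Sorry, I did not understand. Try a command such as 'play a song'."


print(qualitative(0.9), "|", respond("play a song", 0.9))
print(qualitative(0.5), "|", respond("play a song", 0.5))
```

A score exactly at the threshold falls on the help-message side, matching the "equal to or below" wording of the claims.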
Specification