Dialogue management using scripts and combined confidence scores

US 7,904,297 B2
Filed: 12/08/2005
Issued: 03/08/2011
Est. Priority Date: 05/31/2005
Status: Active Grant

Abstract

Representation-neutral dialogue systems and methods (“RNDS”) are described that include multi-application, multi-device spoken-language dialogue systems based on the information-state update approach. The RNDS includes representation-neutral core components of a dialogue system that provide scripted domain-specific extensions to routines such as dialogue move modeling and reference resolution, easy substitution of specific semantic representations and associated routines, and clean interfaces to external components for language-understanding (i.e., speech-recognition and parsing) and language-generation, and to domain-specific knowledge sources. The RNDS also resolves multi-device dialogue by evaluating and selecting among candidate dialogue moves based on features at multiple levels. Multiple sources of information are combined, multiple speech recognition and parsing hypotheses are tested, and multiple devices and moves are considered to choose the highest scoring hypothesis overall. Confirmation and clarification behavior can be governed by the overall score.
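The selection step summarized in the abstract (consider several recognition hypotheses, devices, and moves, then keep the highest scoring combination) can be illustrated with a minimal Python sketch. The `Candidate` type, its field names, and the example scores are assumptions made for illustration; the record does not show how the RNDS represents candidates internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one (hypothesis, device, move) combination."""
    hypothesis: str  # one speech-recognition/parsing hypothesis
    device: str      # device that could handle the hypothesis
    move: str        # candidate dialogue move
    score: float     # overall score, assumed here to lie in [0, 1]


def best_candidate(candidates):
    """Return the candidate with the highest overall score."""
    return max(candidates, key=lambda c: c.score)


# Illustrative data only: two hypotheses, two devices.
candidates = [
    Candidate("play some jazz", "mp3_player", "command", 0.82),
    Candidate("play some jazz", "navigation", "command", 0.10),
    Candidate("pay some jazz", "mp3_player", "command", 0.35),
]
best = best_candidate(candidates)
```

Scoring every combination in one flat list keeps the choice of hypothesis, device, and move in a single comparison rather than fixing the recognition result first and the device second.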

18 Claims

1. A method of determining a dialogue move in a multi-device environment, comprising:
- receiving an input utterance from a speaker through an input component, the input utterance directed to a plurality of devices in the multi-device environment, wherein each device of the plurality of devices is associated with a respective activity model encapsulating device-specific information for the respective device;
  
  generating an input pattern from the input utterance that includes a structured description of a dialogue contribution by the speaker, the structured description including one of syntactic, semantic and phonological information;
  
  performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device of the plurality of devices to produce a plurality of candidate dialogue moves, wherein the activity tree manages activities of the devices relevant to the input utterance;
  
  identifying the description and at least one parameter of the description using a dialogue move script (DMS), wherein the DMS is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, and wherein the dialogue move is independent of the device;
  
  mapping the description to a dialogue move of the candidate dialogue moves using the DMS, the dialogue move corresponding to the identified parameter;
  
  receiving a confidence score from a speech recognizer component coupled to the input component, the confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and to produce an n-best list of dialogue moves;
  
  translating the confidence score into a qualitative description of the likelihood of proper recognition;
  
  incorporating into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance; and
  
  formulating one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value;
  
  combining weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list based on the combined confidence score to rate one or more of the dialogue move candidates as the interpretation of the utterance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein the input component comprises one of a speech recognizer process, a parser, and a dialogue move classifier process.
  - 3. The method of claim 2 further comprising:
    - identifying proper names within the input pattern;
      
      replacing identified proper names with associated type classifications;
      
      labeling the input pattern with a classification denoting a class of a plurality of classes;
      
      modeling the input pattern using a conditional maximum entropy method to characterize the input pattern as a class and a feature; and
      
      determining a topic defined within a dialogue move script to which the input pattern corresponds.
  - 4. The method of claim 1 further comprising registering each device of the plurality of devices with a device manager associated with the input component to become associated with respective nodes to which new conversation threads can attach.
  - 5. The method of claim 1 wherein the activity model for each device comprises a declarative specification of the device and includes mappings from a predicate/argument structure to device actions.
  - 6. The method of claim 1 wherein the keyword comprises a label and the qualitative description of the likelihood of proper recognition, the method further comprising:
    - determining whether a pattern of the input utterance matches the keyword; and
      
      formulating the confirmation question if the input utterance matches the keyword prior to processing in a dialog move tree that is functionally coupled to the activity tree and utilizes the activity model.
  - 7. The method of claim 6 further comprising marking a flag in the DMS indicating that the dialog move is to be confirmed or that an action is to be taken in a device relevant to the input utterance without confirmation.
  - 8. The method of claim 1 wherein the help message comprises a one of:
    - a hint suggesting a possible recognizable input utterance, and a request for the user to rephrase the input utterance.
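
Two steps of claim 1 lend themselves to a short illustration: translating a numeric confidence score into a qualitative description, and choosing between a confirmation question and a help message around a threshold (claim 8 names a hint or a request to rephrase as the help message). The following Python sketch is only one possible reading. The function names, threshold values, and message wording are assumptions, not taken from the patent.

```python
def qualitative(score, low=0.4, high=0.8):
    """Map a numeric confidence score to a qualitative label.

    The cut-off values are assumed for illustration.
    """
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def respond(utterance_text, score, threshold=0.5, hint=None):
    """Formulate a confirmation question or a help message.

    Above the threshold: confirmation question.
    At or below the threshold: help message, either a hint with a
    recognizable utterance or a request to rephrase.
    """
    if score > threshold:
        return f'Did you mean "{utterance_text}"?'
    if hint is not None:
        return f'Sorry, I did not understand. You could say: "{hint}".'
    return "Sorry, I did not understand. Could you rephrase that?"
```

For example, `respond("play jazz", 0.7)` yields a confirmation question, while a score of exactly `0.5` falls on the help-message side because the claim treats "equal to or below" the threshold as the help case.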

9. A method of determining a dialogue move in a multi-device environment, comprising:
- receiving an input utterance from a speaker through an input component, the input utterance directed to a plurality of devices in the multi-device environment, wherein each device of the plurality of devices is associated with a respective activity model encapsulating device-specific information for the respective device;
  
  generating an input pattern from the input utterance that includes a structured description of a dialogue contribution by the speaker, the structured description including one of syntactic, semantic and phonological information;
  
  performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device of the plurality of devices to produce a plurality of candidate dialogue moves, wherein the activity tree manages activities of the devices relevant to the input utterance;
  
  identifying the description and at least one parameter of the description using a dialogue move script (DMS), wherein the DMS is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, and wherein the dialogue move is independent of the device and application;
  
  mapping the description to a dialogue move of the candidate dialogue moves using the DMS, the dialogue move corresponding to the identified parameter;
  
  receiving a confidence score from a speech recognizer component coupled to the input component, the confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and to produce an n-best list of dialogue moves;
  
  translating the confidence score into a qualitative description of the likelihood of proper recognition;
  
  incorporating into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance; and
  
  formulating one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value;
  
  receiving a confidence score for each of a plurality of features of the input utterance for each device of the plurality of devices, wherein the confidence score comprises a numerical value representing a probability of proper recognition of the input utterance, and wherein the features are selected from the group consisting of;
  
  confidence scores from a speech recognizer component, confidence scores from a parser coupled to the speech recognizer, semantic criteria, pragmatic criteria, and dialogue context;
  
  assigning a weight to each feature to generate a weighted confidence score for each device based on the shallow processing by each device;
  
  combining the weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list of dialogue move candidates based on input context to rate one or more of the dialogue move candidates as the interpretation of the input utterance; and
  
  processing the input utterance in the appropriate device to produce an output most appropriate to the input utterance.
- Dependent claims (10, 11, 12, 13)
  - 10. The method of claim 9 further comprising:
    - mapping the combined confidence score to a qualitative confidence measure;
      
      formulating a confirmation question to be sent to the speaker if no device of the plurality of devices has the highest weighted confidence score; and
      
      transmitting the confirmation question to the speaker, wherein the dialogue move corresponds to the identified parameter and an answer to the confirmation question.
  - 11. The method of claim 10, wherein the qualitative confidence measure comprises one of a low, medium and high syntactic value.
  - 12. The method of claim 11, further comprising:
    - defining a first confidence threshold to specify a level at which a highest scoring dialogue move candidate is accepted;
      
      defining a second confidence threshold to specify a level at which the highest scoring dialogue move candidate is rejected; and
      
      rejecting the candidate move if the combined confidence score of the candidate dialogue move is below the second confidence threshold value.
  - 13. The method of claim 12, further comprising accepting the candidate move if the combined confidence score of the candidate dialogue move is above the first confidence threshold value.
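
Claims 9 to 13 describe weighting per-feature confidence scores for each device, combining them to select a device, and then accepting or rejecting the best candidate against two thresholds. A minimal Python sketch follows. The claims do not give a combination formula, so the normalized weighted sum used here is an assumption, as are the weight values, the threshold values, and the choice to ask for confirmation between the two thresholds.

```python
# Assumed weights for the feature groups named in claim 9.
FEATURE_WEIGHTS = {
    "asr": 0.4,        # speech recognizer confidence
    "parser": 0.2,     # parser confidence
    "semantic": 0.2,   # semantic criteria
    "pragmatic": 0.1,  # pragmatic criteria
    "context": 0.1,    # dialogue context
}


def weighted_score(features, weights=FEATURE_WEIGHTS):
    """Normalized weighted sum of feature scores (assumed formula)."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


def select_device(per_device_features):
    """Return (device, score) for the device with the highest weighted score."""
    scores = {dev: weighted_score(f) for dev, f in per_device_features.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def decide(score, accept_above=0.8, reject_below=0.3):
    """Two-threshold decision: accept, reject, or confirm in between."""
    if score > accept_above:
        return "accept"
    if score < reject_below:
        return "reject"
    return "confirm"


# Illustrative feature scores for two devices hearing the same utterance.
per_device = {
    "mp3_player": {"asr": 0.9, "parser": 0.8, "semantic": 0.9,
                   "pragmatic": 0.7, "context": 0.8},
    "navigation": {"asr": 0.9, "parser": 0.8, "semantic": 0.2,
                   "pragmatic": 0.1, "context": 0.1},
}
device, score = select_device(per_device)
```

With these numbers both devices share the same recognizer and parser scores, so the semantic, pragmatic, and context features are what separate them.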

14. A system comprising:
- a plurality of devices in a multi-device environment, each device of the plurality of devices having a respective activity model encapsulating device specific information for the device;
  
  an input circuit of each device of the plurality of devices for receiving and syntactically labeling an input pattern generated from an input utterance by a user, the input component including one or more subunits configured to generate a confidence score representing a probability or proper recognition of the input utterance, the user utterance directed generally to the plurality of devices in the multi-device environment;
  
  an input processor circuit of each device performing shallow processing of the input utterance through an activity tree functionally coupled to the activity model for each device to produce a plurality of candidate moves, wherein the activity tree manages activities of the devices relevant to the input utterance, the input processor further receiving a structured description of a dialogue contribution from the user within the input utterance, the structured description including one of syntactic, semantic and phonological information;
  
  a dialogue manager coupled to the input component that includes a plurality of dialogue moves including the candidate moves, and a dialogue move script (DMS) that is used in identifying the description and at least one parameter of the description and corresponds to at least one device of the plurality of devices, the DMS mapping the description to a dialogue move of the candidate dialogue moves, wherein the dialogue move is independent of the device;
  
  a speech recognizer component coupled to the input component and generating a confidence score quantifying the probability that the speech recognizer component can recognize the input utterance, and producing an n-best list of dialogue moves;
  
  a confidence mapping circuit coupled to the dialogue manager and configured to receive a confidence score from the speech recognizer component, map the confidence score to a qualitative confidence measure, and incorporate into the dialogue move script a keyword allowing the formulation by a dialog manager component of a confirmation question in response to the input utterance and formulate one of a confirmation question if the confidence score is above a defined threshold value or a help message if the confidence score is equal to or below the defined threshold value, the confidence mapping circuit further combining weighted confidence scores for the plurality of devices into a combined confidence score to select an appropriate device of the plurality of devices and re-ordering the n-best list based on the combined confidence score to rate one or more of the dialogue move candidates as the interpretation of the utterance; and
  
  a selection circuit causing the input utterance to be processed in the device of the plurality of devices with the highest confidence score to produce an output most appropriate to the input utterance.
- Dependent claims (15, 16, 17, 18)
  - 15. The system of claim 14 wherein the input processor circuit comprises at least one of an automatic speech recognizer and a parser.
  - 16. The system of claim 15 each device of the of the plurality of devices is configured to generate a confidence score for each feature of the input utterance, the dialogue manager configured to assign a weight to each feature to generate a weighted confidence score for each device of the one or more devices, and combine the weighted confidence scores for the one or more devices into a combined confidence score.
  - 17. The system of claim 16 wherein the features are selected from the group consisting of confidence scores from a speech recognizer component, confidence score from a parser, the combined confidence score, semantic criteria, pragmatic criteria, and dialogue context.
  - 18. The system of claim 17, wherein the qualitative confidence measure comprises one of a low, medium and high syntactic value, the dialogue manager further configured to:
    - define a first confidence threshold corresponding to the high syntactic value to specify a level at which a highest scoring dialogue move candidate is accepted;
      
      define a second confidence threshold corresponding to the low syntactic value to specify a level at which the highest scoring dialogue move candidate is rejected;
      
      compare the input pattern to patterns in a dialogue move script to determine a possible dialogue move.
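
The final step of claim 18, comparing an input pattern to patterns in a dialogue move script to find a possible dialogue move, can be sketched in Python as follows. The script format below (a list of regular-expression entries with a `confirm` flag) is entirely hypothetical. This record does not show the actual DMS syntax, and the entry names are invented for illustration.

```python
import re

# Hypothetical script: each entry maps an input pattern to a
# device-independent dialogue move plus a device activity.
DMS = [
    {"pattern": r"^play (?P<title>.+)$", "move": "command",
     "activity": "play", "confirm": False},
    {"pattern": r"^delete (?P<title>.+)$", "move": "command",
     "activity": "delete", "confirm": True},
    {"pattern": r"^(what|which) .+\?$", "move": "wh_question",
     "activity": None, "confirm": False},
]


def match_move(input_pattern, script=DMS):
    """Return the first script entry matching the input, with its parameters.

    Returns None when no pattern in the script matches.
    """
    text = input_pattern.strip().lower()
    for entry in script:
        m = re.match(entry["pattern"], text)
        if m:
            return {
                "move": entry["move"],
                "activity": entry["activity"],
                "params": m.groupdict(),  # named groups act as move parameters
                "confirm": entry["confirm"],
            }
    return None
```

Here `match_move("Play Yesterday")` produces a `command` move with the parameter `title` set to `yesterday`, and the `confirm` flag on the `delete` entry marks moves that should be confirmed before a device acts on them.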

Current Assignee: Robert Bosch Corporation (Robert Bosch GmbH)
Original Assignee: Robert Bosch GmbH
Inventors: Xu, Kui; Zhang, Qi; Purver, Matthew; Ratiu, Florin; Scheideck, Tobias; Weng, Fuliang; Cavedon, Lawrence; Mirkovic, Danilo
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Borsetti, Greg
Application Number: US11/298,765
Publication Number: US 20060271364A1
Time in Patent Office: 1,916 Days
Field of Search: 704/239, 704/275, 704/9, 704/257
US Class Current: 704/257
CPC Class Codes:
G06F 40/40   Processing or translation o...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 2015/228   of application context
