Online learning for dialog systems

US 7,734,471 B2
Filed: 06/29/2005
Issued: 06/08/2010
Est. Priority Date: 03/08/2005
Status: Active Grant

First Claim

1. An online learning dialog system comprising:

one or more processing units;

memory communicatively coupled to the one or more processing units, the memory having stored instructions that, when executed by the one or more processing units, configure the online learning dialog system to implement:

a speech model that receives a speech input and provides speech events;

a decision engine model that receives the speech events from the speech model and selects an action based, at least in part, upon a probability distribution, the probability distribution being associated with uncertainty regarding a plurality of parameters of the decision engine model applied to the speech input, wherein the probability distribution is:

defined by an influence diagram that is configured to maximize long term expected utility and apply the Thompson strategy; and

expressed as:

$p(U, V \mid D, \Theta) = \prod_{X \in U \cup V} p(X \mid \mathrm{Pa}(X), \Theta_X)$, where U denotes chance variables, D denotes decision variables, and V denotes value variables;

where Pa(X) denotes a set of parents for node X; and

where $\Theta_X$ denotes a subset of parameters related to the applied speech input in $\Theta$ that define local distribution of X; and

a learning component that in an online manner modifies at least one of the parameters of the decision engine model based upon feedback associated with the selected action, wherein the feedback comprises a lack of verbal input from a user of the system or an environment within a predefined period of time.
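The factorization recited in claim 1 can be illustrated with a toy influence diagram. The sketch below is a hypothetical example, not taken from the patent: the node names (`Intent`, `SpeechEvent`, `Action`, `Utility`), their values, and every probability are made up, and the value node is treated as a discrete random variable, as the product over U ∪ V suggests. Each node X in U ∪ V looks up its own local table Θ_X, indexed by the values of Pa(X), and the joint p(U, V | D, Θ) is the product of those local terms.

```python
from math import prod

# Hypothetical toy diagram (not from the patent):
#   U = {Intent, SpeechEvent}  chance variables
#   D = {Action}               decision variable (conditioned on, not modeled)
#   V = {Utility}              value variable, treated here as discrete
PARENTS = {
    "Intent": (),
    "SpeechEvent": ("Intent",),
    "Utility": ("Intent", "Action"),
}

# Theta_X: one local table per node in U ∪ V, keyed by the values of Pa(X).
# All numbers are illustrative.
THETA = {
    "Intent": {(): {"call": 0.7, "email": 0.3}},
    "SpeechEvent": {
        ("call",): {"heard_call": 0.8, "heard_email": 0.2},
        ("email",): {"heard_call": 0.1, "heard_email": 0.9},
    },
    "Utility": {
        ("call", "dial"): {"high": 0.9, "low": 0.1},
        ("call", "confirm"): {"high": 0.6, "low": 0.4},
        ("email", "dial"): {"high": 0.0, "low": 1.0},
        ("email", "confirm"): {"high": 0.5, "low": 0.5},
    },
}


def joint(assignment, decision):
    """Return p(U, V | D, Theta) as the product of local terms p(X | Pa(X), Theta_X)."""
    values = {**assignment, **decision}
    return prod(
        THETA[node][tuple(values[p] for p in parents)][values[node]]
        for node, parents in PARENTS.items()
    )
```

For example, `joint({"Intent": "call", "SpeechEvent": "heard_call", "Utility": "high"}, {"Action": "dial"})` multiplies 0.7 × 0.8 × 0.9 ≈ 0.504. Because every local table is normalized, summing the joint over all assignments of U ∪ V for a fixed decision gives 1.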

2 Assignments

0 Petitions

Abstract

An online dialog system and method are provided. The dialog system receives speech input and outputs an action according to its models. After executing the action, the system receives feedback from the environment or user. The system immediately utilizes the feedback to update its models in an online fashion.
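The act, observe, update cycle in the abstract, combined with the Thompson strategy named in the claims, can be sketched as follows. This is a deliberately simplified stand-in, not the patented model: instead of a full influence diagram, each candidate action (hypothetical names such as `execute` and `confirm`) gets an independent Beta posterior over its probability of a successful outcome, and that probability serves as a proxy for utility. On each turn the selector samples one value per action from its posterior and picks the best sample, so actions whose value is still uncertain keep getting tried; each piece of feedback then updates the posterior immediately.

```python
import random


class ThompsonActionSelector:
    """Thompson sampling with an independent Beta-Bernoulli posterior per action.

    A simplified stand-in for the claimed influence-diagram model: each action's
    only parameter is its (unknown) probability of a successful outcome.
    """

    def __init__(self, actions, seed=None):
        # Beta(1, 1) is a uniform prior over each action's success probability.
        self.alpha = {a: 1.0 for a in actions}
        self.beta = {a: 1.0 for a in actions}
        self.rng = random.Random(seed)

    def select(self):
        # Sample one plausible success probability per action from its posterior,
        # then act greedily with respect to that sample.
        samples = {a: self.rng.betavariate(self.alpha[a], self.beta[a]) for a in self.alpha}
        return max(samples, key=samples.get)

    def update(self, action, success):
        # Conjugate online update: a success adds to alpha, a failure to beta.
        if success:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0

    def posterior_mean(self, action):
        # Current point estimate of the action's success probability.
        return self.alpha[action] / (self.alpha[action] + self.beta[action])
```

Over many turns the selector concentrates on whichever action earns more positive feedback while still occasionally sampling the alternatives, which is the exploration behavior the Thompson strategy provides.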

131 Citations

18 Claims

1. An online learning dialog system as recited in full under First Claim above.
- Dependent claims (3, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 3. A voice-controlled mobile device that comprises the system of claim 1.
  - 5. The system of claim 1, wherein the instructions that, when executed by the one or more processing units, configure the online learning dialog system to further implement a repair dialog on a display of the system.
  - 6. The system of claim 5, wherein the repair dialog includes a request to repeat and/or a request for confirmation.
  - 7. The system of claim 1, wherein the speech model is configured to:
    - ignore the speech input,
    - execute corresponding to a most likely command associated with the speech input,
    - request to repeat the speech input, and
    - provide information associated with a plurality of likely commands along with a request to confirm the speech input.
  - 8. The system of claim 1, wherein the feedback further comprises a negative input or a positive input utterance from the user of the system or the environment.
  - 9. The system of claim 1, wherein the plurality of parameters of the decision engine model are updated based on the feedback associated with the selected action.
  - 10. The system of claim 1, wherein the learning component employs retrospective analysis to modify at least one of the plurality of parameters of the decision engine model.
  - 11. The system of claim 1, wherein the feedback comprises a lack of an input from a user of the system within a threshold period of time.
  - 12. The system of claim 1, wherein the decision engine model comprises a Markov decision process.
  - 13. The system of claim 1, wherein:
    - Dirichlet priors are used in the plurality of parameters for conditional distributions of discrete variables of the decision engine model, and
    - Normal-Wishart priors are used in the plurality of parameters for distributions of continuous variables of the decision engine model.
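Claim 13 places Dirichlet priors on the conditional distributions of discrete variables. Because the Dirichlet is conjugate to the categorical distribution, an online update amounts to incrementing a pseudo-count for the observed value under the observed parent configuration. The sketch below assumes a symmetric prior (`prior_count`) and uses hypothetical names throughout; it also shows a posterior draw of the kind the Thompson strategy needs. The Normal-Wishart case for continuous variables follows the same conjugate pattern but is not shown.

```python
import random


class DirichletCPT:
    """Conditional distribution p(X | Pa(X)) for a discrete X, with a Dirichlet
    prior on each row (one row per configuration of the parents)."""

    def __init__(self, values, prior_count=1.0):
        self.values = tuple(values)
        self.prior_count = prior_count  # symmetric Dirichlet hyperparameter (assumed)
        self.rows = {}  # parent configuration -> Dirichlet parameters

    def _row(self, parents):
        # Lazily create the prior row for a parent configuration seen for the first time.
        return self.rows.setdefault(parents, {v: self.prior_count for v in self.values})

    def observe(self, parents, value):
        # Conjugate update: observing X = value adds one to that value's parameter.
        self._row(parents)[value] += 1.0

    def mean(self, parents):
        # Posterior mean of a Dirichlet: parameters normalized to sum to one.
        row = self._row(parents)
        total = sum(row.values())
        return {v: a / total for v, a in row.items()}

    def sample(self, parents, rng=random):
        # Posterior draw: normalize independent Gamma(a, 1) variates.
        row = self._row(parents)
        draws = {v: rng.gammavariate(a, 1.0) for v, a in row.items()}
        total = sum(draws.values())
        return {v: d / total for v, d in draws.items()}
```

With the default uniform prior, observing `"call"` twice under the parent configuration `("heard_call",)` moves the posterior mean for that row from 0.5/0.5 to 0.75/0.25, while rows for other parent configurations are unaffected.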

2. An online learning dialog method implemented at a computing device, the method comprising:
- receiving, at the computing device, voice input from a user;
  
  determining, at the computing device, whether the voice input from the user is accepted as understood and initiate corresponding actions or the voice input is ambiguous and is in need of exploration based at least on a probability distribution associated with uncertainty regarding parameters of a decision engine model applied to the voice input, wherein the probability distribution is defined by an influence diagram that is configured to apply the Thompson strategy;
  
  selecting an action based, at least in part, upon the probability distribution;
  
  receiving, at the computing device, feedback associated with the selected action; and
  
  updating at least one of the parameters of the decision engine model based, at least in part, upon the feedback associated with the selected action such that the decision engine model of the computing device is configured to maximize long term expected utility via the updating at least the one of the parameters of the decision engine model, wherein the feedback comprises a lack of verbal response to the selected action in a threshold period of time.
- Dependent claims (4, 16, 17, 18):
  - 4. A speech application embedded on a non-transitory computer storage medium to implement the method as recited in claim 2.
  - 16. A voice-controlled web browser embedded on a non-transitory computer storage medium to implement the method as recited in claim 2.
  - 17. The method of claim 2, wherein the feedback further comprises a verbal response to the selected action in a threshold period of time.
  - 18. A computer readable medium having stored thereon computer executable instructions for carrying out the method of claim 2.
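Claims 1, 2, and 14 each treat a lack of verbal response within a threshold period of time as feedback. One minimal way to detect that, assuming recognized utterances arrive on a `queue.Queue` fed by a separate recognizer thread, is a blocking `get` with a timeout. The keyword lists, the `silence_is_positive` default, and the function names are hypothetical; the claims do not say whether silence should count as acceptance or rejection, so the sketch exposes that choice as a parameter.

```python
import queue

# Hypothetical keyword lists; a real system would rely on the recognizer's own events.
NEGATIVE_WORDS = {"no", "wrong", "cancel", "stop"}
POSITIVE_WORDS = {"yes", "right", "correct", "thanks"}


def await_feedback(responses, timeout_s):
    """Wait up to timeout_s seconds for a recognized utterance; None means silence."""
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        return None


def interpret_feedback(response, silence_is_positive=True):
    """Map a response (or silence) to True/False, or None when it carries no signal.

    Whether silence counts as acceptance is an assumption exposed as a parameter.
    """
    if response is None:
        return silence_is_positive
    words = {w.strip(".,!?") for w in response.lower().split()}
    if words & NEGATIVE_WORDS:
        return False
    if words & POSITIVE_WORDS:
        return True
    return None  # unrelated utterance: skip the update for this turn
```

The resulting boolean can drive any online parameter update, with `None` meaning the turn contributes no learning signal.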

14. An online learning dialog system comprising:
- means for receiving voice input;
  
  means for modeling the voice input based on a probability distribution associated with uncertainty regarding a plurality of parameters of the means for modeling the voice input, wherein the probability distribution is defined by an influence diagram that is configured to apply the Thompson strategy;
  
  means for selecting an action based, at least in part, upon the probability distribution received from the means for modeling the voice input; and
  
  means for modifying the plurality of parameters of the means for modeling the voice input based upon feedback associated with the selected action, wherein the feedback comprises a lack of verbal response from a user in a threshold period of time.
- Dependent claim (15):
  - 15. The system of claim 14, wherein the means for selecting an action employs a heuristic technique to maximize long term expected utility.
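Claim 12 names a Markov decision process for the decision engine, and claims 1, 2, and 15 recite maximizing long-term expected utility. As a hedged illustration of what that objective means in an MDP, the sketch below runs standard value iteration on a made-up three-state dialog. The states, actions, probabilities, rewards, and discount factor are all assumptions; this is exact dynamic programming, not the heuristic technique of claim 15, which the claims do not detail.

```python
# Hypothetical three-state dialog MDP; all probabilities and rewards are made up.
# TRANSITIONS[state][action] = list of (probability, next_state, reward).
TRANSITIONS = {
    "unconfirmed": {
        "execute": [(0.6, "finished", 10.0), (0.4, "finished", -10.0)],
        "confirm": [(0.9, "confirmed", -1.0), (0.1, "unconfirmed", -1.0)],
    },
    "confirmed": {
        "execute": [(0.95, "finished", 10.0), (0.05, "finished", -10.0)],
        "confirm": [(1.0, "confirmed", -1.0)],
    },
    "finished": {},  # terminal: no actions, value stays 0
}


def q_value(outcomes, values, gamma):
    # Expected immediate reward plus discounted value of the next state.
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.95, tol=1e-10):
    """Return (state values, greedy policy) maximizing expected discounted utility."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy
```

With these numbers, executing immediately from the unconfirmed state has expected utility 2.0, while confirming first is worth about 7.40 in the long run, so the computed policy asks for confirmation before acting and executes once the command is confirmed.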

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Paek, Timothy S., Chickering, David M., Horvitz, Eric J.
Primary Examiner(s)
Han; Qi

Application Number
US11/170,999
Publication Number
US 20060206337A1
Time in Patent Office
1,805 Days
Field of Search
704/270, 704/270.1, 704/275, 704/255, 704/256, 704/256.2, 704/256.3, 704/231, 704/235
US Class Current
704/270.1
CPC Class Codes
G10L 15/065 Adaptation
