Online learning for dialog systems
Abstract
An online dialog system and method are provided. The dialog system receives speech input and outputs an action according to its models. After executing the action, the system receives feedback from the environment or user. The system immediately utilizes the feedback to update its models in an online fashion.
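The receive, act, observe, update cycle described in the abstract can be sketched as a short loop. This is only an illustration, assuming a one-step Beta-Bernoulli Thompson sampler in place of the patent's decision model; the class `OnlinePolicy`, the function `run_dialog`, the action names, and the success rates are all hypothetical and do not come from the patent.

```python
import random


class OnlinePolicy:
    """Hypothetical one-step Thompson sampler over a fixed set of dialog actions."""

    def __init__(self, actions, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on each action's success rate (an assumption).
        self.successes = {a: 1.0 for a in actions}
        self.failures = {a: 1.0 for a in actions}

    def select_action(self):
        # Sample one plausible success rate per action from its posterior,
        # then pick the action whose sampled rate is highest.
        draws = {
            a: self.rng.betavariate(self.successes[a], self.failures[a])
            for a in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, action, success):
        # Online update: fold the feedback into the posterior right away.
        if success:
            self.successes[action] += 1.0
        else:
            self.failures[action] += 1.0


def run_dialog(policy, true_success_rate, turns):
    """Simulate `turns` interactions against made-up success rates."""
    history = []
    for _ in range(turns):
        action = policy.select_action()
        success = policy.rng.random() < true_success_rate[action]
        policy.update(action, success)
        history.append((action, success))
    return history


policy = OnlinePolicy(["execute", "confirm", "ignore"], seed=0)
history = run_dialog(
    policy, {"execute": 0.8, "confirm": 0.5, "ignore": 0.1}, turns=200
)
```

Each turn changes exactly one count, so the model is updated after every interaction rather than in a separate batch training pass.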
18 Claims
1. An online learning dialog system comprising:
one or more processing units;
memory communicatively coupled to the one or more processing units, the memory having stored instructions that, when executed by the one or more processing units, configure the online learning dialog system to implement:
a speech model that receives a speech input and provides speech events;
a decision engine model that receives the speech events from the speech model and selects an action based, at least in part, upon a probability distribution, the probability distribution being associated with uncertainty regarding a plurality of parameters of the decision engine model applied to the speech input, wherein the probability distribution is:
defined by an influence diagram that is configured to maximize long term expected utility and apply the Thompson strategy; and
expressed as: [equation not reproduced in this text]
where U denotes chance variables, D denotes decision variables, and V denotes value variables;
where Pa(X) denotes a set of parents for node X; and
where Θ_X denotes a subset of parameters related to the applied speech input in Θ that define local distribution of X; and
a learning component that in an online manner modifies at least one of the parameters of the decision engine model based upon feedback associated with the selected action, wherein the feedback comprises a lack of verbal input from a user of the system or an environment within a predefined period of time.
Dependent claims: 3, 5, 6, 7, 8, 9, 10, 11, 12, 13
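The expression that claim 1 introduces with "expressed as" does not appear in the text above. Given the definitions that follow it (chance variables U, decision variables D, parents Pa(X), and per-node parameters Θ_X), one plausible reading is the usual factorization of an influence diagram's chance variables. The form below is an assumption inferred from those definitions, not the claim's verbatim wording, and the second line gives the Thompson strategy only in its commonly stated form.

```latex
% Assumed factorization over the chance variables U, given the decisions D:
p(U \mid D, \Theta) \;=\; \prod_{X \in U} p\bigl(X \mid \mathrm{Pa}(X), \Theta_X\bigr)

% Thompson strategy in its usual form (also an assumption here):
% sample parameters from their current posterior, then act optimally for the sample.
% Here d ranges over settings of the decision variables D.
\tilde{\Theta} \sim p(\Theta \mid \text{data}), \qquad
d^{*} = \arg\max_{d} \; \mathbb{E}\bigl[\, V \mid d, \tilde{\Theta} \,\bigr]
```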
2. An online learning dialog method implemented at a computing device, the method comprising:
receiving, at the computing device, voice input from a user;
determining, at the computing device, whether the voice input from the user is accepted as understood and initiate corresponding actions or the voice input is ambiguous and is in need of exploration based at least on a probability distribution associated with uncertainty regarding parameters of a decision engine model applied to the voice input, wherein the probability distribution is defined by an influence diagram that is configured to apply the Thompson strategy;
selecting an action based, at least in part, upon the probability distribution;
receiving, at the computing device, feedback associated with the selected action; and
updating at least one of the parameters of the decision engine model based, at least in part, upon the feedback associated with the selected action such that the decision engine model of the computing device is configured to maximize long term expected utility via the updating at least the one of the parameters of the decision engine model, wherein the feedback comprises a lack of verbal response to the selected action in a threshold period of time.
Dependent claims: 4, 16, 17, 18
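Two steps of claim 2 lend themselves to a small sketch: deciding whether to act on the input or explore further, and treating silence within a threshold period as feedback. The code below is a hedged illustration, not the patented method. It assumes a single Beta-distributed parameter for "the input was understood", and the names `Feedback`, `is_silence`, `choose`, and `update`, the utility values, and the `silence_is_success` flag are hypothetical. In particular, whether silence counts for or against the selected action is not stated in the claim and is left as a parameter here.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    utterance: Optional[str]  # None when nothing was said
    elapsed_s: float          # seconds waited after the action


def is_silence(feedback, threshold_s):
    """True when no verbal response arrived within the threshold period."""
    return feedback.utterance is None and feedback.elapsed_s >= threshold_s


def choose(alpha, beta, rng, u_right=1.0, u_wrong=-1.0, u_confirm=0.0):
    """Thompson-style choice between acting on the input and asking again."""
    # Sample a plausible probability that the input was understood correctly.
    p = rng.betavariate(alpha, beta)
    expected_execute = p * u_right + (1.0 - p) * u_wrong
    return "execute" if expected_execute > u_confirm else "confirm"


def update(alpha, beta, feedback, threshold_s, silence_is_success=True):
    """Return updated Beta counts after one piece of feedback."""
    if is_silence(feedback, threshold_s):
        success = silence_is_success
    elif feedback.utterance is not None:
        # Assumption: any spoken reply is treated as a correction.
        success = False
    else:
        # Threshold not reached yet: nothing to learn from so far.
        return alpha, beta
    return (alpha + 1.0, beta) if success else (alpha, beta + 1.0)
```

With a nearly certain posterior, for instance `choose(1e6, 1.0, random.Random(0))`, the sampled probability is close to 1 and the sketch acts on the input; with the counts reversed it asks for confirmation instead.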
14. An online learning dialog system comprising:
means for receiving voice input;
means for modeling the voice input based on a probability distribution associated with uncertainty regarding a plurality of parameters of the means for modeling the voice input, wherein the probability distribution is defined by an influence diagram that is configured to apply the Thompson strategy;
means for selecting an action based, at least in part, upon the probability distribution received from the means for modeling the voice input; and
means for modifying the plurality of parameters of the means for modeling the voice input based upon feedback associated with the selected action, wherein the feedback comprises a lack of verbal response from a user in a threshold period of time.
Dependent claims: 15
Specification