Online temporal difference learning from incomplete customer interaction histories
Abstract
In one embodiment, an indication that a decision has been requested, selected, or applied with respect to one or more users may be obtained. After the indication that a decision has been requested, selected, or applied is obtained, a value function may be updated, where the value function approximates an expected reward associated with the one or more users over time since the decision was requested, selected, or applied with respect to the one or more users. The value function may be updated by performing or providing one or more updates to the value function, where the time at which each of the one or more updates is performed or provided is independent of activity of the one or more users.
37 Claims
1. A computer implemented method, comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users;
wherein the time dependent value function approximates an expected reward as a function of one or more time based variables corresponding to one or more time based values, the expected reward being associated with the one or more users, at least one of the time based variables indicates an elapsed time since a prior or last user event pertaining to at least one of the one or more users, the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights.
Dependent claims: 2 to 25.
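The weight update recited in claim 1 can be pictured with a small sketch. This is not taken from the patent: it assumes a linear value function, a TD(0) update rule, and a hypothetical feature map (`features`) built from the elapsed time since the last user event. The step size `alpha`, discount `gamma`, and 24-hour decay constant are likewise illustrative assumptions. Updates are driven by fixed clock ticks rather than by user activity.

```python
import math

def features(elapsed_hours):
    """Hypothetical time-based features: a bias term and a decay of elapsed time."""
    return [1.0, math.exp(-elapsed_hours / 24.0)]

def value(weights, elapsed_hours):
    """Linear value estimate: dot product of weights and time-based features."""
    return sum(w * f for w, f in zip(weights, features(elapsed_hours)))

def td_update(weights, elapsed_prev, elapsed_now, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step (an assumed update rule); returns a new value for every weight."""
    phi = features(elapsed_prev)
    td_error = (reward
                + gamma * value(weights, elapsed_now)
                - value(weights, elapsed_prev))
    return [w + alpha * td_error * f for w, f in zip(weights, phi)]

# Clock-driven updates: the tick times are chosen in advance and do not
# depend on whether the user did anything. No reward is observed here.
weights = [0.0, 1.0]
ticks = [0.0, 1.0, 2.0, 4.0]  # hours since the last user event
for prev, now in zip(ticks, ticks[1:]):
    weights = td_update(weights, prev, now, reward=0.0)
```

With no reward observed at any tick, each step lowers the estimate, which is one way continued silence from a user could be folded into the value function without waiting for the next user event.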
26. A computer-implemented method comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users, wherein the time dependent value function is defined as a product of two terms;
wherein the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights, wherein each of the two terms operates on at least a subset of a set of weights, the set of weights corresponding to parameters of the time dependent value function.
27. A computer-implemented method comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users, wherein the time dependent value function is defined as a product of two terms, wherein one of the two terms is a time-varying function, wherein the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights.
Dependent claims: 28.
29. A computer-implemented method comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time, at which each of the one or more updates is performed or provided is independent of activity of the one or more users;
wherein the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights, wherein the time dependent value function is defined as a product of a first function and a second function, the first function operating on at least a first subset of a set of weights, the second function operating on at least a second subset of the set of weights, the set of weights corresponding to parameters of the time dependent value function.
Dependent claims: 30 to 35.
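Claims 26, 27, and 29 describe a value function defined as a product of two terms, each operating on its own subset of the weights. The sketch below shows one way such a product could be updated by gradient descent. The specific forms are assumptions made for illustration, not the patent's definitions: the first term is a linear function of hypothetical user features `x`, the second is an exponential decay in elapsed time, and the update is a squared-error gradient step toward a given `target`.

```python
import math

def first_term(w_a, x):
    """Assumed first function: linear in user features, uses weight subset w_a."""
    return sum(w * xi for w, xi in zip(w_a, x))

def second_term(w_b, elapsed):
    """Assumed second function: time-varying decay, uses weight subset w_b."""
    return math.exp(-w_b[0] * elapsed)

def value(w_a, w_b, x, elapsed):
    """Value function as the product of the two terms."""
    return first_term(w_a, x) * second_term(w_b, elapsed)

def gradient_step(w_a, w_b, x, elapsed, target, alpha=0.05):
    """One gradient step on squared error; returns new values for both weight subsets."""
    f = first_term(w_a, x)
    g = second_term(w_b, elapsed)
    error = target - f * g
    # Product rule: dV/dw_a = g * x, and dV/dw_b = f * (-elapsed) * g.
    new_w_a = [w + alpha * error * g * xi for w, xi in zip(w_a, x)]
    new_w_b = [w_b[0] + alpha * error * f * (-elapsed) * g]
    return new_w_a, new_w_b

# Usage: move the estimate toward a target of 0 after 3 time units of silence.
w_a, w_b = [1.0, 0.5], [0.1]
x, elapsed, target = [1.0, 2.0], 3.0, 0.0
new_w_a, new_w_b = gradient_step(w_a, w_b, x, elapsed, target)
```

Because the two terms share no weights in this sketch, the product rule gives each subset its own gradient, scaled by the current value of the other term.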
36. A computer-implemented method comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users;
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users, wherein the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights,
scheduling one or more updates to the time dependent value function,
obtaining an indication that a subsequent decision pertaining to the at least one user is requested, selected, or applied; and
after obtaining the indication that the subsequent decision pertaining to the at least one user is requested, selected, or applied, cancelling scheduled updates to the time dependent value function that have not yet been performed.
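The schedule-then-cancel behaviour in claim 36 can be sketched with a small in-memory scheduler. Everything here is a hypothetical illustration rather than the patent's implementation: the class name `UpdateScheduler`, the explicit `now` clock values, and the choice to cancel lazily (stale entries are skipped when they come due instead of being removed from the queue).

```python
import heapq
import itertools

class UpdateScheduler:
    """Hypothetical scheduler for value function updates, keyed by user."""

    def __init__(self):
        self._queue = []               # heap of (due_time, seq, user_id, decision_id)
        self._active = {}              # user_id -> most recent decision_id
        self._seq = itertools.count()  # tie-breaker so heap entries always compare

    def on_decision(self, user_id, decision_id, now, delays):
        """Record a decision and schedule updates at now + each delay.

        A newer decision for the same user supersedes the older one, so the
        older decision's pending updates are cancelled (lazily).
        """
        self._active[user_id] = decision_id
        for delay in delays:
            entry = (now + delay, next(self._seq), user_id, decision_id)
            heapq.heappush(self._queue, entry)

    def run_due(self, now, apply_update):
        """Perform every non-cancelled update due at or before now."""
        performed = 0
        while self._queue and self._queue[0][0] <= now:
            due, _, user_id, decision_id = heapq.heappop(self._queue)
            if self._active.get(user_id) != decision_id:
                continue  # belongs to a superseded decision: skip it
            apply_update(user_id, decision_id, due)
            performed += 1
        return performed

# Usage: three updates are scheduled for decision d1; a later decision d2
# arrives before two of them are due, so those two are never performed.
log = []
record = lambda user, decision, due: log.append((decision, due))
scheduler = UpdateScheduler()
scheduler.on_decision("u1", "d1", now=0.0, delays=[1.0, 5.0, 10.0])
first = scheduler.run_due(2.0, record)
scheduler.on_decision("u1", "d2", now=3.0, delays=[1.0])
second = scheduler.run_due(20.0, record)
```

Lazy cancellation keeps `on_decision` cheap at the cost of leaving stale entries in the heap until they come due; an eager variant would remove them when the subsequent decision arrives.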
37. A computer implemented method comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication that a decision that has been requested, selected, or applied, updating a time dependent value function, including performing or providing one or more updates to the time dependent value function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users;
wherein the time dependent value function approximates an expected reward as a function of one or more time based variables corresponding to one or more time based values, the expected reward being associated with the one or more users, wherein at least one of the time based variables indicates an elapsed time since a prior or last user event pertaining to at least one of the one or more users, wherein the one or more updates to the value function indicate update(s) to one or more weights associated with one or more parameters of the time dependent value function, and the updates to the one or more weights include a modification or replacement value for each of the one or more weights.
Specification