Online asynchronous reinforcement learning from concurrent customer histories
Abstract
In one embodiment, an indication of a Decision Request or an Update Request may be received, where the Update Request is activated independent of user activity. A user state pertaining to at least one user may be received, obtained, accessed or constructed. For the Decision Request, one or more actions may be scored according to one or more value functions associated with a computing device, a policy associated with the computing device may be applied to identify one of the scored actions as a decision, and an indication of the decision may be provided or applied. For the Update Request, the one or more value functions and/or the policy may be updated. An indication of updates to the one or more value functions and/or an indication of updates to the policy may be provided.
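The two request types described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: tabular Q-learning with an epsilon-greedy policy stands in for the unspecified value function and policy, and the names `Agent`, `handle`, and the request dictionary fields are hypothetical.

```python
import random
from collections import defaultdict


class Agent:
    """Hypothetical agent: a tabular value function plus an epsilon-greedy policy."""

    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.9, decay=0.99, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon        # exploration rate (the policy parameter)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount on future reward
        self.decay = decay            # assumed policy update: shrink epsilon
        self.q = defaultdict(float)   # value function: (state, action) -> estimate
        self.rng = random.Random(seed)

    def score(self, state):
        # Score every candidate action with the value function.
        return {a: self.q[(state, a)] for a in self.actions}

    def decide(self, state):
        # Apply the policy to the scored actions to pick one decision.
        scores = self.score(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: scores[a])

    def update(self, state, action, reward, next_state):
        # Assumed value function update: one Q-learning step.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        # Assumed policy update: explore a little less after each update.
        self.epsilon *= self.decay


def handle(agent, request):
    """Dispatch a Decision Request or an Update Request to the agent."""
    if request["type"] == "decision":
        return agent.decide(request["state"])
    if request["type"] == "update":
        agent.update(request["state"], request["action"],
                     request["reward"], request["next_state"])
        return None
    raise ValueError(f"unknown request type: {request['type']!r}")
```

Because `handle` treats an Update Request as an ordinary message, an update can be issued by a timer or scheduler rather than by a user event, which is one way to read "activated independent of user activity".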
46 Citations
29 Claims
-
1. A computer implemented method, comprising:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
determining whether to schedule, request, or perform a set of one or more activities, the set of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users; and
scheduling, requesting, or performing the set of one or more activities according to a result of the determining step, wherein scheduling, requesting, or performing the set of one or more activities comprises:
generating a sequence of requests, wherein the sequence of requests includes one or more Update Requests and one or more Decision Requests, wherein each request in the sequence of requests pertains to the one or more users; and
providing or transmitting each request in the sequence of requests or indication thereof according to a particular schedule, wherein each of the one or more Decision Requests indicates a request to select an additional decision with respect to the at least one user, wherein each of the Update Requests indicates at least one of:
a request to update a value function approximating an expected reward over time for the one or more users and a request to update a policy for selecting additional decisions.
Dependent claims: 2, 3, 4
-
-
5. A computer implemented method, comprising:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
after obtaining the indication of the decision that has been requested or selected, requesting or performing a sequence of one or more activities according to a schedule, the sequence of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users;
prior to requesting or performing the sequence of activities, generating the schedule according to which the sequence of one or more activities are to be requested or performed, wherein generating the schedule is performed in response to at least one of:
the indication that the decision was requested or selected, an indication that an update to the value function pertaining to the one or more users was requested, an indication that an update to the policy pertaining to the one or more users was requested, customer input and other input.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17
-
-
14. A computer implemented method comprising:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
after obtaining the indication of the decision that has been requested or selected, requesting or performing a sequence of one or more activities according to a schedule, the sequence of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users;
prior to requesting or performing the sequence of activities, generating the schedule according to which the sequence of one or more activities are to be requested or performed, wherein generating the schedule is performed based, at least in part, upon a decision that has been selected with respect to the one or more users or an outcome of an update pertaining to the one or more users.
-
-
18. A computer implemented method comprising:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
after obtaining the indication of the decision that has been requested or selected, requesting or performing a sequence of one or more activities according to a schedule, the sequence of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users,
wherein the sequence of activities comprises a plurality of decisions to be selected, wherein each of the plurality of decisions pertains to a corresponding type of action, wherein the type of action for each of the plurality of decisions to be selected is identified in the schedule, each of the plurality of decisions including one or more actions.
-
-
19. An apparatus, comprising:
a processor; and
a memory, at least one of the processor or the memory being adapted for:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
after obtaining the indication of the decision that has been requested or selected, requesting or performing a sequence of one or more activities according to a schedule, the sequence of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users,
wherein requesting or performing the sequence of activities comprises providing or transmitting a sequence of one or more requests, wherein each of the requests in the sequence of requests is one of two types of requests, the two types of requests include a Decision Request and an Update Request and the schedule indicates a type of each of the requests in the sequence of requests and a number of each type of requests in the sequence of requests.
Dependent claims: 20, 21, 26, 27, 28, 29
-
-
22. An apparatus comprising:
a processor; and
a memory, at least one of the processor or the memory being adapted for:
obtaining an indication that a decision has been requested or selected with respect to one or more users;
after obtaining the indication of the decision that has been requested or selected, requesting or performing a sequence of one or more activities according to a schedule, the sequence of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users,
wherein requesting or performing the sequence of activities comprises providing or transmitting a sequence of one or more requests; and
at least one of the processor or the memory being further adapted for:
receiving, obtaining, accessing or constructing a first user state pertaining to the one or more users;
prior to generating a first portion of the sequence of requests, determining a first schedule based, at least in part, upon the first user state;
receiving, obtaining, accessing or constructing a second user state pertaining to the one or more users;
prior to generating a second portion of the sequence of requests, determining a second schedule based, at least in part, upon the second user state.
Dependent claims: 23, 24, 25
-
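As a rough illustration of the scheduling language in claims 19 and 22, the sketch below builds a schedule that records the type and number of each request and then expands it into a request sequence. The names `Schedule`, `schedule_for`, and `generate_requests`, the `active` flag in the user state, and the specific counts are all hypothetical; the claims do not prescribe any particular rule for deriving a schedule from a user state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Schedule:
    """Ordered (request_type, count) pairs, e.g. (("decision", 2), ("update", 1))."""
    entries: Tuple[Tuple[str, int], ...]


def schedule_for(user_state: Dict) -> Schedule:
    # Assumed rule, for illustration only: an active user gets decisions
    # followed by an update; an inactive user gets updates alone.
    if user_state.get("active"):
        return Schedule((("decision", 2), ("update", 1)))
    return Schedule((("update", 3),))


def generate_requests(schedule: Schedule, user_id: str) -> List[Dict]:
    # Expand the schedule into a sequence of requests for one user.
    return [
        {"type": request_type, "user": user_id}
        for request_type, count in schedule.entries
        for _ in range(count)
    ]


# A first portion of the sequence is generated from a first user state,
# and a second portion from a later, different user state.
first_portion = generate_requests(schedule_for({"active": True}), "u1")
second_portion = generate_requests(schedule_for({"active": False}), "u1")
sequence = first_portion + second_portion
```

Keeping the schedule as data separate from request generation means the schedule can be recomputed whenever a new user state arrives, without changing how requests are emitted.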
Specification