Online asynchronous reinforcement learning from concurrent customer histories
First Claim
1. An apparatus, comprising:
- one or more computing devices, each of the computing devices having one or more processors and memories configured to perform a method of asynchronous reinforcement learning (RL), including:
obtaining an indication of a Decision Request;
receiving, obtaining, accessing or constructing a user state pertaining to at least one user; and
in response to the Decision Request:
scoring a plurality of actions according to one or more value functions based, at least in part, upon the user state;
applying a policy to identify one of the scored actions as a decision; and
providing an indication of the decision or applying the decision to the at least one user;
obtaining an indication of an Update Request, the Update Request being activated independent of user activity;
receiving, obtaining, accessing or constructing a further user state pertaining to the at least one user; and
in response to the Update Request:
updating at least one of:
the one or more value functions and the policy based, at least in part, upon the further user state, wherein the Decision Request is activated in response to an event timer and the event timer operates to periodically generate Decision Requests, wherein a frequency with which the event timer generates the Decision Requests is based, at least in part, upon a period of time from a last user event pertaining to the at least one user or from a last user action, the last user action including the providing of the indication of the decision or the applying of the decision to the at least one user.
3 Assignments
0 Petitions
Abstract
In one embodiment, an indication of a Decision Request or an Update Request may be received, where the Update Request is activated independent of user activity. A user state pertaining to at least one user may be received, obtained, accessed or constructed. For the Decision Request, one or more actions may be scored according to one or more value functions associated with a computing device, a policy associated with the computing device may be applied to identify one of the scored actions as a decision, and an indication of the decision may be provided or applied. For the Update Request, the one or more value functions and/or the policy may be updated. An indication of updates to the one or more value functions and/or an indication of updates to the policy may be provided.
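The abstract's split between Decision Requests (score actions, apply a policy, act) and Update Requests (learn, even without user activity) can be illustrated with a minimal Python sketch. This is not the patented implementation; the class, the tabular value function, and the epsilon-greedy policy are illustrative assumptions.

```python
import random

class AsyncRLAgent:
    """Toy agent in which decisions and learning updates are driven by
    separate request types, so updates can run without user activity."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.q = {}            # value table keyed by (user_state, action)
        self.epsilon = epsilon

    def handle_decision_request(self, user_state):
        # Score every candidate action under the current value function.
        scores = {a: self.q.get((user_state, a), 0.0) for a in self.actions}
        # An epsilon-greedy policy identifies one scored action as the decision.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(scores, key=scores.get)

    def handle_update_request(self, user_state, action, reward, lr=0.5):
        # Update the value function toward the observed reward; this may be
        # triggered by a timer, independent of any new user event.
        key = (user_state, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + lr * (reward - old)
```

The key point the sketch captures is that `handle_update_request` has no dependency on a concurrent user session: it can be invoked by any scheduler against stored state.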
41 Citations
20 Claims
1. An apparatus, comprising:

one or more computing devices, each of the computing devices having one or more processors and memories configured to perform a method of asynchronous reinforcement learning (RL), including: obtaining an indication of a Decision Request; receiving, obtaining, accessing or constructing a user state pertaining to at least one user; and in response to the Decision Request: scoring a plurality of actions according to one or more value functions based, at least in part, upon the user state; applying a policy to identify one of the scored actions as a decision; and providing an indication of the decision or applying the decision to the at least one user; obtaining an indication of an Update Request, the Update Request being activated independent of user activity; receiving, obtaining, accessing or constructing a further user state pertaining to the at least one user; and in response to the Update Request: updating at least one of:
the one or more value functions and the policy based, at least in part, upon the further user state, wherein the Decision Request is activated in response to an event timer and the event timer operates to periodically generate Decision Requests, wherein a frequency with which the event timer generates the Decision Requests is based, at least in part, upon a period of time from a last user event pertaining to the at least one user or from a last user action, the last user action including the providing of the indication of the decision or the applying of the decision to the at least one user. - View Dependent Claims (2, 3, 4, 5, 6)
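Claim 1's event timer generates Decision Requests at a frequency based on the time since the last user event or action. One plausible reading is a capped geometric backoff; the function below is a sketch under that assumption, and the schedule parameters (`base`, `factor`, `cap`) are hypothetical, not taken from the patent.

```python
def decision_interval(seconds_since_last_event, base=60.0,
                      factor=2.0, cap=3600.0):
    """Hypothetical timer schedule: start from a base period and back off
    geometrically as the user stays inactive, up to a cap, so Decision
    Requests fire less often for a long-idle user."""
    interval = base
    while interval < seconds_since_last_event and interval < cap:
        interval *= factor
    return min(interval, cap)
```

A scheduler would recompute this interval after each Decision Request, so an active user is re-scored frequently while an idle one consumes fewer resources.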
7. A computer-implemented method of performing asynchronous reinforcement learning (RL), comprising:

obtaining an indication of a Decision Request pertaining to at least one user; obtaining an indication of an Update Request pertaining to the at least one user; receiving, obtaining, accessing or constructing a user state pertaining to the at least one user, the Update Request being activated independent of activity of the at least one user; in response to the indication of the Update Request, updating at least one of:
one or more value functions and a policy based, at least in part, upon the user state; and performing an action with respect to the at least one user in response to the Decision Request, wherein updating the one or more value functions includes incorporating non-response data into the one or more value functions, wherein a response to the action taken with respect to the at least one user has not been received or detected. - View Dependent Claims (8, 9, 10, 11, 14, 15, 19, 20)
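Claim 7's distinctive element is incorporating non-response data: updating the value function even though no response to the action has been received or detected. A minimal sketch, assuming the simple convention that a missing response is folded in as a presumed reward (here `0.0`); the function name and defaults are illustrative.

```python
def update_with_non_response(q, action, observed_reward=None,
                             lr=0.5, non_response_reward=0.0):
    """If no response to the action has been received or detected,
    fold in an assumed 'non-response' reward instead of waiting
    indefinitely for user activity (assumed convention)."""
    reward = non_response_reward if observed_reward is None else observed_reward
    old = q.get(action, 0.0)
    q[action] = old + lr * (reward - old)
    return q
```

Treating silence as weak evidence (rather than no evidence) lets timer-driven updates make progress between user events, at the cost of a bias that a later observed response would partially correct.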
12. A computer-implemented method of performing asynchronous reinforcement learning (RL), comprising:

obtaining an indication of a Decision Request pertaining to at least one user; obtaining an indication of an Update Request pertaining to the at least one user; receiving, obtaining, accessing or constructing a user state pertaining to the at least one user, the Update Request being activated independent of activity of the at least one user; in response to the indication of the Update Request, updating at least one of:
one or more value functions and a policy based, at least in part, upon the user state; and recording or determining a time since an action was taken with respect to the at least one user or a time since a last user event pertaining to the at least one user; wherein the time(s) at which the updating is performed are determined based, at least in part, upon the time since the action was taken with respect to the at least one user and/or the time since the last user event pertaining to the at least one user.
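Claim 12 keys the timing of updates to the time since the last action and/or the last user event. One way to sketch such a rule, with assumed (not patent-specified) thresholds: wait for a quiet period so a late response can still arrive, but never let pending data grow too stale.

```python
def should_update_now(seconds_since_action, seconds_since_user_event,
                      quiet_threshold=30.0, staleness_limit=600.0):
    """Assumed scheduling rule: perform the update once the user has been
    quiet for a while (their response has likely arrived, or won't), or
    unconditionally once the pending experience is too stale."""
    return (seconds_since_user_event >= quiet_threshold
            or seconds_since_action >= staleness_limit)
```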
13. A computer-implemented method of performing asynchronous reinforcement learning (RL), comprising:

obtaining an indication of a Decision Request pertaining to at least one user; obtaining an indication of an Update Request pertaining to the at least one user; receiving, obtaining, accessing or constructing a user state pertaining to the at least one user, the Update Request being activated independent of activity of the at least one user; in response to the indication of the Update Request, updating at least one of:
one or more value functions and a policy based, at least in part, upon the user state; and performing an action with respect to the at least one user in response to the Decision Request; and determining a time since the action was performed with respect to the at least one user; wherein the updating includes updating the one or more value functions based, at least in part, upon the time since the action was taken with respect to the at least one user.
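Claim 13 makes the value-function update itself depend on the elapsed time since the action. A natural (assumed) realization is time-decayed credit: the longer the gap between action and update, the less the observed outcome moves the estimate. The half-life parameterization below is illustrative.

```python
def time_weighted_update(q_value, reward, seconds_since_action,
                         lr=0.5, half_life=300.0):
    """Assumed scheme: the credit a reward contributes decays with the
    time elapsed since the action that may have caused it, halving
    every `half_life` seconds."""
    decay = 0.5 ** (seconds_since_action / half_life)
    return q_value + lr * decay * (reward - q_value)
```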
16. A computer-implemented method of performing asynchronous reinforcement learning (RL), comprising:

obtaining an indication of a Decision Request pertaining to at least one user; obtaining an indication of an Update Request pertaining to the at least one user; receiving, obtaining, accessing or constructing a user state pertaining to the at least one user, the Update Request being activated independent of activity of the at least one user; and in response to the indication of the Update Request, updating at least one of:
one or more value functions and a policy based, at least in part, upon the user state, wherein an event timer operates to periodically generate an Update Request, wherein a frequency with which the event timer generates an Update Request is based, at least in part, upon a period of time from a last user event pertaining to the at least one user or a last action performed with respect to the at least one user. - View Dependent Claims (17, 18)
Specification