System and method for sequential decision making for customer relationship management
Abstract
A system and method for sequential decision-making for customer relationship management includes providing customer data including stimulus-response history data, and automatically generating actionable rules based on the customer data. Further, automatically generating actionable rules may include estimating a value function using reinforcement learning.
17 Claims
1. A computer-implemented method for sequential decision making for customer relationship management, comprising:
- providing customer data comprising stimulus-response history data for a plurality of customers, said stimulus-response history data being derived from event data for said customers;
- in a processor, automatically generating actionable rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data;
- estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions; and
- transforming an output of the value function estimation into said actionable rules, wherein the estimating of the value function comprises:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for the plurality of customers used as training data; and
- iteratively applying a regression model to the training data, which comprises sequences of states, actions, and rewards resulting for said plurality of customers, and updating in each iteration a target reward value for each state-action pair.
- Dependent claims 2-15 (not shown).
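The value-function estimation recited above — iteratively applying a regression model to sequences of states, actions, and rewards, updating a target reward value for each state-action pair — can be illustrated with a minimal sketch in the style of batch Q-learning with linear function approximation. This is an illustrative reading of the claim, not the patent's own implementation; all names (`features`, `estimate_value_function`) and the choice of a least-squares linear regressor are assumptions.

```python
import numpy as np

def features(s, a, n_actions):
    """Represent a state-action pair as state features plus a one-hot action (illustrative)."""
    one_hot = np.zeros(n_actions)
    one_hot[a] = 1.0
    return np.concatenate([np.asarray(s, dtype=float), one_hot, [1.0]])  # trailing 1.0 = bias

def estimate_value_function(episodes, n_actions, gamma=0.9, n_iterations=10):
    """episodes: one trajectory per customer, each a list of (state, action, reward) tuples.

    Returns a weight vector w such that Q(s, a) ~= features(s, a) @ w.
    """
    # Flatten customer trajectories into (s, a, r, s') transitions; terminal s' is None.
    transitions = []
    for ep in episodes:
        for t, (s, a, r) in enumerate(ep):
            s_next = np.asarray(ep[t + 1][0], dtype=float) if t + 1 < len(ep) else None
            transitions.append((np.asarray(s, dtype=float), a, float(r), s_next))

    X = np.array([features(s, a, n_actions) for s, a, _, _ in transitions])
    w = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        # Update the target reward value for each state-action pair:
        # immediate reward plus discounted best estimated future value.
        y = []
        for s, a, r, s_next in transitions:
            if s_next is None:
                y.append(r)
            else:
                q_next = max(features(s_next, b, n_actions) @ w for b in range(n_actions))
                y.append(r + gamma * q_next)
        # Re-fit the regression model to the updated targets (least squares here).
        w, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
    return w
```

The iteration structure, not the particular regressor, is the point: any regression model could be re-fit to the updated targets in each pass, consistent with the claim's "iteratively applying a regression model."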
16. A system for generating targeted marketing rules for a customer, said system comprising:
- a transforming unit for transforming customer transaction data to create derived features, said transaction data comprising data for a plurality of customers;
- a data development unit for using said derived features to develop current customer profile data and combined historical customer profile and stimulus-response data;
- a processor including a data mining unit for performing data mining on the combined data to develop a stimulus-response model;
- a stimulus optimization unit for performing stimulus optimization using said combined historical customer profile and stimulus-response data and said stimulus-response model with business rules; and
- a rule generator for generating customer relationship management (CRM) rules by performing data mining on said combined data and said stimulus optimization, wherein said stimulus optimization comprises estimating a value function using batch reinforcement learning with function approximation, the estimating of the value function comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response data for the plurality of customers used as training data; and
- iteratively applying a regression model to the training data, which comprises sequences of states, actions, and rewards resulting for said plurality of customers, and updating in each iteration a target reward value for each state-action pair.
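The rule generator recited above turns value-function estimates into actionable CRM rules constrained by business rules. One hypothetical reading (all names and the if-then rule format here are illustrative, not from the patent) is greedy policy extraction: for each customer segment, emit the business-rule-permitted action with the highest estimated long-term value.

```python
def generate_crm_rules(q_values, segments, allowed_actions):
    """Sketch of rule generation from a value function (illustrative names).

    q_values: dict mapping (segment, action) -> estimated long-term value.
    segments: iterable of customer-segment labels.
    allowed_actions: dict mapping segment -> actions the business rules permit.
    """
    rules = []
    for seg in segments:
        # Greedy policy extraction: among the actions the business rules allow,
        # pick the one with the highest estimated value for this segment.
        best = max(allowed_actions[seg], key=lambda a: q_values[(seg, a)])
        rules.append(f"IF customer_segment == '{seg}' THEN action = '{best}'")
    return rules
```

Restricting the argmax to `allowed_actions` is one way the claim's "stimulus optimization ... with business rules" could be realized: the business rules act as hard constraints on the optimized policy.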
17. A programmable storage medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method of sequential decision-making for customer relationship management, said method comprising:
- providing customer data comprising stimulus-response history data for a plurality of customers, said stimulus-response history data being derived from event data for said customers;
- automatically generating actionable rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data;
- estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions; and
- transforming an output of the value function estimation into said actionable rules, wherein the estimating of the value function comprises:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for the plurality of customers used as training data; and
- iteratively applying a regression model to the training data, which comprises sequences of states, actions, and rewards resulting for said plurality of customers, and updating in each iteration a target reward value for each state-action pair.
Specification