System and method for sequential decision making for customer relationship management
First Claim
1. A method for sequential decision making for customer relationship management, comprising:
- providing customer data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers;
- automatically generating actionable rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data;
- estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating a value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.
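The estimation step recited above, iteratively applying a regression model and updating a target reward value for each state-action pair, is the general shape of batch reinforcement learning in the style of fitted Q-iteration. The following is a minimal sketch under assumed conventions (linear regression as the regression model, integer action ids, and hypothetical function and variable names not taken from the patent):

```python
import numpy as np

def fit_value_function(episodes, actions_available, n_iterations=10, gamma=0.9):
    """Sketch of batch RL with function approximation (fitted Q-iteration style).

    episodes: list of (state, action, reward, next_state) tuples derived from
    stimulus-response history data; state/next_state are feature vectors.
    Returns a weight vector w such that Q(s, a) is approximated by phi(s, a) @ w.
    """
    def phi(state, action):
        # Value function as a function of state features and the action:
        # state features plus the action id and a bias term.
        return np.concatenate([state, [action, 1.0]])

    X = np.array([phi(s, a) for s, a, r, ns in episodes])
    rewards = np.array([r for s, a, r, ns in episodes])
    next_states = [ns for s, a, r, ns in episodes]

    # Iteration 0: regress the immediate reward on the state-action features.
    w, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    for _ in range(n_iterations):
        # Update the target reward value for each state-action pair:
        # immediate reward plus the discounted best achievable next-state value.
        next_best = np.array([
            max(phi(ns, a) @ w for a in actions_available) for ns in next_states
        ])
        targets = rewards + gamma * next_best
        w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w
```

A richer regression model (e.g. a tree ensemble) could be swapped in without changing the iteration structure; the claims do not fix a particular regression model.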
Abstract
A system and method for sequential decision-making for customer relationship management includes providing customer data including stimulus-response history data, and automatically generating actionable rules based on the customer data. Further, automatically generating actionable rules may include estimating a value function using reinforcement learning.
61 Citations
31 Claims
1. (Set forth above as the First Claim.) Dependent claims: 2-21, 30, 31.
22. A method of sequential targeted marketing for customer relationship management, comprising:
- preparing customer data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers; and
- automatically generating actionable rules using said stimulus-response history data to output instance-in-time targeting rules for optimizing a sequence of decisions over a period of time, so as to approximately maximize expected cumulative profits over time;
- estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating said value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.
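The final limitation shared by these claims, transforming the value-function output into actionable rules, amounts to selecting, for a given set of customer feature values, the action with the approximately maximum estimated value. A minimal sketch, assuming a fitted state-action value function `q` (all names here are hypothetical illustrations):

```python
def action_rule(q, actions_available):
    """Turn an estimated value function into an instance-in-time targeting rule.

    q: callable mapping (state_features, action) to an estimated value.
    Returns a rule mapping a customer's feature values to the action whose
    estimated value is (approximately) maximal for those features.
    """
    def rule(state_features):
        return max(actions_available, key=lambda a: q(state_features, a))
    return rule
```

Given a customer's current feature vector, the rule simply evaluates the estimated value of every available action and takes the maximizer.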
23. A method for sequential decision making for customer relationship management, comprising:
- providing a database of customer data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers, from a plurality of channels;
- integrating said customer data; and
- automatically generating actionable channel-specific targeting rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data by estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating said value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.

Dependent claim: 24.
25. A system for sequential decision making for customer relationship management, comprising:
- a database for storing customer data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers; and
- a processor for automatically generating actionable rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data by estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating said value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.

Dependent claim: 26.
27. A system for sequential decision making for customer relationship management, comprising:
- a data preparation device for preparing customer data comprising stimulus-response history data;
- a value estimator for estimating a value function based on said stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers; and
- a rule transformer for generating actionable rules for optimizing a sequence of decisions over a period of time based on said value function by estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating said value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.
28. A system for sequential cost-sensitive decision making for customer relationship management, comprising:
- a customer transaction cache for storing customer transaction data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers;
- a customer profile cache for receiving an output of said customer transaction cache and storing current customer profile data; and
- a customer relationship management system, for receiving an output of said customer profile cache and customer relationship management rules for optimizing a sequence of decisions over a period of time, wherein said customer relationship management rules are automatically generated based on said stimulus-response history data by estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating said value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.
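The component arrangement recited in claim 28, a transaction cache feeding a profile cache whose output is scored against the generated rules, can be sketched as a simple data-flow pipeline. All class, method, and field names below are hypothetical illustrations, not taken from the patent:

```python
from collections import defaultdict

class CustomerTransactionCache:
    """Stores stimulus-response event data per customer."""
    def __init__(self):
        self.events = defaultdict(list)

    def record(self, customer_id, event):
        self.events[customer_id].append(event)

class CustomerProfileCache:
    """Derives current profile feature values from the transaction cache output."""
    def __init__(self, transactions):
        self.transactions = transactions

    def profile(self, customer_id):
        events = self.transactions.events[customer_id]
        # Hypothetical features: event count and total observed response value.
        return [len(events), sum(e.get("response", 0.0) for e in events)]

class CRMSystem:
    """Applies the automatically generated rules to the current profile."""
    def __init__(self, profiles, rule):
        self.profiles = profiles
        self.rule = rule  # maps a customer's feature values to the action to take

    def next_action(self, customer_id):
        return self.rule(self.profiles.profile(customer_id))
```

The rule passed to the CRM system here would be the output of the value-function estimation and rule-transformation steps recited in the claims.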
29. A programmable storage medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for sequential decision making for customer relationship management, said method comprising:
- providing customer data comprising stimulus-response history data for a population of customers, said stimulus-response history data being derived from event data for said customers; and
- automatically generating actionable rules for optimizing a sequence of decisions over a period of time based on said stimulus-response history data by estimating a value function using batch reinforcement learning with function approximation, said function approximation representing the value function as a function of state features and actions, and said estimating a value function using batch reinforcement learning with function approximation comprising:
- estimating a function approximation of the value function of a Markov Decision Process underlying said stimulus-response history data for said population of customers; and
- iteratively applying a regression model to training data comprising sequences of states, actions and rewards resulting for said population of customers, and updating in each iteration a target reward value for each state-action pair; and
- transforming an output of a value function estimation into said actionable rules, the rules specifying what actions to take given a set of feature values corresponding to a customer, and the action taken corresponding to an action having an approximate maximum value according to said value function for the given set of feature values.
Specification