System and method for defining and calibrating a sequential decision problem using historical data

US 10,546,248 B2
Filed: 12/29/2015
Issued: 01/28/2020
Est. Priority Date: 12/31/2014
Status: Active Grant

Abstract

A system and method for defining and calibrating the inputs to a sequential decision problem using historical data, where the user provides historical data and the system and method forms the historical data (along with other inputs) into at least one of the states, actions, rewards or transitions used in composing and solving the sequential decision problem.
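
As a rough illustration of what forming historical data into transitions and rewards could look like, the sketch below estimates transition probabilities and mean rewards by counting observed records. The record layout `(state, action, next_state, reward)`, the function name `calibrate`, the sample data, and the uniform fallback for unobserved state-action pairs are all assumptions made for this example; the patent does not prescribe this particular estimator.

```python
def calibrate(records, states, actions):
    """Estimate P[a][s][s2] and R[a][s][s2] from historical records.

    Each record is a hypothetical (state, action, next_state, reward) tuple.
    """
    counts = {a: {s: {s2: 0 for s2 in states} for s in states} for a in actions}
    reward_sums = {a: {s: {s2: 0.0 for s2 in states} for s in states} for a in actions}
    for s, a, s2, r in records:
        counts[a][s][s2] += 1
        reward_sums[a][s][s2] += r

    P, R = {}, {}
    for a in actions:
        P[a], R[a] = {}, {}
        for s in states:
            total = sum(counts[a][s].values())
            P[a][s], R[a][s] = {}, {}
            for s2 in states:
                n = counts[a][s][s2]
                # Assumption: an unobserved (state, action) pair gets a uniform row.
                P[a][s][s2] = n / total if total else 1.0 / len(states)
                # Mean observed reward for this transition, or 0.0 if never seen.
                R[a][s][s2] = reward_sums[a][s][s2] / n if n else 0.0
    return P, R


# Illustrative data only; not taken from the patent.
records = [
    ("low", "invest", "high", 5.0),
    ("low", "invest", "low", 1.0),
    ("low", "wait", "low", 0.0),
    ("high", "wait", "high", 3.0),
    ("high", "wait", "low", 1.0),
    ("high", "wait", "high", 5.0),
]
states = ["low", "high"]
actions = ["invest", "wait"]
P, R = calibrate(records, states, actions)
print(P["wait"]["high"])
print(R["wait"]["high"])
```

Every row of each estimated matrix sums to 1, which is the property a later validation step would check before solving.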

20 Claims

1. A computer-aided decision making system, comprising:
- a user input device;
  
  a user output device; and
  
  a processor programmed to evaluate decision problems available to a user, the programmed processor;
  
  (A) facilitating input of a historical data set from a decision maker via the user input device;
  
  (B) the programmed processor defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;
  
  (i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,
  
  (ii) at least one state dimension representing conditions relevant to the subject of the decision problem,
  
  (iii) a reward set representing rewards received by the user when transitioning between states for actions in the action set,
  
  (iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,
  
  (v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,
  
  (C) the programmed processor combining the reward set with the action costs to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;
  
  (D) the programmed processor forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;
  
  (E) the programmed processor evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
  
  (F) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
  
  (G) the programmed processor outputting the optimal policy to the user through the user output device.
  - 2. A computer-aided decision making system according to claim 1, wherein the programmed processor generates the at least one state dimension, the action costs, the time index and the discount factor using at least one of:
    - K-means, K nearest neighbors, or hierarchical clustering, or Bayes or Naive-Bayes classification.
  - 3. A computer-aided decision making system according to claim 1, wherein the programmed processor receives the historical data and, before defining the decision problem by generating the parameters, at least one additional input of:
    - an action, a state, a discount factor, a decision point, a reward for the reward set, an element of the transition matrix, or an action cost, the programmed processor generating all of the parameters not received as additional input and including the additional input in the statistical techniques.
  - 4. A computer-aided decision making system according to claim 1, wherein the programmed processor receives the historical data and the at least one state dimension, the action set, the action costs, the discount factor, the time index and the programmed processor uses the historical data to generate the reward set and the transition matrices.
  - 5. A computer-aided decision making system according to claim 4, wherein the programmed processor uses the historical data to generate the parameters including modifying at least one of the at least one state dimension, the action set, the action costs, the discount factor or the time index.
  - 6. A computer-aided decision making system according to claim 4, wherein the programmed processor receives the historical data and at least one reward for the reward set or one element of a transition matrix and uses the historical data and all of the elements not received as additional input and including the additional input in the statistical techniques to generate the reward set and the set of transition matrices.
  - 7. A computer-aided decision making system according to claim 1, wherein the programmed processor prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data and forming the functional equation, the programmed processor allowing the user to re-review and re-edit at least one of the historical data or the parameters after forming the functional equation, re-forming the functional equation when an edit is made.
  - 8. A computer-aided decision making system according to claim 4, wherein the programmed processor prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data and forming the functional equation, the programmed processor allowing the user to re-review and re-edit at least one of the historical data or the parameters after forming the functional equation, re-forming the functional equation when an edit is made.
  - 9. A computer-aided decision making system according to claim 1, wherein the programmed processor prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data after viewing the optimal policy and the programmed processor reforming and solving the edited decision problem for the user.
  - 10. A computer-aided decision making system according to claim 4, wherein the programmed processor prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data after viewing the optimal policy and the programmed processor reforming and solving the edited decision problem for the user.
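
Steps (C) through (F) of claim 1 can be pictured with a small value-iteration sketch. It is one possible reading, not the patented implementation: it assumes a single state dimension (so the total transition matrix is just one matrix per action), an infinite horizon, and a discount factor strictly below 1 as the convergence condition. The names `validate` and `solve` and the two-state example are hypothetical.

```python
def validate(P, discount, tol=1e-9):
    """Error-check the parameters before solving (one reading of step (E))."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount factor must lie in [0, 1) for the iteration to converge")
    for a, matrix in enumerate(P):
        for s, row in enumerate(matrix):
            if any(p < 0.0 or p > 1.0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError(f"row {s} of transition matrix {a} is not a probability distribution")


def solve(P, rewards, action_costs, discount, tol=1e-10, max_iter=10_000):
    """Return (values, policy) for transition matrices P[a][s][s2]."""
    validate(P, discount)
    n_actions, n_states = len(P), len(P[0])
    # Step (C): reward matrix = transition rewards net of each action's cost.
    R = [[[rewards[a][s][s2] - action_costs[a] for s2 in range(n_states)]
          for s in range(n_states)] for a in range(n_actions)]
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Step (D): functional equation V(s) = max_a sum_s2 P * (R + discount * V(s2)).
        Q = [[sum(P[a][s][s2] * (R[a][s][s2] + discount * V[s2]) for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            # Step (F): pick the value-maximizing action in every state.
            return V_new, [q.index(max(q)) for q in Q]
        V = V_new
    raise RuntimeError("value iteration did not converge")


# Illustrative problem: states 0 = low, 1 = high; actions 0 = wait, 1 = invest.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # wait: stay in the current state
    [[0.0, 1.0], [0.0, 1.0]],  # invest: move to (or stay in) the high state
]
rewards = [
    [[0.0, 2.0], [0.0, 2.0]],  # landing in the high state pays 2
    [[0.0, 2.0], [0.0, 2.0]],
]
action_costs = [0.0, 1.0]
values, policy = solve(P, rewards, action_costs, discount=0.9)
print(values, policy)
```

In this toy problem the solver recommends investing in the low state and waiting in the high state. With several state dimensions, the per-dimension transition matrices would first have to be combined into a total transition matrix, which this sketch leaves out.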

11. A computer implemented method for assisting a user in making a decision comprising:
- providing a computer system having a user input device, a user output device and a processor programmed with instructions to evaluate decision problems available to the user, the instructions programming the processor and;
  
  (A) using the computer system to facilitate input of a historical data set from a decision maker via the user input device;
  
  (B) defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;
  
  (i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,
  
  (ii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the subject of the decision problem,
  
  (iii) a reward set representing rewards received by the user when transitioning between states for each action in the action set,
  
  (iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,
  
  (v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,
  
  (C) combining the reward set with the action costs to form a reward matrix and combining the transition matrices with the action set to form a total transition matrix;
  
  (D) forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;
  
  (E) evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
  
  (F) generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
  
  (G) outputting the optimal policy to the user through the user output device.
  - 12. A method as set forth in claim 11, wherein the step of generating the at least one state dimension, the action costs, the time index and the discount factor using at least one of:
    - K-means, K nearest neighbors, or hierarchical clustering, or Bayes or Naive-Bayes classification.
  - 13. A method as set forth in claim 11, wherein the step of receiving the historical data and, before defining the decision problem by generating the parameters, at least one additional input of:
    - an action, a state, a discount factor, a decision point, a reward for the reward set, an element of the transition matrix, or an action cost, the programmed processor generating all of the parameters not received as additional input and including the additional input in the statistical techniques.
  - 14. A method as set forth in claim 11, wherein the step of receiving the historical data also includes the at least one state dimension, the action set, the action costs, the discount factor, the time index and using the historical data to generate the reward set and the transition matrices.
  - 15. A method as set forth in claim 14, wherein the step of using the historical data to generate the parameters further includes modifying at least one of the at least one state dimension, the action set, the action costs, the discount factor or the time index.
  - 16. A method as set forth in claim 14, wherein the step of receiving the historical data further includes at least one reward for the reward set or one element of a transition matrix and using the historical data and all of the elements not received as additional input and including the additional input in the statistical techniques to generate the reward set and the set of transition matrices.
  - 17. A method as set forth in claim 11, wherein an additional step prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data and forming the functional equation and further includes the step of allowing the user to re-review and re-edit at least one of the historical data or the parameters after forming the functional equation, re-forming the functional equation when an edit is made.
  - 18. A method as set forth in claim 14 wherein an additional step prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data and forming the functional equation and further includes the step of allowing the user to re-review and re-edit at least one of the historical data or the parameters after forming the functional equation, re-forming the functional equation when an edit is made.
  - 19. A method as set forth in claim 11 wherein an additional step prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data after viewing the optimal policy and reforming and solving the edited decision problem for the user.
  - 20. A method as set forth in claim 14 wherein the programmed processor prompts the user to review and edit at least one of the parameters of the decision problem generated from the historical data after viewing the optimal policy and reforming and solving the edited decision problem for the user.
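
Claims 2 and 12 name K-means among the techniques for generating parameters such as the state dimension. A minimal sketch of that idea, assuming a single numeric historical variable, is to cluster the observed values and treat each cluster as one discrete state. The functions `kmeans_1d` and `to_state`, the deterministic initialization, and the sample revenue figures are assumptions for illustration; a production system would more likely call an established clustering library.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar data; returns the sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic initialization: spread the starting centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def to_state(value, centers):
    """Map an observation to the index of its nearest cluster center."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


# Illustrative historical observations of one condition (for example, revenue).
revenue = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.9, 10.0, 10.1]
centers = kmeans_1d(revenue, 3)
print(centers)
print([to_state(v, centers) for v in (0.5, 4.9, 12.0)])
```

The resulting state indices could then label each historical record before transition probabilities and rewards are estimated from it.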

Current Assignee: Supported Intelligence LLC
Original Assignee: Supported Intelligence LLC
Inventors: Johnson, Jeffrey P; Anderson, Neal P
Primary Examiner(s): Cassity, Robert A
Assistant Examiner(s): Bejcek, II, Robert
Application Number: US14/982,382
Publication Number: US 20160196492A1
Time in Patent Office: 1,491 Days
CPC Class Codes:
G06N 20/00 Machine learning
G06N 5/02 Knowledge representation; S...
