System and method for defining and calibrating a sequential decision problem using historical data
First Claim
1. A computer-aided decision making system, comprising:
a user input device;
a user output device; and
a processor programmed to evaluate decision problems available to a user, the programmed processor;
(A) facilitating input of a historical data set from a decision maker via the user input device;
(B) the programmed processor defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;
(i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,
(ii) at least one state dimension representing conditions relevant to the subject of the decision problem,
(iii) a reward set representing rewards received by the user when transitioning between states for actions in the action set,
(iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,
(v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,
(C) the programmed processor combining the reward set with the action costs to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;
(D) the programmed processor forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;
(E) the programmed processor evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
(F) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
(G) the programmed processor outputting the optimal policy to the user through the user output device.
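Steps (C) through (F) can be illustrated with a minimal sketch. The claim does not specify how the matrices are combined or how the functional equation is solved, so everything below is an assumption made for illustration: state dimensions are treated as evolving independently (so the total transition matrix is a Kronecker product), the action cost is subtracted from every reward entry, the time index is finite, and the solver is plain backward induction. The function names `build_reward_matrix`, `build_total_transition` and `backward_induction` are hypothetical and do not come from the patent.

```python
import numpy as np

def build_reward_matrix(rewards, action_costs):
    """Step (C), assumed form: reward net of the cost of the action taken.
    rewards: (A, S, S) reward for moving s -> s' under action a.
    action_costs: (A,) cost of each action."""
    return rewards - action_costs[:, None, None]

def build_total_transition(per_dimension):
    """Step (C), assumed form: joint transition over all state dimensions.
    per_dimension: list of (A, S_k, S_k) arrays, one per state dimension.
    Independence between dimensions is an assumption of this sketch."""
    n_actions = per_dimension[0].shape[0]
    total = []
    for a in range(n_actions):
        joint = np.ones((1, 1))
        for P_k in per_dimension:
            joint = np.kron(joint, P_k[a])
        total.append(joint)
    return np.stack(total)

def backward_induction(P, R, beta, horizon):
    """Steps (D)-(F): solve the functional equation over a finite time index
    and return the value-maximizing action at every decision point."""
    n_states = P.shape[1]
    expected = np.einsum("ast,ast->as", P, R)  # expected one-step reward, (A, S)
    value = np.zeros(n_states)                 # assumed terminal value of zero
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        q = expected + beta * (P @ value)      # action values, (A, S)
        policy[t] = q.argmax(axis=0)
        value = q.max(axis=0)
    return policy, value

# Hypothetical two-state, two-action example; all numbers are invented.
stay = np.eye(2)
switch = np.array([[0.0, 1.0], [1.0, 0.0]])
P = build_total_transition([np.stack([stay, switch])])
rewards = np.zeros((2, 2, 2))
rewards[:, :, 1] = 1.0                         # arriving in state 1 pays 1
R = build_reward_matrix(rewards, np.array([0.0, 0.5]))
policy, value = backward_induction(P, R, beta=0.9, horizon=3)
```

In this toy case the policy switches out of state 0 and stays in state 1 at every decision point, since the one-time switching cost is smaller than the reward it unlocks.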
Abstract
A system and method for defining and calibrating the inputs to a sequential decision problem using historical data, where the user provides historical data and the system and method form the historical data (along with other inputs) into at least one of the states, actions, rewards or transitions used in composing and solving the sequential decision problem.
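As one possible reading of "forming the historical data into transitions", the sketch below estimates a transition matrix by counting observed moves between states. The abstract does not name a statistical technique, so the frequency estimator, the record layout `(state, action, next_state)`, the uniform fallback for unobserved rows, and the function name `estimate_transitions` are all assumptions.

```python
import numpy as np

def estimate_transitions(records, n_states, n_actions):
    """Estimate P[a, s, s'] as the observed share of moves s -> s' under action a.
    records: iterable of (state, action, next_state) integer triples.
    State-action pairs never observed fall back to a uniform row (an assumption)."""
    counts = np.zeros((n_actions, n_states, n_states))
    for state, action, next_state in records:
        counts[action, state, next_state] += 1
    row_totals = counts.sum(axis=2, keepdims=True)
    observed = row_totals > 0
    safe_totals = np.where(observed, row_totals, 1.0)  # avoid division by zero
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(observed, counts / safe_totals, uniform)

# Hypothetical history: four observed transitions, all under action 0.
history = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (1, 0, 1)]
P_hat = estimate_transitions(history, n_states=2, n_actions=2)
```

Every row of `P_hat` sums to one, which is the property a later validation step would need to confirm before the problem is solved.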
6 Citations
20 Claims
1. A computer-aided decision making system, comprising:
a user input device;
a user output device; and
a processor programmed to evaluate decision problems available to a user, the programmed processor;
(A) facilitating input of a historical data set from a decision maker via the user input device;
(B) the programmed processor defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;
(i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,
(ii) at least one state dimension representing conditions relevant to the subject of the decision problem,
(iii) a reward set representing rewards received by the user when transitioning between states for actions in the action set,
(iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,
(v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,
(C) the programmed processor combining the reward set with the action costs to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;
(D) the programmed processor forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;
(E) the programmed processor evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
(F) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
(G) the programmed processor outputting the optimal policy to the user through the user output device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
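Step (E) calls for error-checking the parameters and a convergence check, without saying which checks are run. A minimal sketch, assuming the checks are the standard ones for a discounted Markov decision problem: transition rows must be probability distributions, rewards must be finite, and for an infinite time index the discount factor must be strictly below 1 so that the Bellman operator is a contraction. The function name `validate_problem` and its messages are hypothetical.

```python
import numpy as np

def validate_problem(P, R, beta, infinite_horizon=True, tol=1e-8):
    """Return a list of human-readable issues; an empty list means the
    inputs passed every check in this sketch."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    issues = []
    if P.ndim != 3 or P.shape[1] != P.shape[2]:
        issues.append("transition array must have shape (actions, states, states)")
        return issues  # remaining checks need a well-shaped array
    if R.shape != P.shape:
        issues.append("reward matrix must have the same shape as the transition array")
    if not np.all(np.isfinite(R)):
        issues.append("reward matrix contains non-finite entries")
    if np.any(P < -tol) or not np.allclose(P.sum(axis=2), 1.0, atol=tol):
        issues.append("each transition row must be a probability distribution")
    if not 0.0 <= beta <= 1.0:
        issues.append("discount factor must lie in [0, 1]")
    elif infinite_horizon and beta == 1.0:
        # With bounded rewards, beta < 1 makes the Bellman operator a contraction.
        issues.append("discount factor must be below 1 for an infinite time index")
    return issues
```

Returning a list instead of raising on the first failure lets a caller show the decision maker every problem with the inputs at once.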
11. A computer implemented method for assisting a user in making a decision comprising:
providing a computer system having a user input device, a user output device and a processor programmed with instructions to evaluate decision problems available to the user, the instructions programming the processor and;
(A) using the computer system to facilitate input of a historical data set from a decision maker via the user input device;
(B) defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;
(i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,
(ii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the subject of the decision problem,
(iii) a reward set representing rewards received by the user when transitioning between states for each action in the action set,
(iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,
(v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,
(C) combining the reward set with the action costs to form a reward matrix and combining the transition matrices with the action set to form a total transition matrix;
(D) forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;
(E) evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
(F) generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
(G) outputting the optimal policy to the user through the user output device.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
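The claims refer to a "functional equation" without writing it out. One common form consistent with steps (C) through (F) is the Bellman equation below. The notation is an assumption of this note, not the patent's: $s, s'$ are states, $a$ an action from the action set $A$, $t$ a point in the time index with final point $T$, $\beta$ the discount factor, $P_a(s, s')$ an entry of the total transition matrix, $r_a(s, s')$ an entry of the reward set, and $c_a$ the cost of action $a$.

```latex
% Reward matrix: reward set net of action costs, step (C)
R_a(s, s') = r_a(s, s') - c_a

% Functional equation, step (D), with an assumed terminal value V_{T+1}(s) = 0
V_t(s) = \max_{a \in A} \sum_{s'} P_a(s, s') \left[ R_a(s, s') + \beta \, V_{t+1}(s') \right]

% Optimal policy: the value-maximizing action at each decision point, step (F)
\pi_t(s) = \arg\max_{a \in A} \sum_{s'} P_a(s, s') \left[ R_a(s, s') + \beta \, V_{t+1}(s') \right]
```

For an infinite time index the subscripts on $V$ drop out, and a solution is guaranteed to exist and be unique when rewards are bounded and $0 \le \beta < 1$, which is one plausible content for the convergence check in step (E).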
Specification