System and method for projecting a likely path of the subject of a sequential decision problem
First Claim
Patent Images
1. A computer-aided decision making system, comprising:
- a user input device;
a user output device; and
a processor programmed to evaluate decision problems available to a user;
the programmed processor;
(A) facilitating input of information from the user via the user input device, the information including(i) a decision problem to be solved, the decision problem to be solved defined by(ii) an action set, the action set has elements representing actions available to a subject, each element in the action set having a corresponding action cost, the corresponding action costs forming an action cost set,(iii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the decision problem,(iv) each state dimension having a corresponding reward vector representing a reward to the subject associated with the elements of the state dimension, before consideration of the action cost set,(v) each state dimension having a corresponding transition matrix containing, for each element in the state dimension, a probability of moving from each state in the state dimension to each state in the state dimension for each action in the action set,(vi) a time index and a discount factor, the time index containing decision points available to the user, each decision point representing a point in time when the user selects from the action set, and the discount factor representing a subject'"'"'s preference for rewards relative to time,(B) the programmed processor combining the reward vectors with the action cost set to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;
(C) the programmed processor forming a functional equation from the at least one state dimension, the reward matrix, the total transition matrix, and the time index and the discount factor;
(D) the programmed processor evaluating the functional equation, including error-checking and validating the inputs and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;
(E) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;
(F) the programmed processor generating at least one projected path beginning at a starting state by(i) identifying a set of assumed actions by selecting the value-maximizing action for each potential state at an initial point in the time index, based upon the optimal policy(ii) evaluating, for the assumed action, a transition to occur by comparing the probabilities in the total transition matrix for the combination of state dimensions;
(iii) generating the projected path for each decision point in the time index by selecting the transition from the possible transitions at each decision point based upon the current state, the reward in the current state given the assumed action and the transition at the next decision point in the decision index, where the selection is based on the transition probabilities, the decision advice, and the reward matrix;
(G) the programmed processor outputting the projected path to the user through the user output device.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for projecting the likely future path of the subject of a sequential decision problem. The subject of the sequential decision problem takes an action beginning with the starting state of affairs and probabilistically transitions into other states according to the structure of the decision problem, the solution to the decision problem, possibly random events, and the decisions of the subject. The likely future path consists of a sequence of actions taken by the subject, the states the subject will likely be in after taking the projected actions, and the rewards the subject is likely to receive along the future path.
5 Citations
20 Claims
-
1. A computer-aided decision making system, comprising:
-
a user input device; a user output device; and a processor programmed to evaluate decision problems available to a user;
the programmed processor;(A) facilitating input of information from the user via the user input device, the information including (i) a decision problem to be solved, the decision problem to be solved defined by (ii) an action set, the action set has elements representing actions available to a subject, each element in the action set having a corresponding action cost, the corresponding action costs forming an action cost set, (iii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the decision problem, (iv) each state dimension having a corresponding reward vector representing a reward to the subject associated with the elements of the state dimension, before consideration of the action cost set, (v) each state dimension having a corresponding transition matrix containing, for each element in the state dimension, a probability of moving from each state in the state dimension to each state in the state dimension for each action in the action set, (vi) a time index and a discount factor, the time index containing decision points available to the user, each decision point representing a point in time when the user selects from the action set, and the discount factor representing a subject'"'"'s preference for rewards relative to time, (B) the programmed processor combining the reward vectors with the action cost set to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix; (C) the programmed processor forming a functional equation from the at least one state dimension, the reward matrix, the total transition matrix, and the time index and the discount factor; (D) the programmed processor evaluating the functional equation, including error-checking and validating the inputs and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation; (E) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action; (F) the programmed processor generating at least one projected path beginning at a starting state by (i) identifying a set of assumed actions by selecting the value-maximizing action for each potential state at an initial point in the time index, based upon the optimal policy (ii) evaluating, for the assumed action, a transition to occur by comparing the probabilities in the total transition matrix for the combination of state dimensions; (iii) generating the projected path for each decision point in the time index by selecting the transition from the possible transitions at each decision point based upon the current state, the reward in the current state given the assumed action and the transition at the next decision point in the decision index, where the selection is based on the transition probabilities, the decision advice, and the reward matrix; (G) the programmed processor outputting the projected path to the user through the user output device. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer implemented method for assisting a user in making a decision comprising:
-
providing a computer system having a user input device, a user output device, and a processor programmed with instructions to evaluate a decision problem available to the user, the instructions programming the processor; (A) using the computer system to provide the user with an option for selecting the decision problem to be solved, the user inputs information via the user input device to define the decision problem, the information including (i) a decision problem to be solved, the decision problem to be solved defined by (ii) an action set, the action set has elements representing actions available to a subject, each element in the action set having a corresponding action cost, the corresponding action costs forming an action cost set, (iii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the decision problem, (iv) each state dimension having a corresponding reward vector representing a reward to the subject associated with the elements of the state dimension, before consideration of the action cost set, (v) each state dimension having a corresponding transition matrix containing, for each element in the state dimension, a probability of moving from each state in the state dimension to each state in the state dimension for each action in the action set, (vi) a time index and a discount factor, the time index containing decision points available to the user, each decision point representing a point in time when the user selects from the action set, and the discount factor representing a subject'"'"'s preference for rewards relative to time, (B) forming, by the computer system manipulating the reward vectors with the action cost set, a reward matrix, and by the computer system manipulating the transition matrices with the set of actions, a total transition matrix, (C) forming, by the computer system manipulating the at least one state dimension, the reward matrix, the total transition matrix and the time index and the discount factor, a functional equation, (D) evaluating, by the computer system, the functional equation including error-checking and validating the inputs and performing a convergence check to ensure that the functional equation will be solvable, and solving the functional equation; (E) generating, by the computer system, an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action; (F) generating, by the computer system, at least one projected path beginning at a starting state by (i) identifying a set of assumed actions by selecting the value-maximizing action for each potential state at an initial point in the decision index, based upon the optimal policy (ii) evaluating, for the assumed action, a transition to occur by comparing the probabilities in the total transition matrix for the combination of state dimensions; (iii) generating the projected path for each decision point in the time index by selecting the transition from the possible transitions at each decision point based upon the current state, the reward in the current state given the assumed action and the transition at the next decision point in the decision index, where the selection is based on the transition probabilities, the decision advice, and the reward matrix; (G) outputting, by the computer system, the projected path to the user through the user output device. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
-
Specification