System and method for defining and calibrating a sequential decision problem using historical data

  • US 10,546,248 B2
  • Filed: 12/29/2015
  • Issued: 01/28/2020
  • Est. Priority Date: 12/31/2014
  • Status: Active Grant
First Claim

1. A computer-aided decision making system, comprising:

  • a user input device;

    a user output device; and

    a processor programmed to evaluate decision problems available to a user, the programmed processor;

    (A) facilitating input of a historical data set from a decision maker via the user input device;

    (B) the programmed processor defining a decision problem to be solved, the decision problem defined by parameters generated using statistical techniques on the historical data set, the parameters including;

    (i) an action set, the action set has elements representing actions available to a subject and action costs to the subject of performing the actions,

    (ii) at least one state dimension representing conditions relevant to the subject of the decision problem,

    (iii) a reward set representing rewards received by the user when transitioning between states for actions in the action set,

    (iv) each state dimension having a corresponding transition matrix containing a probability of moving between the states for actions in the action set,

    (v) a time index and a discount factor, the time index containing decision points available to the subject where the subject selects an action from the action set, and the discount factor representing the subject's preference for rewards relative to time,

    (C) the programmed processor combining the reward set with the action costs to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;

    (D) the programmed processor forming a functional equation from the state dimensions, the reward matrix, the total transition matrix, and the time index and the discount factor;

    (E) the programmed processor evaluating the functional equation, including error-checking and validating the parameters and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;

    (F) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;

    (G) the programmed processor outputting the optimal policy to the user through the user output device.
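
Steps (A) through (G) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a single state dimension, an infinite horizon with a discount factor below 1, rewards that depend only on the state arrived in, and plain value iteration as the solver, none of which the claim specifies. The function names (estimate_transitions, validate, solve), the machine-maintenance history, and the reward and cost figures are all hypothetical.

```python
from collections import Counter


def estimate_transitions(history, states, actions):
    """Step (B)(iv): count-based estimate of P[a][s][t] from (s, a, t) records."""
    counts = {a: {s: Counter() for s in states} for a in actions}
    for s, a, t in history:
        counts[a][s][t] += 1
    P = {}
    for a in actions:
        P[a] = {}
        for s in states:
            total = sum(counts[a][s].values())
            # Assumption: an unobserved (state, action) pair leaves the state unchanged.
            P[a][s] = {t: (counts[a][s][t] / total if total else float(t == s))
                       for t in states}
    return P


def validate(P, states, actions, discount, tol=1e-9):
    """Step (E): parameter checks plus a simple convergence condition."""
    # Assumption: discount in [0, 1) is used as the convergence check,
    # since it makes the Bellman operator a contraction.
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    for a in actions:
        for s in states:
            if abs(sum(P[a][s].values()) - 1.0) > tol:
                raise ValueError(f"row P[{a}][{s}] does not sum to 1")


def solve(states, actions, P, arrival_reward, cost, discount,
          tol=1e-10, max_iter=10_000):
    """Steps (C) to (F): build the reward matrix, iterate, extract a policy."""
    validate(P, states, actions, discount)
    # Step (C): expected reward net of action cost for each (action, state).
    R = {a: {s: sum(P[a][s][t] * arrival_reward[t] for t in states) - cost[a]
             for s in states} for a in actions}
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        # Step (D): functional (Bellman) equation evaluated at the current V.
        Q = {s: {a: R[a][s] + discount * sum(P[a][s][t] * V[t] for t in states)
                 for a in actions} for s in states}
        V_new = {s: max(Q[s].values()) for s in states}
        done = max(abs(V_new[s] - V[s]) for s in states) < tol
        V = V_new
        if done:
            break
    # Step (F): value-maximizing action in every state.
    policy = {s: max(actions, key=lambda a: Q[s][a]) for s in states}
    return V, policy


# Hypothetical historical data set: (state, action, next_state) records.
states = ["good", "worn"]
actions = ["run", "repair"]
history = (
    [("good", "run", "good")] * 3 + [("good", "run", "worn")]
    + [("worn", "run", "worn")] * 2
    + [("worn", "repair", "good")] * 2
    + [("good", "repair", "good")]
)
P = estimate_transitions(history, states, actions)
V, policy = solve(
    states, actions, P,
    arrival_reward={"good": 10.0, "worn": 2.0},  # hypothetical rewards
    cost={"run": 0.0, "repair": 5.0},            # hypothetical action costs
    discount=0.9,
)
print(policy)  # Step (G): output the policy to the user
```

With these toy figures the sketch recommends running the machine while it is in good condition and repairing it once it is worn. Because the horizon is assumed infinite, the same state-to-action mapping applies at every decision point in the time index; a finite time index would instead call for backward induction, with one mapping per decision point.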
