System and method for projecting a likely path of the subject of a sequential decision problem

  • US 10,460,249 B2
  • Filed: 09/30/2015
  • Issued: 10/29/2019
  • Est. Priority Date: 09/30/2015
  • Status: Active Grant
First Claim

1. A computer-aided decision making system, comprising:

    a user input device;

    a user output device; and

    a processor programmed to evaluate decision problems available to a user;

    the programmed processor:

    (A) facilitating input of information from the user via the user input device, the information including

        (i) a decision problem to be solved, the decision problem to be solved defined by

        (ii) an action set, the action set has elements representing actions available to a subject, each element in the action set having a corresponding action cost, the corresponding action costs forming an action cost set,

        (iii) at least one state dimension representing conditions relevant to the subject of the decision problem, each state dimension has elements representing values of a condition relevant to the decision problem,

        (iv) each state dimension having a corresponding reward vector representing a reward to the subject associated with the elements of the state dimension, before consideration of the action cost set,

        (v) each state dimension having a corresponding transition matrix containing, for each element in the state dimension, a probability of moving from each state in the state dimension to each state in the state dimension for each action in the action set,

        (vi) a time index and a discount factor, the time index containing decision points available to the user, each decision point representing a point in time when the user selects from the action set, and the discount factor representing a subject's preference for rewards relative to time,

    (B) the programmed processor combining the reward vectors with the action cost set to form a reward matrix and the programmed processor combining the transition matrices with the action set to form a total transition matrix;

    (C) the programmed processor forming a functional equation from the at least one state dimension, the reward matrix, the total transition matrix, and the time index and the discount factor;

    (D) the programmed processor evaluating the functional equation, including error-checking and validating the inputs and performing a convergence check to ensure that the functional equation will be solvable, and the programmed processor solving the functional equation;

    (E) the programmed processor generating an optimal policy by using the solved functional equation to find, for every point in the time index, an overall value-maximizing action;

    (F) the programmed processor generating at least one projected path beginning at a starting state by

        (i) identifying a set of assumed actions by selecting the value-maximizing action for each potential state at an initial point in the time index, based upon the optimal policy

        (ii) evaluating, for the assumed action, a transition to occur by comparing the probabilities in the total transition matrix for the combination of state dimensions;

        (iii) generating the projected path for each decision point in the time index by selecting the transition from the possible transitions at each decision point based upon the current state, the reward in the current state given the assumed action and the transition at the next decision point in the decision index, where the selection is based on the transition probabilities, the decision advice, and the reward matrix;

    (G) the programmed processor outputting the projected path to the user through the user output device.
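
Steps (B) through (F) of the claim read like a finite-horizon Markov decision process, so a small numerical sketch may help make them concrete. The code below is an illustration only and is not taken from the patent. It assumes a single state dimension, assumes the reward matrix is formed as reward minus action cost, assumes a terminal value of zero, treats a discount factor in [0, 1) as the convergence condition, and assumes the projected path follows the most probable transition at each decision point. All function names, the two-state example, and its numbers are hypothetical.

```python
import numpy as np


def build_reward_matrix(reward_vector, action_costs):
    """Step (B), assumed rule: R[s, a] = reward in state s minus cost of action a."""
    r = np.asarray(reward_vector, dtype=float)
    c = np.asarray(action_costs, dtype=float)
    return r[:, None] - c[None, :]


def validate(P, R, discount):
    """Step (D), sketch of input checks; P has shape (actions, states, states)."""
    n_actions, n_states, n_next = P.shape
    if n_states != n_next:
        raise ValueError("each action needs a square transition matrix")
    if R.shape != (n_states, n_actions):
        raise ValueError("reward matrix must have shape (states, actions)")
    if (P < 0).any() or not np.allclose(P.sum(axis=2), 1.0):
        raise ValueError("every transition row must be a probability distribution")
    # Assumption: a discount factor in [0, 1) stands in for the convergence check.
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount factor must lie in [0, 1)")


def solve(P, R, discount, horizon):
    """Steps (C)-(E): backward induction on
    V_t(s) = max_a ( R[s, a] + discount * sum_s' P[a, s, s'] * V_{t+1}(s') )."""
    n_states, _ = R.shape
    V = np.zeros(n_states)  # assumed terminal value of zero
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        # (P @ V) has shape (actions, states); transpose to match R.
        Q = R + discount * (P @ V).T
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


def project_path(P, policy, start_state):
    """Step (F), assumed rule: take the optimal action, then the most probable transition."""
    path = [start_state]
    state = start_state
    for t in range(policy.shape[0]):
        action = policy[t, state]
        state = int(P[action, state].argmax())
        path.append(state)
    return path


# Hypothetical example: states 0 = "low", 1 = "high"; actions 0 = "wait", 1 = "invest".
P = np.array([
    [[0.90, 0.10], [0.30, 0.70]],   # transitions under "wait"
    [[0.20, 0.80], [0.05, 0.95]],   # transitions under "invest"
])
R = build_reward_matrix(reward_vector=[0.0, 10.0], action_costs=[0.0, 2.0])
validate(P, R, discount=0.9)
policy, V = solve(P, R, discount=0.9, horizon=3)
path = project_path(P, policy, start_state=0)
print(path)  # [0, 1, 1, 1]
```

In this example the subject invests at the first two decision points and waits at the last one, since no future reward remains to justify the action cost. A sampling-based variant that draws the next state from the transition row, rather than taking its maximum, would be an equally plausible reading of step (F)(iii).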
