Thompson strategy based online reinforcement learning system for action selection

  • US 7,707,131 B2
  • Filed: 06/29/2005
  • Issued: 04/27/2010
  • Est. Priority Date: 03/08/2005
  • Status: Expired due to Fees
First Claim

1. An online reinforcement learning system comprising components embodied on a computer readable storage medium, the components when executed by one or more processors, update a model based upon reinforcement learning, the components comprising:

  • a model comprising an influence diagram with at least one chance node, the model receiving an input and providing a probability distribution associated with uncertainty regarding parameters of the model;

  • a decision engine that selects an action based, at least in part, upon the probability distribution, the decision engine employing a Thompson strategy heuristic technique to maximize long term expected utility when selecting the action, wherein the decision engine decreases a variance of a distribution of the parameters as a last decision instance is approached; and

  • a computer-implemented reinforcement learning component that modifies at least one of the parameters of the model based upon feedback associated with the selected action, the parameters defining distributions over discrete variables and continuous variables, uncertainty of the parameters expressed using Dirichlet priors for conditional distributions of discrete variables of the model, and, Normal-Wishart priors for distributions of continuous variables of the model, wherein the modified model is stored.
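
As a rough illustration of the kind of loop the claim describes, the following minimal sketch applies Thompson sampling to discrete outcomes with Dirichlet priors. It is not the patented implementation: it omits the influence diagram and the Normal-Wishart priors for continuous variables, and the names `DirichletThompsonAgent`, `sample_dirichlet`, `run_demo`, and the `steps_remaining` scaling rule are hypothetical choices made only for this example. In particular, multiplying the Dirichlet counts by a factor that grows as the horizon nears is one assumed way to reduce the variance of the sampled parameters as the last decision approaches.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


class DirichletThompsonAgent:
    """Hypothetical agent: one Dirichlet posterior over discrete outcomes per action."""

    def __init__(self, n_actions, utilities, prior=1.0, seed=0):
        self.utilities = list(utilities)  # utility assigned to each discrete outcome
        self.alpha = [[prior] * len(self.utilities) for _ in range(n_actions)]
        self.rng = random.Random(seed)

    def select_action(self, steps_remaining=None):
        # Assumption: scaling the counts by c > 1 keeps the Dirichlet mean but
        # shrinks its variance, so samples concentrate near the final decision.
        if steps_remaining is None:
            scale = 1.0
        else:
            scale = 1.0 + 10.0 / max(steps_remaining, 1)
        best_action, best_value = 0, float("-inf")
        for action, alpha in enumerate(self.alpha):
            # Thompson step: sample parameters, then act greedily on the sample.
            probs = sample_dirichlet([a * scale for a in alpha], self.rng)
            value = sum(p * u for p, u in zip(probs, self.utilities))
            if value > best_value:
                best_action, best_value = action, value
        return best_action

    def update(self, action, outcome):
        # Conjugate update: add one pseudo-count for the observed outcome.
        self.alpha[action][outcome] += 1.0


def run_demo(rounds=200, seed=0):
    """Simulate a two-action problem with made-up success probabilities."""
    env_rng = random.Random(seed + 1)
    true_success = [0.2, 0.8]  # illustrative values, not taken from the patent
    agent = DirichletThompsonAgent(n_actions=2, utilities=[0.0, 1.0], seed=seed)
    for t in range(rounds):
        action = agent.select_action(steps_remaining=rounds - t)
        outcome = 1 if env_rng.random() < true_success[action] else 0
        agent.update(action, outcome)
    return agent
```

Calling `run_demo()` returns the agent, whose `alpha` table holds the updated pseudo-counts per action; persisting that table would correspond loosely to storing the modified model.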
