METHOD FOR PERFORMING A PLURALITY OF CANDIDATE ACTIONS AND MONITORING THE RESPONSES SO AS TO CHOOSE THE NEXT CANDIDATE ACTION TO TAKE TO CONTROL A SYSTEM SO AS TO OPTIMALLY CONTROL ITS OBJECTIVE FUNCTION

  • US 20080004940A1
  • Filed: 06/19/2007
  • Published: 01/03/2008
  • Est. Priority Date: 03/07/2001
  • Status: Active Grant
First Claim

1. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:

    a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;

    b) storing, according to the candidate action performed by the system, a representation of said monitored response performance, wherein the representation of said monitored response performance includes at least one variable that characterizes conditions under which the respective candidate action was performed, and wherein said one or more variables are known before a next candidate action is chosen;

    c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;

    d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;

    e) commanding the system to perform the chosen next action; and

    f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
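
The loop in steps a) to f) can be illustrated with a minimal sketch. The claim does not say how the expected growth in regret is computed, so everything specific here is an assumption made for illustration: the class name `RegretController`, the discrete context labels, a normal prior with known response noise for each action, a fixed planning `horizon`, and a one-step-lookahead Monte Carlo estimate (immediate regret of the action plus the regret of acting greedily for the remaining decisions after seeing its response). It is one possible reading of the claim, not the patented method itself.

```python
import math
import random
from collections import defaultdict


class RegretController:
    """Hypothetical controller; the model and all names are illustrative assumptions."""

    def __init__(self, actions, horizon=20, prior_sd=1.0, noise_sd=0.5,
                 n_samples=200, seed=0):
        self.actions = list(actions)
        self.horizon = horizon            # assumed number of remaining decisions
        self.prior_var = prior_sd ** 2    # assumed N(0, prior_sd^2) prior on each mean
        self.noise_var = noise_sd ** 2    # assumed known response noise variance
        self.n_samples = n_samples
        self.rng = random.Random(seed)
        self.history = defaultdict(list)  # (context, action) -> monitored responses

    def record(self, context, action, response):
        """Steps a) and b): store the response keyed by action and known context."""
        self.history[(context, action)].append(response)

    def _posterior(self, context, action):
        """Normal-normal posterior (mean, variance) from the history to date."""
        obs = self.history.get((context, action), [])
        precision = 1.0 / self.prior_var + len(obs) / self.noise_var
        return (sum(obs) / self.noise_var) / precision, 1.0 / precision

    def expected_regret_growth(self, context):
        """Step c): Monte Carlo estimate for every candidate action."""
        post = {a: self._posterior(context, a) for a in self.actions}
        totals = dict.fromkeys(self.actions, 0.0)
        for _ in range(self.n_samples):
            # Sample one plausible set of true mean responses.
            mu = {a: self.rng.gauss(m, math.sqrt(v)) for a, (m, v) in post.items()}
            best = max(mu.values())
            for a in self.actions:
                m, v = post[a]
                # Simulate the response to action a and update its posterior mean.
                y = self.rng.gauss(mu[a], math.sqrt(self.noise_var))
                new_precision = 1.0 / v + 1.0 / self.noise_var
                means = {b: post[b][0] for b in self.actions}
                means[a] = (m / v + y / self.noise_var) / new_precision
                greedy = max(means, key=means.get)
                # Regret incurred now, plus regret of exploiting afterwards.
                totals[a] += (best - mu[a]) + (self.horizon - 1) * (best - mu[greedy])
        return {a: t / self.n_samples for a, t in totals.items()}

    def choose(self, context):
        """Step d): pick the action with the lowest estimated regret growth."""
        scores = self.expected_regret_growth(context)
        return min(scores, key=scores.get)


def demo(steps=80, seed=1):
    """Run the loop against a simulated system with made-up mean responses."""
    true_mean = {("weekday", "A"): 0.0, ("weekday", "B"): 2.0,
                 ("weekend", "A"): 2.0, ("weekend", "B"): 0.0}
    env = random.Random(seed)
    controller = RegretController(["A", "B"])
    chosen = defaultdict(list)
    for t in range(steps):                             # step f): repeat
        context = "weekday" if t % 2 == 0 else "weekend"
        action = controller.choose(context)            # steps c) and d)
        response = env.gauss(true_mean[(context, action)], 0.5)  # step e), simulated
        controller.record(context, action, response)   # steps a) and b)
        chosen[context].append(action)
    return dict(chosen)
```

Calling `demo()` returns the sequence of actions chosen in each context. In this sketch an action that looks inferior is still tried when its uncertainty is large enough that the information could lower regret over the remaining horizon, which is the exploration versus exploitation trade-off described in step c).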
