Method for performing a plurality of candidate actions and monitoring the responses so as to choose the next candidate action to take to control a system so as to optimally control its objective function

US 7,260,551 B2
Filed: 03/22/2001
Issued: 08/21/2007
Est. Priority Date: 03/07/2001
Status: Active Grant

6 Assignments

0 Petitions

Abstract

The present disclosure relates to a controller for controlling a system, capable of presentation of a plurality of candidate propositions resulting in a response performance, in order to optimise an objective function of the system. The controller has a means for storing, according to candidate proposition, a representation of the response performance in actual use of respective propositions; means for assessing which candidate proposition is likely to result in the lowest expected regret after the next presentation on the basis of an understanding of the probability distribution of the response performance of all of the plurality of candidate propositions; where regret is a term used for the shortfall in response performance between always presenting a true best candidate proposition and using the candidate proposition actually presented.
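
The definition of regret in the abstract can be written out as a short formula. This is a sketch only: the symbols $\mu_i$ (true mean response performance of proposition $i$), $a_t$ (the proposition presented at step $t$), $H_T$ (the stored response history after $T$ presentations) and $R_T$ (cumulative regret) are notation assumed here, not taken from the patent.

```latex
% Assumed notation: \mu_i = true mean response of proposition i,
% a_t = proposition presented at step t, H_T = stored history after T steps.
\mu^{*} = \max_{i} \mu_{i},
\qquad
R_{T} = \sum_{t=1}^{T} \bigl( \mu^{*} - \mu_{a_{t}} \bigr)

% The controller presents next the proposition expected to leave the
% lowest regret after that presentation, given the history so far:
a_{T+1} = \arg\min_{a} \; \mathbb{E}\bigl[ R_{T+1} \mid H_{T},\, a_{T+1} = a \bigr]
```

Because the true means are unknown, the expectation has to be taken over a probability distribution inferred from the stored responses, which is where the exploration versus exploitation trade-off enters.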

65 Citations

15 Claims

1. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
- a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
  
  b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
  
  c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
  
  d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
  
  e) commanding the system to perform the chosen next action; and
  
  f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. A method according to claim 1 wherein step c) includes assessing which candidate action is likely to result in the lowest expected growth in regret on the basis of a true best candidate action which has the mean of said probability distribution.
  - 3. A method according to claim 1 wherein step c) includes evaluating the cost or losses associated with presenting a lower performing candidate action and the gain or benefit associated with knowing the true position of the current best observed candidate action on said probability distribution.
  - 4. A method according to claim 3 wherein step c) includes assessing which candidate action is likely to result in the lowest expected growth in regret according to an assumption that the current best observed candidate action is assumed to have zero uncertainty around its mean or expected response performance.
  - 5. A method according to claim 1 wherein step c) includes assessing which candidate action is likely to result in the lowest expected growth in regret according to an assumption of a Student's distribution and evaluation of Student's t parameters as the basis for estimating probabilities of unequal or equal response states between the candidate action with the current expected best response performance and any other candidate action.
  - 6. A method according to claim 1 wherein step c) includes using a Monte Carlo algorithm to provide understanding of the probability distribution of the response performance of all of the plurality of candidate actions and either choosing the candidate action that if not taken would contribute most to an expected regret estimate, or choosing a candidate action with probability proportional to its contribution to the expected regret estimate if not taken.
  - 7. A method according to claim 1 further comprising the step of:
    - g) applying a temporal depreciation factor to the stored representations of the response performance in order to depreciate the significance of the stored representations over time.
  - 8. A method according to claim 7 wherein step g) includes applying, for each candidate action, a different temporal depreciation factor to the stored representations of the response performance thereof.
  - 9. A method according to claim 1 further comprising the step of:
    - g) forcing the performance of each candidate action a minimum number of times or at a minimum rate.
  - 10. A method according to claim 1 wherein the monitored response performance of a respective candidate action in step a) is stored in step b) in a form to enable sharing of the stored representation of said monitored response performance with another system.
  - 11. A method according to claim 1 wherein the representation of said monitored response performance contains at least one variable that characterizes the conditions under which the candidate action was performed.
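
The loop of claim 1, combined with the Monte Carlo variant of claim 6, the temporal depreciation of claim 7 and the forced minimum of claim 9, might be sketched as below. This is an illustrative reading, not the patented implementation. The class name `RegretController`, the function `run_demo`, the normal approximation used to draw plausible means, and the definition of an action's contribution as the expected amount by which it beats the best of the other actions are all assumptions made for this example.

```python
import math
import random


class RegretController:
    """Toy controller: picks the action whose omission would cost the most expected regret."""

    def __init__(self, n_actions, decay=1.0, min_pulls=5, n_samples=200, seed=0):
        self.n = n_actions          # must be at least 2 (a plurality of actions)
        self.decay = decay          # temporal depreciation factor (claim 7); 1.0 disables it
        self.min_pulls = min_pulls  # forced minimum trials per action (claim 9); keep >= 1
        self.n_samples = n_samples  # Monte Carlo draws per decision (claim 6)
        self.rng = random.Random(seed)
        self.pulls = [0] * n_actions
        self.weight = [0.0] * n_actions    # depreciated observation count
        self.total = [0.0] * n_actions     # depreciated sum of responses
        self.total_sq = [0.0] * n_actions  # depreciated sum of squared responses

    def record(self, action, response):
        """Steps a) and b): store the monitored response under the action performed."""
        for i in range(self.n):  # age every stored representation by one step
            self.weight[i] *= self.decay
            self.total[i] *= self.decay
            self.total_sq[i] *= self.decay
        self.pulls[action] += 1
        self.weight[action] += 1.0
        self.total[action] += response
        self.total_sq[action] += response * response

    def _draw_mean(self, i):
        """Draw one plausible true mean for action i (normal approximation, an assumption)."""
        mean = self.total[i] / self.weight[i]
        var = max(self.total_sq[i] / self.weight[i] - mean * mean, 1e-9)
        return self.rng.gauss(mean, math.sqrt(var / self.weight[i]))

    def regret_contributions(self):
        """Step c): Monte Carlo estimate of the regret risked by ignoring each action."""
        contrib = [0.0] * self.n
        for _ in range(self.n_samples):
            draws = [self._draw_mean(i) for i in range(self.n)]
            for i in range(self.n):
                best_other = max(d for j, d in enumerate(draws) if j != i)
                contrib[i] += max(draws[i] - best_other, 0.0)
        return [c / self.n_samples for c in contrib]

    def choose(self):
        """Step d): forced trials first, then the action with the largest contribution."""
        for i in range(self.n):
            if self.pulls[i] < self.min_pulls:
                return i
        contrib = self.regret_contributions()
        return max(range(self.n), key=contrib.__getitem__)


def run_demo(true_means=(0.0, 1.0, 2.0), noise_sd=0.5, steps=200, seed=0):
    """Steps e) and f): perform the chosen action, observe the response, repeat."""
    env = random.Random(seed + 1)  # hypothetical system with Gaussian responses
    ctl = RegretController(len(true_means), decay=0.99, seed=seed)
    for _ in range(steps):
        action = ctl.choose()
        ctl.record(action, env.gauss(true_means[action], noise_sd))
    return ctl.pulls
```

Calling `run_demo()` returns the number of times each action was performed. Claim 6 also allows choosing an action with probability proportional to its contribution; that alternative would replace the final `max(...)` in `choose` with a weighted random draw and tends to explore more.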

12. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
- a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
  
  b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
  
  c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
  
  d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed;
  
  e) commanding the system to perform the chosen next action; and
  
  f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
- Dependent Claims (13)
  - 13. A robot comprising the system according to claim 12, where the control apparatus of the system controls the objective function of the robot so as to optimize the objective function of the robot.

14. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
- a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
  
  b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
  
  c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
  
  d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
  
  e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.

15. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
- a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
  
  b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
  
  c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
  
  d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
  
  e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
  
  f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.

Current Assignee: Adobe Inc.
Original Assignee: Omniture Incorporated (Adobe Inc.)
Inventors: Phillips, Alan Paul Rolleston
Primary Examiner(s): Hafiz; Tariq R.
Assistant Examiner(s): BOSWELL, BETH V
Application Number: US09/814,308
Publication Number: US 20030004777A1
Time in Patent Office: 2,343 Days
Field of Search: 705/7, 705/10, 705/14
US Class Current: 705/7.28
CPC Class Codes:
G06Q 10/04   Forecasting or optimisation...
G06Q 10/0635   Risk analysis of enterprise...
G06Q 10/06375   Prediction of business proc...
G06Q 10/0639   Performance analysis of emp...
G06Q 30/02   Marketing; Price estimation...
G06Q 30/0201   Market modelling; Market an...
G06Q 30/0202   Market predictions or forec...
