METHOD FOR PERFORMING A PLURALITY OF CANDIDATE ACTIONS AND MONITORING THE RESPONSES SO AS TO CHOOSE THE NEXT CANDIDATE ACTION TO TAKE TO CONTROL A SYSTEM SO AS TO OPTIMALLY CONTROL ITS OBJECTIVE FUNCTION
Abstract
The present disclosure relates to a controller for controlling a system, capable of presentation of a plurality of candidate propositions resulting in a response performance, in order to optimise an objective function of the system. The controller has a means for storing, according to candidate proposition, a representation of the response performance in actual use of respective propositions; means for assessing which candidate proposition is likely to result in the lowest expected regret after the next presentation on the basis of an understanding of the probability distribution of the response performance of all of the plurality of candidate propositions; where regret is a term used for the shortfall in response performance between always presenting a true best candidate proposition and using the candidate proposition actually presented.
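The regret definition in the abstract can be illustrated with a short Python sketch. The function name `cumulative_regret` and the response rates below are hypothetical and are not taken from the disclosure; in practice the true rates are unknown, which is why the controller has to work from a probability distribution over them.

```python
from typing import Sequence


def cumulative_regret(true_rates: Sequence[float], presented: Sequence[int]) -> float:
    """Shortfall in expected response versus always presenting the true best.

    true_rates[i] is the (assumed known, for illustration only) expected
    response of proposition i; presented lists the indices actually shown.
    """
    best = max(true_rates)
    return sum(best - true_rates[i] for i in presented)


# Hypothetical example: three propositions and five presentations.
rates = [0.02, 0.05, 0.03]
history = [0, 1, 1, 2, 1]
regret = cumulative_regret(rates, history)
print(f"cumulative regret: {regret:.3f}")
```

Only the presentations of propositions 0 and 2 contribute here, because proposition 1 is the true best in this assumed example.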
66 Citations
154 Claims
1. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance, wherein the representation of said monitored response performance includes at least one variable that characterizes conditions under which the respective candidate action was performed, and wherein said one or more variables are known before a next candidate action is chosen;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
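Steps a) to f) of claim 1 might be sketched as follows in Python. This is only one possible reading, under several assumptions that are not in the claim: responses are binary, each (condition, action) pair gets an independent Beta posterior with a uniform prior, and "expected growth in regret" is approximated as the expected shortfall of performing the action now plus the expected shortfall of then exploiting the apparent best action after a one-step posterior update. The class and method names are hypothetical.

```python
import random
from collections import defaultdict


class RegretController:
    """Hypothetical one-step-lookahead controller for binary responses."""

    def __init__(self, actions, seed=0, samples=2000):
        self.actions = list(actions)
        self.rng = random.Random(seed)
        self.samples = samples
        # Step b): per (context, action), Beta parameters [alpha, beta].
        self.store = defaultdict(lambda: [1, 1])

    def record(self, context, action, success):
        # Steps a) and b): store the monitored response with its context.
        self.store[(context, action)][0 if success else 1] += 1

    def _mean(self, context, action, d_alpha=0, d_beta=0):
        alpha, beta = self.store[(context, action)]
        return (alpha + d_alpha) / (alpha + beta + d_alpha + d_beta)

    def expected_regret_growth(self, context):
        # Step c): Monte Carlo estimate of E[max_j theta_j], shared by all actions.
        e_best = sum(
            max(self.rng.betavariate(*self.store[(context, a)]) for a in self.actions)
            for _ in range(self.samples)
        ) / self.samples
        growth = {}
        for a in self.actions:
            p = self._mean(context, a)
            others = max(
                (self._mean(context, b) for b in self.actions if b != a),
                default=0.0,
            )
            # Expected apparent-best mean after observing a's outcome.
            after = (p * max(self._mean(context, a, d_alpha=1), others)
                     + (1 - p) * max(self._mean(context, a, d_beta=1), others))
            growth[a] = (e_best - p) + (e_best - after)
        return growth

    def choose(self, context):
        # Step d): pick the action with the lowest expected growth in regret.
        growth = self.expected_regret_growth(context)
        return min(self.actions, key=growth.get)


# Usage: A has been tried nine times (5 successes), B has never been tried.
ctrl = RegretController(["A", "B"])
for outcome in [1, 1, 1, 1, 1, 0, 0, 0, 0]:
    ctrl.record("weekday", "A", bool(outcome))
next_action = ctrl.choose("weekday")
print(next_action)
```

In this example A has the higher apparent mean (6/11 against 1/2), yet the lookahead selects B, because a success on the untried action would change which action appears best. Steps e) and f) would wrap `choose` and `record` in a loop around whatever interface the controlled system exposes.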
11. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance, wherein the representation of said monitored response performance includes at least one variable that characterizes conditions under which the respective candidate action was performed, and wherein said one or more variables are known before a next candidate action is chosen;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 12
13. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance, wherein the representation of said monitored response performance includes at least one variable that characterizes conditions under which the respective candidate action was performed, and wherein said one or more variables are known before a next candidate action is chosen;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
14. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action, wherein the representation of said monitored subsystem performance includes at least one variable that characterizes conditions under which the respective candidate action was performed, and wherein said one or more variables are known before a next candidate action is chosen;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
15. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the shortfall in performance between taking the true best candidate action under conditions prevailing at the time and taking the candidate action actually taken, where the true best candidate action is the optimal action if one knew everything that could be known, and where this calculated shortfall in performance can also be considered to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
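Because the true best candidate action is not known, the shortfall described in step c) of claim 15 can only be assessed in expectation. A minimal Python sketch, assuming binary responses and an independent Beta posterior per action (both assumptions, as is the function name `expected_shortfall`), estimates that expectation by sampling:

```python
import random


def expected_shortfall(posteriors, samples=5000, seed=0):
    """Monte Carlo estimate of E[max_j theta_j - theta_a] for each action a.

    posteriors maps each action to the (alpha, beta) parameters of an
    assumed Beta posterior built from its historical responses.
    """
    rng = random.Random(seed)
    totals = {a: 0.0 for a in posteriors}
    for _ in range(samples):
        # One plausible "truth" drawn from the current state of knowledge.
        draws = {a: rng.betavariate(alpha, beta)
                 for a, (alpha, beta) in posteriors.items()}
        best = max(draws.values())
        for a in posteriors:
            totals[a] += best - draws[a]
    return {a: total / samples for a, total in totals.items()}


# Hypothetical history: A succeeded 19 of 98 times, B succeeded 29 of 98 times.
shortfall = expected_shortfall({"A": (20, 80), "B": (30, 70)})
print(shortfall)
```

This gives the expected shortfall of a single step only. It does not by itself account for what is learned by performing an action, which is the exploration side of the trade-off that the claim also describes.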
27. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the shortfall in performance between taking the true best candidate action under conditions prevailing at the time and taking the candidate action actually taken, where the true best candidate action is the optimal action if one knew everything that could be known, and where this calculated shortfall in performance can also be considered to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 28
29. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the shortfall in performance between taking the true best candidate action under conditions prevailing at the time and taking the candidate action actually taken, where the true best candidate action is the optimal action if one knew everything that could be known, and where this calculated shortfall in performance can also be considered to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
30. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the shortfall in performance between taking the true best candidate action under conditions prevailing at the time and taking the candidate action actually taken, where the true best candidate action is the optimal action if one knew everything that could be known, and where this calculated shortfall in performance can also be considered to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
31. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to restore a balance between first and second components of said regret, the first component being an estimated cost arising from exploring those apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, and the second component being an estimated loss arising from exploiting what appears to be the current best action, but which may in fact not be the current best action, based on said historical performances to date;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
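Step d) of claim 31 chooses the action that restores a balance between an exploration cost and an exploitation loss. The claim does not state how the two components are estimated, so the Python sketch below is a hypothetical ledger heuristic rather than the claimed calculation: it assumes binary responses with Beta posteriors, charges each exploration with the apparent shortfall of the explored action, charges each exploitation with the sampled expected shortfall of the apparent best action, and picks whichever choice keeps the two running totals level. All names are illustrative.

```python
import random


class BalanceController:
    """Hypothetical controller that keeps two regret estimates in balance."""

    def __init__(self, posteriors, seed=0, samples=4000):
        # posteriors: action -> [alpha, beta] of an assumed Beta posterior.
        self.posteriors = {a: list(ab) for a, ab in posteriors.items()}
        self.rng = random.Random(seed)
        self.samples = samples
        self.explore_cost = 0.0   # first component of regret
        self.exploit_loss = 0.0   # second component of regret

    def _mean(self, action):
        alpha, beta = self.posteriors[action]
        return alpha / (alpha + beta)

    def _loss_if_exploit(self, best):
        # Sampled estimate of E[max_j theta_j - theta_best]: the loss from
        # exploiting an action that may not actually be the best.
        total = 0.0
        for _ in range(self.samples):
            draws = {a: self.rng.betavariate(al, be)
                     for a, (al, be) in self.posteriors.items()}
            total += max(draws.values()) - draws[best]
        return total / self.samples

    def choose(self):
        best = max(self.posteriors, key=self._mean)
        others = [a for a in self.posteriors if a != best]
        loss = self._loss_if_exploit(best)
        if others and self.exploit_loss + loss > self.explore_cost:
            # Exploiting would tip the ledger, so explore the runner-up.
            action = max(others, key=self._mean)
            self.explore_cost += self._mean(best) - self._mean(action)
            return action
        self.exploit_loss += loss
        return best

    def record(self, action, success):
        self.posteriors[action][0 if success else 1] += 1


# Usage with a well-tested action A and an untried action B.
ctrl = BalanceController({"A": [60, 40], "B": [1, 1]})
first = ctrl.choose()
second = ctrl.choose()
print(first, second)
```

With these assumed posteriors the first call should explore B, since no exploration cost has been incurred yet, and the second call should exploit A, since the exploration ledger (about 0.1) now exceeds the estimated exploitation loss (about 0.08).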
43. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to restore a balance between first and second components of said regret, the first component being an estimated cost arising from exploring those apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, and the second component being an estimated loss arising from exploiting what appears to be the current best action, but which may in fact not be the current best action, based on said historical performances to date;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
Dependent claims: 44
45. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to restore a balance between first and second components of said regret, the first component being an estimated cost arising from exploring those apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, and the second component being an estimated loss arising from exploiting what appears to be the current best action, but which may in fact not be the current best action, based on said historical performances to date; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
46. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to restore a balance between first and second components of said regret, the first component being an estimated cost arising from exploring those apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, and the second component being an estimated loss arising from exploiting what appears to be the current best action, but which may in fact not be the current best action, based on said historical performances to date;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
47. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action;
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system; and
g) applying a window qualification scheme to the stored representations of the response performance in order to assign higher weight to more recent performances, where the window is defined as either a fixed number of recent observations or as a fixed elapsed time period, and where those stored representations outside the window are excluded from the appraisal of candidate actions.
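The window qualification of step g) can be illustrated with a minimal sketch. It assumes a hypothetical `Observation` record and two hypothetical helper functions, one for a window of a fixed number of recent observations and one for a fixed elapsed time period. Observations inside the window count fully and those outside are excluded; the claim does not prescribe this data layout.

```python
from collections import namedtuple

# Hypothetical record layout: the action taken, when it was taken, and the response seen.
Observation = namedtuple("Observation", ["action", "timestamp", "response"])

def window_by_count(history, n):
    """Keep only the n most recent observations (history is ordered oldest first)."""
    return history[-n:] if n > 0 else []

def window_by_time(history, now, max_age):
    """Keep only observations whose age does not exceed max_age time units."""
    return [obs for obs in history if now - obs.timestamp <= max_age]

def mean_response(history, action):
    """Average response of one action over the qualifying observations, or None."""
    values = [obs.response for obs in history if obs.action == action]
    return sum(values) / len(values) if values else None

# Illustrative data only.
history = [
    Observation("A", 1, 0.0),
    Observation("B", 2, 1.0),
    Observation("A", 3, 1.0),
    Observation("A", 4, 1.0),
]
recent = window_by_count(history, 2)            # observations at t=3 and t=4
last_two_units = window_by_time(history, 4, 2)  # observations at t=2, 3 and 4
```

In this sketch, any appraisal of candidate actions would be computed from `recent` or `last_two_units` rather than from the full `history`.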
-
-
57. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action;
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system; and
g) applying a window qualification scheme to the stored representations of the response performance in order to assign higher weight to more recent performances, where the window is defined as either a fixed number of recent observations or as a fixed elapsed time period, and where those stored representations outside the window are excluded from the appraisal of candidate actions.
-
-
59. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
-
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system; and
f) means for applying a window qualification scheme to the stored representations of the response performance in order to assign higher weight to more recent performances, where the window is defined as either a fixed number of recent observations or as a fixed elapsed time period, and where those stored representations outside the window are excluded from the appraisal of candidate actions.
-
-
60. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem;
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system; and
g) applying a window qualification scheme to the stored representations of the response performance in order to assign higher weight to more recent performances, where the window is defined as either a fixed number of recent observations or as a fixed elapsed time period, and where those stored representations outside the window are excluded from the appraisal of candidate actions.
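Where a candidate action is the selection of a lower level subsystem, one pass through steps a) to e) might be sketched as follows. The names `run_step`, `subsystems` and `score` are hypothetical: `subsystems` maps a name to a callable that performs the action and returns its response, and `score` stands in for whatever estimate of expected growth in regret is used (lower is better).

```python
def run_step(subsystems, score, history):
    """Select the subsystem with the lowest score, run it, and store its response.

    subsystems: dict mapping a subsystem name to a zero-argument callable.
    score: hypothetical callable giving the estimated expected growth in regret.
    history: list that receives (subsystem name, response) records.
    """
    name = min(subsystems, key=score)   # step d): lowest expected growth in regret
    response = subsystems[name]()       # step e): command the chosen subsystem
    history.append((name, response))    # steps a) and b): monitor and store
    return name, response

# Illustrative stand-ins for two lower level subsystems and their scores.
subsystems = {"targeted": lambda: 1.0, "generalized": lambda: 0.0}
scores = {"targeted": 0.02, "generalized": 0.10}
history = []
chosen, response = run_step(subsystems, scores.get, history)
```

Repeating `run_step` with refreshed scores corresponds to step f) of the claim.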
-
-
61. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, and where for each optimization decision instance there exists a ranked set of action opportunities which must be serviced in rank order, and for which a particular candidate action may not occur more than once within that given set of action opportunities, the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of available candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the available candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action;
f) repeating steps a) to e) until all action opportunities within the current ranked set of action opportunities have been serviced; and
g) repeating steps a) to f) to control the system so as to substantially optimize the objective function of the system.
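The constraint that a ranked set of action opportunities is serviced in rank order, with no candidate action used more than once within the set, can be sketched as below. The function name `fill_ranked_slots` and the `score` callable are assumptions for illustration; `score` stands in for the estimated expected growth in regret of a candidate (lower is better).

```python
def fill_ranked_slots(candidates, num_slots, score):
    """Assign one distinct candidate to each opportunity, highest rank first.

    score is a hypothetical callable returning the estimated expected growth
    in regret for a candidate; it is re-evaluated for every opportunity.
    """
    available = list(candidates)
    assignment = []
    for _ in range(min(num_slots, len(available))):
        best = min(available, key=score)
        assignment.append(best)
        available.remove(best)  # a candidate may occur only once within the set
    return assignment

# Illustrative scores only.
scores = {"A": 0.3, "B": 0.1, "C": 0.2}
two_slots = fill_ranked_slots(scores, 2, scores.get)
five_slots = fill_ranked_slots(scores, 5, scores.get)
```

When there are more opportunities than candidates, this sketch simply stops once every candidate has been used.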
-
-
73. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, and where for each optimization decision instance there exists a ranked set of action opportunities which must be serviced in rank order, and for which a particular candidate action may not occur more than once within that given set of action opportunities, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of available candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the available candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action;
f) repeating steps a) to e) until all action opportunities within the current ranked set of action opportunities have been serviced; and
g) repeating steps a) to f) to control the system so as to substantially optimize the objective function of the system.
-
-
75. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, and where for each optimization decision instance there exists a ranked set of action opportunities which must be serviced in rank order, and for which a particular candidate action may not occur more than once within that given set of action opportunities, the control apparatus comprising:
-
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the available candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
-
-
76. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where each candidate action is represented by the presentation of a candidate marketing proposition on a web page, from an available set of candidate marketing propositions, and where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
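Steps c) and d) can be illustrated, under strong simplifying assumptions, for marketing propositions with binary responses (for example, click or no click). The sketch below assumes a Beta posterior with a uniform prior for each proposition's response rate and estimates only the immediate expected regret of showing each proposition by Monte Carlo sampling. It does not model the value of information gained by exploring, which the claimed measure also weighs, so it is a partial illustration rather than the claimed calculation. All function and variable names are hypothetical.

```python
import random

def expected_regret(successes, failures, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[max_j theta_j - theta_a] for each action a.

    theta_a ~ Beta(1 + successes[a], 1 + failures[a]) is an assumed posterior
    (uniform prior, binary responses); the claim does not fix a distribution.
    """
    rng = random.Random(seed)
    actions = list(successes)
    totals = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        draw = {a: rng.betavariate(1 + successes[a], 1 + failures[a]) for a in actions}
        best = max(draw.values())
        for a in actions:
            totals[a] += best - draw[a]  # shortfall against the best sampled rate
    return {a: totals[a] / n_samples for a in actions}

def choose_action(successes, failures):
    """Pick the action with the lowest estimated immediate expected regret."""
    regret = expected_regret(successes, failures)
    return min(regret, key=regret.get)

# Illustrative response counts for two propositions.
successes = {"offer_1": 30, "offer_2": 5}
failures = {"offer_1": 70, "offer_2": 95}
regret = expected_regret(successes, failures)
chosen = choose_action(successes, failures)
```

Because the historical counts feed the posterior directly, every stored response changes the estimates used for the next decision.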
-
-
88. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where each candidate action is represented by the presentation of a candidate marketing proposition on a web page, from an available set of candidate marketing propositions, and where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
-
-
90. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where each candidate action is represented by the presentation of a candidate marketing proposition on a web page, from an available set of candidate marketing propositions, and where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
-
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
-
-
91. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups that are each used to drive a fraction of decisions of the system, and where a control group activity may be represented by i) randomly selecting one of the available candidate actions (“Random Subsystem or Control”), ii) selecting a candidate action by ignoring any variables that are known to characterize and potentially discriminate one interaction scenario from another (“Generalized Subsystem or Control”), iii) selecting a candidate action based on all available data (“Targeted Subsystem or Control”), or iv) any other specific decision process which is desired to be used as a reference, where control data is used to inspect or compare the response performance across two or more modes of decision operation over any time period;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
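The use of control groups that each drive a fraction of decisions can be sketched as a simple router. The names `route_decision` and `random_control`, the mode labels, and the 5% fractions are assumptions chosen for illustration; the claim does not specify how decisions are allocated to control groups.

```python
import random

def route_decision(rng, control_fractions):
    """Pick which decision mode handles the next decision.

    control_fractions maps a control group name to the fraction of decisions
    it drives; the remainder goes to the main decision process ("optimized").
    """
    u = rng.random()
    cumulative = 0.0
    for name, fraction in control_fractions.items():
        cumulative += fraction
        if u < cumulative:
            return name
    return "optimized"

def random_control(rng, candidates):
    """Random control: choose uniformly among the available candidate actions."""
    return rng.choice(candidates)

# Illustrative allocation: two control groups at 5% of decisions each.
rng = random.Random(42)
fractions = {"random_control": 0.05, "generalized_control": 0.05}
counts = {}
for _ in range(10000):
    mode = route_decision(rng, fractions)
    counts[mode] = counts.get(mode, 0) + 1
```

Storing the mode alongside each monitored response would allow response performance to be compared across modes of decision operation over any chosen period.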
-
-
103. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups that are each used to drive a fraction of decisions of the system, and where a control group activity may be represented by i) randomly selecting one of the available candidate actions (“Random Subsystem or Control”), ii) selecting a candidate action by ignoring any variables that are known to characterize and potentially discriminate one interaction scenario from another (“Generalized Subsystem or Control”), iii) selecting a candidate action based on all available data (“Targeted Subsystem or Control”), or iv) any other specific decision process which is desired to be used as a reference, where control data is used to inspect or compare the response performance across two or more modes of decision operation over any time period;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
-
-
105. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
-
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups that are each used to drive a fraction of decisions of the system, and where a control group activity may be represented by i) randomly selecting one of the available candidate actions (“Random Subsystem or Control”), ii) selecting a candidate action by ignoring any variables that are known to characterize and potentially discriminate one interaction scenario from another (“Generalized Subsystem or Control”), iii) selecting a candidate action based on all available data (“Targeted Subsystem or Control”), or iv) any other specific decision process which is desired to be used as a reference, where control data is used to inspect or compare the response performance across two or more modes of decision operation over any time period;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
-
-
106. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
-
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups that are each used to drive a fraction of decisions of the system, and where a control group activity may be represented by i) randomly selecting one of the available candidate actions (“Random Subsystem or Control”), ii) selecting a candidate action by ignoring any variables that are known to characterize and potentially discriminate one interaction scenario from another (“Generalized Subsystem or Control”), iii) selecting a candidate action based on all available data (“Targeted Subsystem or Control”), or iv) any other specific decision process which is desired to be used as a reference, where control data is used to inspect or compare the response performance across two or more modes of decision operation over any time period;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
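The routing of a fraction of decisions to control groups, as recited in step a) of the preceding claim, can be illustrated with a minimal Python sketch. It is an editorial illustration only, not the claimed implementation. The class `ControlledChooser`, its method names, the binary success/failure response, and the Laplace-smoothed rate estimate are all assumptions introduced here. The fallback "optimized" mode uses a simple context-aware greedy rule as a placeholder and does not reproduce the expected-growth-in-regret calculation of steps c) and d).

```python
import random
from collections import defaultdict


class ControlledChooser:
    """Hypothetical helper: sends each decision to a control mode or to the optimizer."""

    def __init__(self, actions, control_fractions, seed=0):
        self.actions = list(actions)
        # Assumed shape, e.g. {"random": 0.05, "generalized": 0.05}
        self.control_fractions = dict(control_fractions)
        self.rng = random.Random(seed)
        # [successes, trials] per action, and per (context, action) pair
        self.overall = defaultdict(lambda: [0, 0])
        self.by_context = defaultdict(lambda: [0, 0])

    def pick_mode(self):
        """Draw a mode so that each control group drives its configured fraction."""
        u = self.rng.random()
        cumulative = 0.0
        for mode, fraction in self.control_fractions.items():
            cumulative += fraction
            if u < cumulative:
                return mode
        return "optimized"

    @staticmethod
    def _rate(stats):
        successes, trials = stats
        return (successes + 1) / (trials + 2)  # Laplace-smoothed success rate

    def choose(self, context):
        mode = self.pick_mode()
        if mode == "random":
            # Random control: any available candidate action
            action = self.rng.choice(self.actions)
        elif mode == "generalized":
            # Generalized control: ignore the context variables
            action = max(self.actions, key=lambda a: self._rate(self.overall[a]))
        else:
            # Targeted control and the placeholder optimizer: use all available data
            action = max(
                self.actions,
                key=lambda a: self._rate(self.by_context[(context, a)]),
            )
        return mode, action

    def record(self, context, action, success):
        """Store the monitored response for the action that was performed."""
        for stats in (self.overall[action], self.by_context[(context, action)]):
            stats[0] += int(success)
            stats[1] += 1
```

Because each decision carries the mode that produced it, response performance can later be compared across modes over any chosen period.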
107. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups and where reporting of relative performance of the control groups incorporates a test for statistical significance as a measure of confidence for any observations during monitoring;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
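The test for statistical significance recited in step a) of the preceding claim is not specified further in the claim. As one possible illustration, the sketch below uses a two-sided two-proportion z-test, assuming binary (success or failure) responses. The function name `two_proportion_p_value` and its parameters are hypothetical.

```python
import math


def two_proportion_p_value(successes_a, trials_a, successes_b, trials_b):
    """Two-sided p-value for the hypothesis that both groups share one success rate.

    Assumes binary responses and samples large enough for the normal approximation.
    """
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    variance = pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b)
    if variance == 0:
        # All responses identical in both groups: no evidence of a difference
        return 1.0
    z = (successes_a / trials_a - successes_b / trials_b) / math.sqrt(variance)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))
```

A small p-value indicates that an observed difference between a control group and the remaining decisions is unlikely to be due to chance alone, which is the kind of confidence measure a performance report could attach to each comparison.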
119. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups and where reporting of relative performance of the control groups incorporates a test for statistical significance as a measure of confidence for any observations during monitoring;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
121. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups and where reporting of relative performance of the control groups incorporates a test for statistical significance as a measure of confidence for any observations during monitoring;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system.
122. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against one or more control groups and where reporting of relative performance of the control groups incorporates a test for statistical significance as a measure of confidence for any observations during monitoring;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system.
123. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of one or more control groups;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein the size of a control group from the one or more control groups, and therefore any compromise to the system performance caused by running that control group, is minimized by automatically regulating a fraction of decisions allocated to that control group using an algorithm incorporating an observed statistical significance of the observed difference in performance between the performance of system activities within that control group and the performance of system activities outside of that control group.
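The preceding claim recites automatic regulation of the fraction of decisions allocated to a control group, driven by the observed statistical significance, without fixing a particular algorithm. The sketch below shows one simple rule under stated assumptions: the fraction shrinks multiplicatively once the observed difference is significant and grows again when it is not. The function `regulate_control_fraction` and the parameters `alpha`, `min_fraction`, `max_fraction`, and `step` are hypothetical and are not taken from the claim.

```python
def regulate_control_fraction(current_fraction, p_value, alpha=0.05,
                              min_fraction=0.01, max_fraction=0.20, step=0.8):
    """Return the next fraction of decisions to allocate to a control group.

    Assumed rule: once the difference between the control group and the rest
    of the system is significant (p_value < alpha), fewer control decisions
    are needed, so the fraction is reduced; otherwise it is increased so that
    evidence accumulates faster. The result is clamped to the allowed range.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        proposed = current_fraction * step
    else:
        proposed = current_fraction / step
    return min(max_fraction, max(min_fraction, proposed))
```

Keeping a non-zero `min_fraction` in this sketch means the control group never disappears, so the comparison can still detect a later change in relative performance.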
135. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of one or more control groups;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein the size of a control group from the one or more control groups, and therefore any compromise to the system performance caused by running that control group, is minimized by automatically regulating a fraction of decisions allocated to that control group using an algorithm incorporating an observed statistical significance of the observed difference in performance between the performance of system activities within that control group and the performance of system activities outside of that control group.
137. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of one or more control groups;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system, wherein the size of a control group from the one or more control groups, and therefore any compromise to the system performance caused by running that control group, is minimized by automatically regulating a fraction of decisions allocated to that control group using an algorithm incorporating an observed statistical significance of the observed difference in performance between the performance of system activities within that control group and the performance of system activities outside of that control group.
138. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of one or more control groups;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein the size of a control group from the one or more control groups, and therefore any compromise to the system performance caused by running that control group, is minimized by automatically regulating a fraction of decisions allocated to that control group using an algorithm incorporating an observed statistical significance of the observed difference in performance between the performance of system activities within that control group and the performance of system activities outside of that control group.
139. A method of controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of a random control group;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein monitoring observations captured and recorded as part of the random control group are used in the estimation of growth of regret for the purposes of choosing the next candidate action for optimized decisions, and therefore the compromise to system performance caused by running the random control group is minimized by making full use of all available observations.
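The reuse of random control group observations recited in the preceding claim can be illustrated as follows. This is a sketch under stated assumptions, not the claimed calculation: responses are taken to be binary, each action gets a Beta(1, 1) prior, and the regret measure is a Monte Carlo estimate of the expected one-step regret under the posterior. That one-step quantity orders actions by posterior mean and so does not by itself capture the value of exploration that the claimed expected growth in regret is said to represent. The function names and the observation dictionary layout are hypothetical.

```python
import random


def update_posteriors(observations, actions):
    """Build Beta posteriors from every observation, whichever group produced it.

    Observations recorded under the random control group are pooled with
    optimized decisions; the "group" field is deliberately not consulted.
    """
    posteriors = {a: [1, 1] for a in actions}  # Beta(1, 1) prior, an assumption
    for obs in observations:
        alpha_beta = posteriors[obs["action"]]
        if obs["success"]:
            alpha_beta[0] += 1
        else:
            alpha_beta[1] += 1
    return posteriors


def expected_regret(posteriors, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[max_j theta_j - theta_i] for each action i."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in posteriors}
    for _ in range(n_samples):
        draw = {a: rng.betavariate(al, be) for a, (al, be) in posteriors.items()}
        best = max(draw.values())
        for a in totals:
            totals[a] += best - draw[a]
    return {a: total / n_samples for a, total in totals.items()}


# Usage: 40 random-control observations of "b" and 40 optimized observations of "a"
observations = (
    [{"action": "b", "success": i < 30, "group": "random_control"} for i in range(40)]
    + [{"action": "a", "success": i < 5, "group": "optimized"} for i in range(40)]
)
posteriors = update_posteriors(observations, ["a", "b"])
regret = expected_regret(posteriors)
next_action = min(regret, key=regret.get)  # action with the lowest estimated regret
```

In this usage example the evidence favoring "b" comes entirely from the random control group, yet it still informs the next optimized decision, which is the sense in which no observation is wasted.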
151. A system having means for performing a plurality of candidate actions and means for monitoring response performances of a performance of a respective candidate action during performance of an objective function of the system, where the objective function is a function of the monitored response performances following decisions and actions taken, the system further having a control apparatus that is programmed to control the objective function of the system by performing the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of a random control group;
b) storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein monitoring observations captured and recorded as part of the random control group are used in the estimation of growth of regret for the purposes of choosing the next candidate action for optimized decisions, and therefore the compromise to system performance caused by running the random control group is minimized by making full use of all available observations.
153. A control apparatus for controlling a system to optimize an objective function thereof, the system performing a plurality of candidate actions and monitoring response performances of a performance of a respective candidate action, where the objective function is a function of the monitored response performances following decisions and actions taken, the control apparatus comprising:
a) means for monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of a random control group;
b) means for storing, according to the candidate action performed by the system, a representation of said monitored response performance;
c) means for calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) means for choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system; and
e) means for commanding the system to perform the chosen next action, wherein the control apparatus controls the system so as to substantially optimize the objective function of the system, wherein monitoring observations captured and recorded as part of the random control group are used in the estimation of growth of regret for the purposes of choosing the next candidate action for optimized decisions, and therefore the compromise to system performance caused by running the random control group is minimized by making full use of all available observations.
154. A method of controlling a system with two or more subsystems to optimize an objective function of the system, the system performing a plurality of candidate actions, wherein a candidate action is represented by the selection of a lower level subsystem from said two or more subsystems, and wherein the system monitors the response performance of the selected subsystem, where the objective function is a function of the monitored response performances following decisions and actions taken, the method comprising the steps of:
a) monitoring response performance of a respective candidate action that is chosen to be performed by the system, where the response performance is continuously monitored against response performance of a random control group;
b) storing, according to the candidate action performed by the system, a representation of said monitored subsystem performance in response to the candidate action;
c) calculating the expected growth in regret associated with each of the plurality of candidate actions, assessed using a probability distribution based on the historical response performances to date of said plurality of candidate actions, where the expected growth in regret is a system performance measure that is calculated to represent the trade-off between the relative merit of exploration of one or more apparently non-best candidate actions to mitigate the risk of ignoring one of said one or more apparently non-best candidate actions which may actually be the current best candidate action, with respect to the relative merit of exploiting what appears to be the current best candidate action but which in fact may not be the current best candidate action, based on said historical response performances to date;
d) choosing as the next action the candidate action that is calculated to result in the lowest expected growth in regret after the chosen candidate action is performed by the system;
e) commanding the system to perform the chosen next action using a corresponding lower level subsystem; and
f) repeating steps a) to e) to control the system so as to substantially optimize the objective function of the system, wherein monitoring observations captured and recorded as part of the random control group are used in the estimation of growth of regret for the purposes of choosing the next candidate action for optimized decisions, and therefore the compromise to system performance caused by running the random control group is minimized by making full use of all available observations.
Specification