Bayesian neural networks for optimization and control
Abstract
An optimization system is provided utilizing a Bayesian neural network calculation of a derivative, wherein an output is optimized with respect to an input utilizing a stochastic method that averages over many regression models. This is done such that constraints from first-principles models are incorporated in terms of prior distributions.
192 Citations
80 Claims

1. A method for determining the optimum operation of a system, comprising the steps of:

receiving the outputs of the system and the measurable inputs to the system; and
optimizing select ones of the outputs as a function of the inputs by minimizing an objective function J to provide optimal values for select ones of the inputs;
wherein the step of optimizing includes the step of predicting the select ones of the outputs with a plurality of models of the system, each model operable to map the inputs through a representation of the system to provide predicted outputs corresponding to the select ones of the outputs which predicted outputs of each of the plurality of models are combined in accordance with a predetermined combination algorithm to provide a single output corresponding to each of the select ones of the outputs.

determining the average predicted output of the plurality of models ⟨y(t)⟩;
determining the average derivative of the average predicted output ⟨y(t)⟩ with regards to the inputs x(t) as ∂⟨y(t)⟩/∂x(t);
the objective function J being a function of ⟨y(t)⟩ and determining a derivative of the objective function J with respect to ⟨y(t)⟩ as ∂J/∂⟨y(t)⟩;
determining with the chain rule the relationship ∂J/∂x(t); and
determining the minimum of J.

8. The method of claim 7, wherein the average derivative of the average predicted output is weighted over the plurality of models.
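
The gradient steps recited above (average the model outputs, average the model derivatives, then apply the chain rule) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the two scalar models, the weights, the quadratic objective J = (⟨y⟩ − target)², and every function name are hypothetical.

```python
# Hypothetical ensemble: each entry is (model f(x), its derivative df/dx).
MODELS = [
    (lambda x: 2.0 * x + 1.0, lambda x: 2.0),
    (lambda x: x * x, lambda x: 2.0 * x),
]
WEIGHTS = [0.25, 0.75]  # assumed normalized weights over the models


def average_output(x):
    """Weighted average prediction <y> over the models."""
    return sum(w * f(x) for w, (f, _) in zip(WEIGHTS, MODELS))


def average_derivative(x):
    """Weighted average derivative d<y>/dx over the models."""
    return sum(w * df(x) for w, (_, df) in zip(WEIGHTS, MODELS))


def objective_gradient(x, target):
    """dJ/dx by the chain rule for the assumed J = (<y> - target)**2."""
    dJ_dy = 2.0 * (average_output(x) - target)  # dJ/d<y>
    return dJ_dy * average_derivative(x)        # dJ/dx = dJ/d<y> * d<y>/dx
```

Setting this gradient to zero, or feeding it to an iterative optimizer, corresponds to the final step of determining the minimum of J.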

9. A method for optimizing the parameters of a system having a vector input x(t) and a vector output y(t), comprising the steps of:

storing a representation of the system in a plurality of models, each model operable to map the inputs through a representation of the system to provide a predicted output, each of the models operable to predict the output of the system for a given input value of x(t);
providing predetermined optimization objectives; and
determining a single optimized input vector value x̂(t) by applying a predetermined optimization algorithm to the plurality of models to achieve a minimum error to the predetermined optimization objective.

where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of the model, and their product is the posterior distribution.
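
As a rough illustration of how a likelihood-times-prior weight could be computed for each model, the sketch below works in log space. The Gaussian noise model with a known `sigma`, the per-model log-priors supplied by the caller, and all function names are assumptions made for this example; the claim text does not specify them.

```python
import math


def log_likelihood(predict, data, sigma=1.0):
    """Sum of log P(y_i | x_i, omega) under an assumed Gaussian noise model."""
    total = 0.0
    for x, y in data:
        r = y - predict(x)
        total += -0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


def posterior_weights(models, log_priors, data, sigma=1.0):
    """Normalized weights proportional to likelihood * prior for each model."""
    logs = [log_likelihood(f, data, sigma) + lp for f, lp in zip(models, log_priors)]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

A model that fits the data better, or that the prior favors, receives a larger share of the weight.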

13. The method of claim 9, wherein the step of storing a representation of the system in a plurality of models comprises storing a representation of the system in a plurality of nonlinear or linear networks, each operable to map the input x(t) to a predicted output through a stored representation of the system.

14. The method of claim 13, wherein the stored representation of the system in each of the plurality of nonlinear or linear networks are related in such a manner wherein the parameters of each of the linear or nonlinear networks are stochastically related to each other.

15. The method of claim 14, wherein the stochastic relationship is a Bayesian relationship.

16. The method of claim 9, wherein the predetermined optimization algorithm is an iterative optimization algorithm.
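
One simple example of an iterative optimization algorithm is gradient descent on the input. The sketch below is a hypothetical illustration only: the two linear models, the quadratic objective, the step size, and the names `minimize_input`, `avg`, and `grad_J` are assumptions and do not come from the patent.

```python
def minimize_input(grad, x0, step=0.1, iters=200, tol=1e-10):
    """Plain gradient descent on the input x; `grad` returns dJ/dx."""
    x = x0
    for _ in range(iters):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


# Hypothetical two-model ensemble whose average is <y>(x) = 1.5 * x.
models = [lambda x: x, lambda x: 2.0 * x]
slopes = [1.0, 2.0]


def avg(x):
    """Unweighted average prediction over the models."""
    return sum(f(x) for f in models) / len(models)


def grad_J(x, target=3.0):
    # Assumed J = (<y> - target)**2, so dJ/dx = 2 * (<y> - target) * d<y>/dx
    return 2.0 * (avg(x) - target) * (sum(slopes) / len(slopes))


x_opt = minimize_input(grad_J, x0=0.0)
```

With these assumed models the averaged prediction reaches the target of 3.0 at x = 2, which is the value the loop converges to.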

17. The method of claim 9, wherein the step of determining the single optimized input vector value x̂(t) comprises determining the derivative of the predetermined optimization objective relative to the input vector x(t) as ∂J/∂x(t), where J represents the predetermined optimization objective.

18. The method of claim 9, wherein the step of determining comprises determining the derivative ∂y(t)/∂x(t) of each of the models and then determining an average of the derivatives ∂y(t)/∂x(t).

19. The method of claim 18, wherein the step of determining the average derivative is defined over a (q, p) matrix by the following relationship:

$\frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p} \propto \sum_{w=1}^{N_w} \frac{\partial F_q^{(w)}(\vec{x})}{\partial \vec{x}_p} \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of the model, and their product is the posterior distribution.
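
A weighted average of per-model derivative matrices might be computed as in the sketch below. It assumes each model supplies its own q-by-p Jacobian as a nested list (row index q for the output component, column index p for the input component) and that the weights are unnormalized posterior weights; the function name and data layout are hypothetical.

```python
def average_jacobian(jacobians, weights):
    """Weighted average of per-model Jacobians, each a q-by-p nested list."""
    q, p = len(jacobians[0]), len(jacobians[0][0])
    total = sum(weights)  # normalize, since the weights are only proportional
    avg = [[0.0] * p for _ in range(q)]
    for jac, w in zip(jacobians, weights):
        for i in range(q):
            for j in range(p):
                avg[i][j] += (w / total) * jac[i][j]
    return avg
```

For example, two models with Jacobians `[[1, 0], [0, 1]]` and `[[3, 2], [2, 3]]` and equal weights average to `[[2, 1], [1, 2]]`.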


20. The method of claim 19, wherein the step of determining ∂J/∂⟨x(t)⟩ comprises the steps of:
determining the weighted average of the predicted output of each of the models by the following relationship:
$\langle \vec{y}(t) \rangle \propto \sum_{w=1}^{N_w} F^{(w)}(\vec{x}) \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$ represents the posterior probability of the model indexed by w, and $N_w$ represents the maximum number of models in the stochastic relationship, and wherein the stored representation of the system in each of the plurality of models are related in such a manner wherein the parameters of each of the models are stochastically related to each other;
determining the derivatives ∂J/∂⟨y(t)⟩ as the variation of the predetermined optimization objective with respect to the predicted output y(t); and
determining by the chain rule the following:
$\frac{\partial J}{\partial \vec{x}_p} = \sum_{q} \frac{\partial J}{\partial \langle \vec{y}_q \rangle} \frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p}$
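
As a worked illustration of the chain-rule step, suppose (purely as an assumption for this example) that the objective is a quadratic penalty on the distance between the averaged prediction and a target value $y_q^{\mathrm{sp}}$ for each output component; the symbol $y^{\mathrm{sp}}$ is hypothetical and does not appear in the claims. Then:

```latex
\begin{aligned}
J &= \tfrac{1}{2} \sum_{q} \left( \langle y_q \rangle - y_q^{\mathrm{sp}} \right)^2, \\
\frac{\partial J}{\partial \langle y_q \rangle} &= \langle y_q \rangle - y_q^{\mathrm{sp}}, \\
\frac{\partial J}{\partial x_p} &= \sum_{q} \left( \langle y_q \rangle - y_q^{\mathrm{sp}} \right) \frac{\partial \langle y_q \rangle}{\partial x_p}.
\end{aligned}
```

Under this assumed objective the gradient with respect to each input is the averaged derivative matrix contracted with the vector of prediction errors.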

21. A method for determining the dynamic operation of a system, comprising the steps of:

receiving the outputs of the plant system and the measurable inputs to the system; and
optimizing select ones of the outputs as a function of the inputs over a future horizon by minimizing an objective function J to achieve a predetermined desired setpoint to provide optimal values for select ones of the inputs over a trajectory to the desired setpoint in incremental time intervals;
wherein the step of optimizing includes the step of predicting as predicted outputs the select ones of the outputs over the trajectory at each of the incremental time intervals from the current value to the setpoint with a plurality of models of the system, each model operable to map the inputs through a representation of the system to provide predicted outputs corresponding to the select ones of the outputs, which predicted outputs of the plurality of models are averaged.

determining the average predicted output of the plurality of models ⟨y(t)⟩;
determining the average derivative of the average predicted output ⟨y(t)⟩ with regards to the inputs x(t) as ∂⟨y(t)⟩/∂x(t);
the objective function J being a function of ⟨y(t)⟩ and determining a derivative of the objective function J with respect to ⟨y(t)⟩ as ∂J/∂⟨y(t)⟩;
determining with the chain rule the relationship ∂J/∂x(t); and
determining the minimum of the objective function J.

27. The method of claim 26, wherein the average derivative of the average predicted output is weighted over the plurality of models.
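
For the dynamic case, one way to picture an objective over a future horizon is to sum the setpoint error of the averaged prediction at each time increment. The sketch below is a simplified, hypothetical illustration: it assumes static models applied independently at each increment, an unweighted average, and a squared-error objective, none of which is specified by the claims.

```python
def horizon_objective(inputs, models, setpoint):
    """Sum of squared setpoint errors of the averaged prediction over the horizon."""
    total = 0.0
    for x in inputs:  # one candidate input per time increment
        y_avg = sum(f(x) for f in models) / len(models)
        total += (y_avg - setpoint) ** 2
    return total
```

An optimizer would then adjust the whole input trajectory to reduce this total.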

28. A method for optimizing the parameters of a system having a vector input x(t) and a vector output y(t) and with respect to the dynamic operation thereof from a current operating point to a desired setpoint for the output y(t), comprising the steps of:

storing a representation of the system in a plurality of models, each of the models operable to predict the output of the system for a given input value of x(t);
providing predetermined optimization objectives; and
determining a single optimized input vector value x̂(t) for each of a plurality of time increments between the current value and the desired setpoint over a future horizon by applying a predetermined optimization algorithm to the plurality of models to achieve a minimum error to the predetermined optimization objective at each of the plurality of time increments between the current value and the desired setpoint over the future horizon, each model operable to map the inputs through a representation of the system to provide a predicted output.


32. The method of claim 30, wherein the step of storing a representation of the system in a plurality of models comprises storing a representation of the system in a plurality of nonlinear or linear networks, each operable to map the input x(t) to a predicted output through a stored representation of the system.

33. The method of claim 32, wherein the stored representation of the system in each of the plurality of nonlinear or linear networks are related in such a manner wherein the parameters of each of the nonlinear or linear networks are stochastically related to each other.

34. The method of claim 33, wherein the stochastic relationship is a Bayesian relationship.

35. The method of claim 29, and further comprising the step of applying the optimized input values of the select ones of the inputs x̂(t) for less than the number of incremental time intervals from the current value to the setpoint to the corresponding inputs of the system after determination thereof.
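
Applying the optimized inputs for fewer than the planned number of increments resembles a receding-horizon scheme. The sketch below illustrates only that idea: `plan_inputs` is a hypothetical stand-in for the optimizer (it simply moves linearly toward the setpoint), and `send` is an assumed callback that delivers a value to the system.

```python
def plan_inputs(current, setpoint, n_steps):
    """Hypothetical planner: inputs that move linearly toward the setpoint."""
    delta = (setpoint - current) / n_steps
    return [current + delta * (k + 1) for k in range(n_steps)]


def apply_partial(plan, n_apply, send):
    """Send only the first n_apply planned inputs to the system."""
    if not 0 < n_apply < len(plan):
        raise ValueError("n_apply must be fewer than the planned increments")
    for x in plan[:n_apply]:
        send(x)
    return plan[:n_apply]
```

After the partial plan is applied, the trajectory can be recomputed from the new operating point.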

36. The method of claim 28, wherein the predetermined optimization algorithm is an iterative optimization algorithm.

37. The method of claim 28, wherein the step of determining the single optimized input vector value x̂(t) comprises determining the derivative of the predetermined optimization objective relative to the input vector x(t) as ∂J/∂x(t), where J represents the predetermined optimization objective between the current value and the desired setpoint.

38. The method of claim 28, wherein the step of determining comprises determining the derivative ∂y(t)/∂x(t) of each of the models and then determining an average of the derivatives ∂y(t)/∂x(t).

39. The method of claim 38, wherein the step of determining the average derivative is defined over a (q, p) matrix by the following relationship:

$\frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p} \propto \sum_{w=1}^{N_w} \frac{\partial F_q^{(w)}(\vec{x})}{\partial \vec{x}_p} \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of the model, and their product is the posterior distribution.


40. The method of claim 39, wherein the step of determining ∂J/∂⟨x(t)⟩ comprises the steps of:
determining the weighted average of the predicted outputs of each of the models at each of the increments of time by the following relationship:
$\langle \vec{y}(t) \rangle \propto \sum_{w=1}^{N_w} F^{(w)}(\vec{x}) \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$ represents the posterior probability of the model indexed by w, and $N_w$ represents the maximum number of models in the stochastic relationship, and wherein the stored representation of the system in each of the plurality of nonlinear or linear networks are related in such a manner wherein the parameters of each of the nonlinear or linear networks are stochastically related to each other;
determining the derivatives ∂J/∂⟨y(t)⟩ as the variation of the predetermined optimization objective with respect to the output y(t) at each of the plurality of time increments between the current value and the setpoint; and
determining by the chain rule the following:
$\frac{\partial J}{\partial \vec{x}_p} = \sum_{q} \frac{\partial J}{\partial \langle \vec{y}_q \rangle} \frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p}$

41. An optimizing system for determining the optimum operation of a system, comprising:

an input for receiving the outputs of the system and the measurable inputs to the system; and
an optimizer for optimizing select ones of the outputs as a function of the inputs by minimizing an objective function J to provide optimal values for select ones of the inputs;
said optimizer including a plurality of models of the system, each model operable to map the inputs through a representation of the system to provide predicted outputs corresponding to the select ones of the outputs, each of the models for predicting the select ones of the outputs of the system, which predicted outputs of said plurality of models are combined in accordance with a predetermined combination algorithm to provide a single predicted output corresponding to each of the select ones of the outputs.

means for determining the average predicted output of the plurality of models ⟨y(t)⟩;
means for determining the average derivative of the average predicted output ⟨y(t)⟩ with regards to the inputs x(t) as ∂⟨y(t)⟩/∂x(t);
the objective function J being a function of ⟨y(t)⟩ and means for determining a derivative of the objective function J with respect to ⟨y(t)⟩ as ∂J/∂⟨y(t)⟩;
means for determining with the chain rule the relationship ∂J/∂x(t); and
means for determining the minimum of J.

47. The optimizing system of claim 46, wherein the average derivative of the average predicted output is weighted over said plurality of models.

48. The optimizing system of claim 41, wherein said plurality of models of the system are operable to predict the predicted outputs corresponding to the select ones of the outputs to a point forward in time as a trajectory.

49. An optimizing system for optimizing the parameters of a system having a vector input x(t) and a vector output y(t), comprising:

a plurality of models of the system, each for storing a representation of the system, each of said models operable to predict the output as a predicted vector output of the system for a given input value of x(t), each model operable to map the inputs x(t) through a representation of the system to provide a predicted output vector corresponding to the vector output y(t); and
an optimizer for determining a single optimized input vector value x̂(t) by applying a predetermined optimization algorithm to the plurality of models to achieve a minimum error to a predetermined optimization objective for the predicted output vectors for each of the models.

where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of said associated one of said models, and their product is the posterior distribution.

53. The optimizing system of claim 49, wherein each of said plurality of models comprises a nonlinear or linear network, each operable to map the input x(t) to a predicted output through a stored representation of the system.

54. The optimizing system of claim 53, wherein the stored representation of the system in each of said plurality of nonlinear or linear networks are related in such a manner wherein the parameters of each of said linear or nonlinear networks are stochastically related to each other.

55. The optimizing system of claim 54, wherein the stochastic relationship is a Bayesian relationship.

56. The optimizing system of claim 49, wherein the predetermined optimization algorithm is an iterative optimization algorithm.

57. The optimizing system of claim 49, wherein said optimizer is operable to determine the derivative of the predetermined optimization objective relative to the input vector x(t) as ∂J/∂x(t), where J represents the predetermined optimization objective.

58. The optimizing system of claim 49, wherein said optimizer is operable to determine the derivative ∂y(t)/∂x(t) of each of said models and then determine an average of the derivatives ∂y(t)/∂x(t).

59. The optimizing system of claim 58, wherein said optimizer determines the average derivative over a (q, p) matrix by the following relationship:

$\frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p} \propto \sum_{w=1}^{N_w} \frac{\partial F_q^{(w)}(\vec{x})}{\partial \vec{x}_p} \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of each of said models, and their product is the posterior distribution.


60. The optimizing system of claim 59, wherein said optimizer determines ∂J/∂⟨x(t)⟩ with:
means for determining the weighted average of the predicted output of each of said models by the following relationship:
$\langle \vec{y}(t) \rangle \propto \sum_{w=1}^{N_w} F^{(w)}(\vec{x}) \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$ represents the posterior probability of each of said models indexed by w, and $N_w$ represents the maximum number of said models in the stochastic relationship, and wherein said stored representation of the system in each of said plurality of models are related in such a manner wherein the parameters of each of said models are stochastically related to each other;
means for determining the derivatives ∂J/∂⟨y(t)⟩ as the variation of the predetermined optimization objective with respect to the output y(t); and
means for determining by the chain rule the following:
$\frac{\partial J}{\partial \vec{x}_p} = \sum_{q} \frac{\partial J}{\partial \langle \vec{y}_q \rangle} \frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p}$

61. An optimizing system for determining the dynamic operation of a system, comprising the steps of:

an input for receiving the outputs of the system and the measurable inputs to the system; and
an optimizer for optimizing select ones of the outputs as a function of the inputs over a future horizon by minimizing an objective function J to achieve a predetermined desired setpoint to provide optimal values for select ones of the inputs over a trajectory to the desired setpoint in incremental time intervals;
said optimizer operable to predict the select ones of the outputs over the trajectory at each of the incremental time intervals from the current value to the setpoint with a plurality of models of the system, which predicted outputs of said plurality of models are combined in accordance with a predetermined combination algorithm to provide a single predicted output corresponding to each of the select ones of the outputs.

means for determining the average predicted output of said plurality of models ⟨y(t)⟩;
means for determining the average derivative of the average predicted output ⟨y(t)⟩ with regards to the inputs x(t) as ∂⟨y(t)⟩/∂x(t);
the objective function J being a function of ⟨y(t)⟩ and means for determining a derivative of the objective function J with respect to ⟨y(t)⟩ as ∂J/∂⟨y(t)⟩;
means for determining with the chain rule the relationship ∂J/∂x(t); and
means for determining the minimum of the objective function J.

67. The optimizing system of claim 66, wherein the average derivative of the average predicted output is weighted over said plurality of models.

68. An optimizing system for optimizing the parameters of a system having a vector input x(t) and a vector output y(t) and with respect to the dynamic operation thereof from a current operating point to a desired setpoint for the output y(t), comprising:

a plurality of models, each for storing a representation of the system, each of said models operable to predict the output of the system for a given input value of x(t), each model operable to map the vector input x(t) through a representation of the system to provide a predicted output vector corresponding to the vector output y(t); and
an optimizer for determining a single optimized input vector value x̂(t) for each of a plurality of time increments between the current value and the desired setpoint over a future horizon by applying a predetermined optimization algorithm to the plurality of models to achieve a minimum error to a predetermined optimization objective at each of the plurality of time increments between the current value and the desired setpoint over the future horizon.

where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of each of said models, and their product is the posterior distribution.

72. The optimizing system of claim 69, and further comprising a control system for applying the optimized input values of the select ones of the inputs x̂(t) for less than the number of incremental time intervals from the current value to the setpoint to the corresponding inputs of the system after determination thereof.

73. The optimizing system of claim 68, wherein each of said plurality of models stores a representation of the system in a plurality of nonlinear or linear networks, each operable to map the input x(t) to a predicted output through a stored representation of the system.

74. The optimizing system of claim 73, wherein the stored representation of the system in each of said plurality of nonlinear or linear networks are related in such a manner wherein the parameters of each of the nonlinear or linear networks are stochastically related to each other.

75. The optimizing system of claim 74, wherein the stochastic relationship is a Bayesian relationship.

76. The optimizing system of claim 68, wherein the predetermined optimization algorithm is an iterative optimization algorithm.

77. The optimizing system of claim 68, wherein said optimizer is operable to determine the single optimized input vector value x̂(t) by determining the derivative of the predetermined optimization objective relative to the input vector x(t) as ∂J/∂x(t), where J represents the predetermined optimization objective between the current value and the desired setpoint.

78. The optimizing system of claim 68, wherein said optimizer determines the derivative ∂y(t)/∂x(t) of each of said models and then determines an average of the derivatives ∂y(t)/∂x(t).

79. The optimizing system of claim 78, wherein said optimizer determines the average derivative over a (q, p) matrix by the following relationship:

$\frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p} \propto \sum_{w=1}^{N_w} \frac{\partial F_q^{(w)}(\vec{x})}{\partial \vec{x}_p} \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $\prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})$ is the likelihood, $P(\vec{\omega})$ is a prior distribution of the parameters $\vec{\omega}$ of each of said models, and their product is the posterior distribution.


80. The optimizing system of claim 79, wherein said optimizer is operable to determine ∂J/∂⟨x(t)⟩ with:
means for determining the weighted average of the predicted outputs of each of said models at each of the increments of time by the following relationship:
$\langle \vec{y}(t) \rangle \propto \sum_{w=1}^{N_w} F^{(w)}(\vec{x}) \prod_{i=1}^{n} P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$
where $P(\vec{y}^{(i)} \mid \vec{x}^{(i)}, \vec{\omega})\, P(\vec{\omega})$ represents the posterior probability of said associated one of said models indexed by w, and $N_w$ represents the maximum number of said models in the stochastic relationship, and wherein the stored representation of the system in each of said models is related in such a manner wherein the parameters of each of said models are stochastically related to each other;
means for determining the derivatives ∂J/∂⟨y(t)⟩ as the variation of the predetermined optimization objective with respect to the predicted output y(t) at each of the plurality of time increments between the current value and the setpoint; and
means for determining by the chain rule the following:
$\frac{\partial J}{\partial \vec{x}_p} = \sum_{q} \frac{\partial J}{\partial \langle \vec{y}_q \rangle} \frac{\partial \langle \vec{y}_q \rangle}{\partial \vec{x}_p}$
1 Specification