Online learning and vehicle control method based on reinforcement learning without active exploration
Abstract
A computer-implemented method of adaptively controlling an autonomous operation of a vehicle is provided. The method includes steps of (a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and (b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle that produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the average cost, a cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the passively collected data.
19 Claims
1. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising:
a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and
b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go,
wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, and
wherein the approximated cost-to-go function is determined using a linear combination of weighted radial basis functions in accordance with the following relationship;
Dependent claims: 2, 3, 4, 6, 8, 13, 14, 15
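As an illustration of what "a linear combination of weighted radial basis functions" can look like, here is a minimal Python sketch. The relationship recited in claim 1 is not reproduced in this text, so everything specific below is an assumption made for illustration only: the Gaussian kernel, the shared `width`, the example `centers` and `weights`, and the function names `gaussian_rbf` and `approx_cost_to_go` are all hypothetical.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian radial basis function (assumed kernel, not taken from the claim)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def approx_cost_to_go(x, weights, centers, width=1.0):
    """Weighted sum of basis functions: Z_hat(x) = sum_j w_j * f_j(x)."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))


# Hypothetical two-dimensional state with two basis functions.
centers = [(0.0, 0.0), (1.0, 0.0)]
weights = [0.5, 2.0]
z_at_origin = approx_cost_to_go((0.0, 0.0), weights, centers)
```

In a sketch like this the weights would be the parameters a critic adjusts, while the centers and width stay fixed; whether the claimed method fixes them in the same way is not stated here.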
5. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising:
a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and
b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go,
wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data,
the method further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined using a linearized version of a Bellman equation, and
wherein the estimated average cost determined by the critic network is updated in accordance with the following relationship;
$$\hat{Z}_{avg}^{i+1} = \hat{Z}_{avg}^{i} - \alpha_2^{i} e_k \hat{Z}_k$$
where $\alpha_2^{i}$ is a learning rate, $e_k$ is the approximated temporal difference error, $\hat{Z}_k$ is an estimated cost determined from the approximated cost-to-go function, $\hat{Z}_{avg}^{i}$ is an estimated average cost in state i, and $\hat{Z}_{avg}^{i+1}$ is an estimated average cost in state i+1.
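The average-cost update recited in claim 5 is a single scalar step, and a short Python sketch may make the arithmetic concrete. The function name `update_average_cost`, its parameter names, and the numeric values are hypothetical and chosen only for illustration; nothing here is taken from the patent's implementation.

```python
def update_average_cost(z_avg, learning_rate, td_error, z_k):
    """One step of the claim 5 relationship: new Z_avg = Z_avg - rate * e_k * Z_k."""
    return z_avg - learning_rate * td_error * z_k


# Hypothetical values: current estimate 1.0, learning rate 0.1,
# temporal difference error 0.5, estimated cost Z_k = 2.0.
z_avg_next = update_average_cost(z_avg=1.0, learning_rate=0.1, td_error=0.5, z_k=2.0)
```

With these values the estimate moves from 1.0 to about 0.9. A zero temporal difference error leaves the estimate unchanged.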
7. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising:
a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and
b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go,
wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data,
the method further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined in accordance with the following relationship;
$$e_k := \hat{Z}_{avg} \hat{Z}_k - \exp(-q_k) \hat{Z}_{k+1}$$
where $e_k$ is the approximated temporal difference error, $\hat{Z}_{avg}$ is an estimated average cost, $\hat{Z}_k$ is an estimated cost-to-go in a state k, $\hat{Z}_{k+1}$ is an estimated cost-to-go in a state k+1, and $q_k$ is a state cost in the state k.
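The temporal difference error of claim 7 can be evaluated directly from two consecutive cost-to-go estimates, the average-cost estimate, and the state cost. The sketch below is only an illustration of that arithmetic: the function name `td_error`, its parameter names, and the sample values are hypothetical.

```python
import math


def td_error(z_avg, z_k, z_next, q_k):
    """Claim 7 relationship: e_k = Z_avg * Z_k - exp(-q_k) * Z_{k+1}."""
    return z_avg * z_k - math.exp(-q_k) * z_next


# Hypothetical sample: zero state cost and identical consecutive estimates,
# so the two terms cancel.
e_consistent = td_error(z_avg=1.0, z_k=2.0, z_next=2.0, q_k=0.0)

# Hypothetical sample with a positive state cost: exp(-q_k) shrinks the
# next-state term, leaving a positive error.
e_positive = td_error(z_avg=1.0, z_k=2.0, z_next=2.0, q_k=1.0)
```

An error of zero means the sampled transition already satisfies the relationship for the current estimates; a nonzero value is the quantity the claim uses to update the critic parameters.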
9. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising:
a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and
b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go,
wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, and
wherein the noise level is learned using a linear combination of weighted basis functions in accordance with the relationship;
Dependent claims: 10, 11, 12
16. A computing system configured for adaptively controlling an autonomous operation of a vehicle, the computing system comprising one or more processors for controlling operation of the computing system, and a memory for storing data and program instructions usable by the one or more processors, wherein the one or more processors are configured to execute instructions stored in the memory to:
a) determine, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle; and
b) determine a control input to apply to the vehicle that produces the minimum value for the cost-to-go,
wherein the one or more processors are configured to determine the control input by estimating a noise level using the estimated average cost, a cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data,
wherein the approximated cost-to-go function is determined using a linear combination of weighted radial basis functions in accordance with the following relationship;
Dependent claims: 17
18. A non-transitory computer readable medium having stored therein instructions executable by a computer system to cause the computer system to perform functions, the functions comprising:
a) determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of a vehicle; and
b) determining a control input to apply to the vehicle to control an autonomous operation of the vehicle,
wherein the control input produces the minimum value for the cost-to-go, and
wherein the control input is determined by estimating a noise level using the average cost, a cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data,
wherein the approximated cost-to-go function is determined using a linear combination of weighted radial basis functions in accordance with the following relationship;
Dependent claims: 19
Specification