AUTONOMOUS SYSTEM INCLUDING A CONTINUALLY LEARNING WORLD MODEL AND RELATED METHODS
First Claim
1. An autonomous or semi-autonomous system comprising:
a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task;
a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network;
a preserved copy of the temporal prediction network; and
a preserved copy of the controller,
wherein the preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, and
wherein the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
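The components recited in claim 1 can be pictured with a minimal sketch, assuming a scalar toy model. `TemporalPredictionNetwork`, `Controller`, and `simulated_rollout` are hypothetical names; the tanh recurrence and the Gaussian policy are illustrative choices, since the claim does not specify any architecture. The preserved copies are plain deep copies taken at the end of the first task, so later updates to the live networks leave them unchanged, and each simulated rollout feeds the predicted observation back in as the next input.

```python
import copy
import math
import random
from dataclasses import dataclass


@dataclass
class TemporalPredictionNetwork:
    """Hypothetical scalar recurrent world model (a stand-in for, e.g., an RNN)."""
    a: float = 0.8  # hidden-state recurrence weight
    b: float = 0.5  # observation input weight
    c: float = 0.3  # action input weight
    d: float = 1.0  # readout weight from hidden state to predicted observation

    def step(self, h, obs, action):
        # Update the hidden state from the current observation and action.
        return math.tanh(self.a * h + self.b * obs + self.c * action)

    def predict(self, h):
        # Predict the next observation from the hidden state.
        return self.d * h


@dataclass
class Controller:
    """Hypothetical Gaussian policy conditioned on the observation and hidden state."""
    k_obs: float = -0.5
    k_hidden: float = 0.2
    std: float = 0.1

    def act(self, obs, h, rng):
        mean = self.k_obs * obs + self.k_hidden * h
        return rng.gauss(mean, self.std)


def simulated_rollout(model, controller, obs0, horizon, rng):
    """Generate a trajectory using only the given model and controller (no environment)."""
    h, obs, trajectory = 0.0, obs0, []
    for _ in range(horizon):
        action = controller.act(obs, h, rng)
        h = model.step(h, obs, action)
        next_obs = model.predict(h)
        trajectory.append((obs, action, next_obs))
        obs = next_obs  # the prediction becomes the next input
    return trajectory


model, controller = TemporalPredictionNetwork(), Controller()
# Snapshot both networks at the end of the first task.
preserved_model = copy.deepcopy(model)
preserved_controller = copy.deepcopy(controller)
model.a = -0.8  # later training of the live model does not touch the preserved copy
rollout = simulated_rollout(preserved_model, preserved_controller, 0.5, 10, random.Random(0))
```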
Abstract
An autonomous or semi-autonomous system includes a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task, a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network, a preserved copy of the temporal prediction network, and a preserved copy of the controller. The preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, and the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
21 Claims
1. An autonomous or semi-autonomous system comprising:
a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task;
a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network;
a preserved copy of the temporal prediction network; and
a preserved copy of the controller,
wherein the preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, and
wherein the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
train a temporal prediction network on a first set of samples from an environment of an autonomous or semi-autonomous system during performance of a first task;
train a controller on the first set of samples from the environment and a hidden state output by the temporal prediction network;
store a preserved copy of the temporal prediction network;
store a preserved copy of the controller;
generate simulated rollouts from the preserved copy of the temporal prediction network and the preserved copy of the controller; and
interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
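The interleaving step of claim 8 leaves the mixing schedule open. The sketch below assumes one common approach: every training batch combines a fixed fraction of simulated samples, drawn with replacement from rollouts of the preserved copies, with real second-task samples taken in order. The function `interleave` and its `replay_fraction` parameter are hypothetical, and integers stand in for transitions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")


def interleave(real: Sequence[Sample], simulated: Sequence[Sample], batch_size: int,
               replay_fraction: float, rng: random.Random) -> List[List[Tuple[str, Sample]]]:
    """Build batches mixing new-task samples with simulated-rollout samples.

    Each batch holds round(batch_size * replay_fraction) simulated samples, drawn
    with replacement, and is filled with real samples taken in order. Leftover real
    samples that do not fill a whole batch are dropped.
    """
    n_sim = round(batch_size * replay_fraction)
    n_real = batch_size - n_sim
    if n_real <= 0:
        raise ValueError("replay_fraction must leave room for real samples")
    batches = []
    for start in range(0, len(real) - n_real + 1, n_real):
        batch = [("real", s) for s in real[start:start + n_real]]
        batch += [("sim", rng.choice(simulated)) for _ in range(n_sim)]
        rng.shuffle(batch)  # avoid a fixed real/simulated order inside each batch
        batches.append(batch)
    return batches


# Usage: integers stand in for second-task transitions and simulated transitions.
real = list(range(100))
simulated = [-1, -2, -3]
batches = interleave(real, simulated, batch_size=8, replay_fraction=0.25, rng=random.Random(0))
```

Tagging each sample as `"real"` or `"sim"` is an assumption that lets a learner weight or log the two sources separately; it is not required by the claim.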
17. A method of training an autonomous or semi-autonomous system, the method comprising:
training a temporal prediction network to perform a 1-time-step prediction on a first set of samples from an environment of the system during performance of a first task;
training a controller to generate an action distribution based on the first set of samples and a hidden state of the temporal prediction network, wherein sampled actions of the action distribution maximize an expected reward on the first task;
preserving the temporal prediction network and the controller as a preserved copy of the temporal prediction network and a preserved copy of the controller, respectively;
generating simulated rollouts from the preserved copy of the temporal prediction network and the preserved copy of the controller; and
interleaving the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
Dependent claims: 18, 19, 20, 21.
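A toy end-to-end sketch of the method of claim 17 follows, under strong simplifying assumptions: the temporal prediction network is a hypothetical two-weight linear model (`OneStepPredictor`) trained by SGD on the 1-time-step prediction loss, it has no hidden state, and the controller is a fixed uniform-random policy rather than one trained to maximize expected reward. The two tasks are hypothetical linear dynamics with opposite coefficients (0.5 and -0.5). After the first task the model is deep-copied, the copy generates simulated rollouts, and during the second task one simulated sample is interleaved after each real sample.

```python
import copy
import random


class OneStepPredictor:
    """Hypothetical linear world model: predicts x[t+1] from (x[t], u[t])."""

    def __init__(self):
        self.w = 0.0  # coefficient on the current state
        self.v = 0.0  # coefficient on the action

    def predict(self, x, u):
        return self.w * x + self.v * u

    def sgd_step(self, x, u, target, lr=0.1):
        # One gradient step on the 1-time-step loss 0.5 * (prediction - target) ** 2.
        err = self.predict(x, u) - target
        self.w -= lr * err * x
        self.v -= lr * err * u


def env(a):
    """Toy environment x[t+1] = a * x[t] + u[t]; the task is set by a."""
    return lambda x, u: a * x + u


def collect(step_fn, rng, n_rollouts=50, horizon=20):
    """Roll a fixed uniform-random controller through step_fn; return (x, u, x_next)."""
    samples = []
    for _ in range(n_rollouts):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            u = rng.uniform(-1.0, 1.0)  # hypothetical stand-in for the controller
            x_next = step_fn(x, u)
            samples.append((x, u, x_next))
            x = x_next
    return samples


def mse(model, samples):
    return sum((model.predict(x, u) - y) ** 2 for x, u, y in samples) / len(samples)


def run(replay, seed=0):
    """Train on task 1, then task 2; return the one-step error on task 1 afterwards."""
    rng = random.Random(seed)
    model = OneStepPredictor()
    for x, u, y in collect(env(0.5), rng):                   # first task: a = 0.5
        model.sgd_step(x, u, y)
    preserved = copy.deepcopy(model)                         # frozen copy of the task-1 model
    simulated = collect(preserved.predict, rng)              # rollouts generated by the copy
    for i, (x, u, y) in enumerate(collect(env(-0.5), rng)):  # second task: a = -0.5
        model.sgd_step(x, u, y)
        if replay:                                           # interleave one simulated sample
            model.sgd_step(*simulated[i % len(simulated)])
    return mse(model, collect(env(0.5), rng))


print("task-1 error without replay:", run(replay=False))
print("task-1 error with replay:   ", run(replay=True))
```

Because both toy tasks share the same input space, replay here yields a compromise between the two dynamics rather than perfect retention; the sketch only shows that interleaving rollouts from the preserved copy lowers the first-task prediction error compared with training on the second task alone.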
Specification