AUTONOMOUS SYSTEM INCLUDING A CONTINUALLY LEARNING WORLD MODEL AND RELATED METHODS

US 20200134426A1
Filed: 08/22/2019
Published: 04/30/2020
Est. Priority Date: 10/24/2018
Status: Abandoned Application

First Claim

Patent Images

1. An autonomous or semi-autonomous system comprising:

a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task;

a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network;

a preserved copy of the temporal prediction network; and

a preserved copy of the controller,wherein the preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, andwherein the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

An autonomous or semi-autonomous system includes a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task, a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network, a preserved copy of the temporal prediction network, and a preserved copy of the controller. The preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, and the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.

Citations

21 Claims

1. An autonomous or semi-autonomous system comprising:
- a temporal prediction network configured to process a first set of samples from an environment of the system during performance of a first task;
  
  a controller configured to process the first set of samples from the environment and a hidden state output by the temporal prediction network;
  
  a preserved copy of the temporal prediction network; and
  
  a preserved copy of the controller,wherein the preserved copy of the temporal prediction network and the preserved copy of the controller are configured to generate simulated rollouts, andwherein the system is configured to interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The system of claim 1, further comprising an auto-encoder, wherein the auto-encoder is configured to embed the first set of samples from the environment of the system into a latent space.
  - 3. The system of claim 2, wherein the auto-encoder is a convolutional variational auto-encoder.
  - 4. The system of claim 1, wherein the controller is a stochastic gradient-descent based reinforcement learning controller.
  - 5. The system of claim 4, wherein the controller comprises an A2C algorithm.
  - 6. The system of claim 1, wherein the temporal prediction network comprises:
    - a Long Short-Term Memory (LSTM) layer; and
      
      a Mixture Density Network.
  - 7. The system of claim 1, wherein the controller is configured to output an action distribution, and wherein sampled actions from the action distribution maximize an expected reward on the first task.

8. A non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
- train a temporal prediction network on a first set of samples from an environment of an autonomous or semi-autonomous system during performance of a first task;
  
  train a controller on the first set of samples from the environment and a hidden state output by the temporal prediction network;
  
  store a preserved copy of the temporal prediction network;
  
  store a preserved copy of the controller,generate simulated rollouts from the preserved copy of the temporal prediction network and the preserved copy of the controller; and
  
  interleave the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
- View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16)
- - 9. The non-transitory computer-readable storage medium of claim 8, wherein the software instructions, when executed by the processor, further cause the processor to embed, with an auto-encoder, the first set of samples into a latent space.
  - 10. The non-transitory computer-readable storage medium of claim 9, wherein the auto-encoder is a convolutional variational auto-encoder.
  - 11. The non-transitory computer-readable storage medium of claim 8, wherein training the controller utilizes policy distillation including a cross-entropy loss function with a specific temperature.
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the specific temperature is 0.01.
  - 13. The non-transitory computer-readable storage medium of claim 8, wherein the controller is a stochastic gradient-descent based reinforcement learning controller.
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein the controller comprises an A2C algorithm.
  - 15. The non-transitory computer-readable storage medium of claim 8, wherein the temporal prediction network comprises:
    - a Long Short-Term Memory (LSTM) layer; and
      
      a Mixture Density Network.
  - 16. The non-transitory computer-readable storage medium of claim 11, wherein the software instructions, when executed by the processor, further cause the processor to output an action distribution from the controller, and wherein sampled actions from the action distribution maximize an expected reward on the first task.

17. A method of training an autonomous or semi-autonomous system, the method comprising:
- training a temporal prediction network to perform a 1-time-step prediction on a first set of samples from an environment of the system during performance of a first task;
  
  training a controller to generate an action distribution based on the first set of samples and a hidden state of the temporal prediction network, wherein sampled actions of the action distribution maximize an expected reward on the first task;
  
  preserving the temporal prediction network and the controller as a preserved copy of the temporal prediction network and a preserved copy of the controller, respectively;
  
  generating simulated rollouts from the preserved copy of the temporal prediction network and the preserved copy of the controller; and
  
  interleaving the simulated rollouts with a second set of samples from the environment during performance of a second task to preserve knowledge of the temporal prediction network for performing the first task.
- View Dependent Claims (18, 19, 20, 21)
- - 18. The method of claim 17, wherein the training the controller utilizes policy distillation including a cross-entropy loss function with a specific temperature of 0.01.
  - 19. The method of claim 17, further comprising embedding, with a convolutional auto-encoder, the first set of samples collected during performance of the first task into a latent space.
  - 20. The method of claim 17, wherein the controller is a stochastic gradient-descent based reinforcement learning controller comprising an A2C algorithm.
  - 21. The method of claim 17, wherein the temporal prediction network comprises:
    - a Long Short-Term Memory (LSTM) layer; and
      
      a Mixture Density Network.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
HRL Laboratories LLC (The Boeing Co.)
Original Assignee
HRL Laboratories LLC (The Boeing Co.)
Inventors
Ketz, Nicholas A., Pilly, Praveen K., Kolouri, Soheil, Martin, Charles E., Howard, Michael D.

Application Number

US16/548,560
Publication Number

US 20200134426A1
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

A63F 13/67   adaptively or by learning f...

G06F 17/15   Correlation function comput...

G06N 3/006   based on simulated virtual ...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/047   Probabilistic or stochastic...

AUTONOMOUS SYSTEM INCLUDING A CONTINUALLY LEARNING WORLD MODEL AND RELATED METHODS

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

Citations

21 Claims

Specification

Solutions

Use Cases

Quick Links

AUTONOMOUS SYSTEM INCLUDING A CONTINUALLY LEARNING WORLD MODEL AND RELATED METHODS

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

21 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links