ARTIFICIAL NEURAL NETWORKS HAVING COMPETITIVE REWARD MODULATED SPIKE TIME DEPENDENT PLASTICITY AND METHODS OF TRAINING THE SAME

US 20200133273A1
Filed: 10/23/2019
Published: 04/30/2020
Est. Priority Date: 10/29/2018
Status: Active Grant

Abstract

A method of training an artificial neural network having a series of layers and at least one weight matrix encoding connection weights between neurons in successive layers. The method includes receiving, at an input layer of the series of layers, at least one input, generating, at an output layer of the series of layers, at least one output based on the at least one input, generating a reward based on a comparison between the at least one output and a desired output, and modifying the connection weights based on the reward. Modifying the connection weights includes maintaining a sum of synaptic input weights to each neuron to be substantially constant and maintaining a sum of synaptic output weights from each neuron to be substantially constant.
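
The constraint described above, holding both the summed input weights and the summed output weights of each neuron roughly constant, can be illustrated with a minimal sketch. This is not the patent's Equation 1, which is not reproduced on this page; it is an assumed alternating rescaling of columns and rows. The function name `normalize_weights`, the targets `in_target` and `out_target`, and the `sweeps` count are all hypothetical, and the sketch assumes strictly positive weights and targets that are mutually consistent (number of postsynaptic neurons times `in_target` equals number of presynaptic neurons times `out_target`).

```python
def normalize_weights(w, in_target=1.0, out_target=1.0, sweeps=50):
    """Rescale a weight matrix so its row and column sums stay near fixed targets.

    Assumed layout (not taken from the patent): w[i][j] is the weight from
    presynaptic neuron i to postsynaptic neuron j. Column j therefore holds
    the synaptic inputs to neuron j, and row i holds the synaptic outputs
    from neuron i. All weights are assumed to be strictly positive.
    """
    w = [row[:] for row in w]  # work on a copy, leave the caller's matrix alone
    n_pre, n_post = len(w), len(w[0])
    for _ in range(sweeps):
        # Scale each column so the inputs to neuron j sum to in_target.
        for j in range(n_post):
            col_sum = sum(w[i][j] for i in range(n_pre))
            for i in range(n_pre):
                w[i][j] *= in_target / col_sum
        # Scale each row so the outputs from neuron i sum to out_target.
        for i in range(n_pre):
            row_sum = sum(w[i])
            for j in range(n_post):
                w[i][j] *= out_target / row_sum
    return w


if __name__ == "__main__":
    weights = [[0.2, 0.5, 0.9], [0.7, 0.1, 0.4], [0.3, 0.8, 0.6]]
    balanced = normalize_weights(weights)
    print([round(sum(row), 6) for row in balanced])        # output sums per neuron
    print([round(sum(col), 6) for col in zip(*balanced)])  # input sums per neuron
```

Because every neuron's total input and total output are pinned, a synapse can only grow at the expense of others sharing the same neuron, which is one way to read the "competitive" part of the title.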

20 Claims

1. A method of training an artificial neural network having a plurality of layers and at least one weight matrix encoding connection weights between neurons in successive layers of the plurality of layers, the method comprising:
- receiving, at an input layer of the plurality of layers, at least one input;
  
  generating, at an output layer of the plurality of layers, at least one output based on the at least one input;
  
  generating a reward based on a comparison between the at least one output and a desired output; and
  
  modifying the connection weights based on the reward, wherein the modifying the connection weights comprises maintaining a sum of synaptic input weights to each neuron to be substantially constant and maintaining a sum of synaptic output weights from each neuron to be substantially constant.
- - 2. The method of claim 1, further comprising adjusting the synaptic input and output weights of each neuron according to Equation 1, wherein Equation 1 is:
  - 3. The method of claim 2, wherein the adjusting of the synaptic input and output weights is performed at a regular interval.
  - 4. The method of claim 3, wherein the regular interval is about 50 ms or less.
  - 5. The method of claim 1, further comprising averaging the reward over a run time of the artificial neural network to be about zero.
  - 6. The method of claim 5, wherein the averaging the reward comprises calculating a running average score of the reward according to Equation 2, wherein Equation 2 is $X_n = (X_{n-1} \cdot (1 - \alpha)) + S_n \cdot \alpha$, wherein $\alpha$ is a rate of adaptation and $S_n$ is a score for a given iteration.
  - 7. The method of claim 6, wherein the generating the reward comprises calculating the reward according to Equation 3, wherein Equation 3 is $R_n = S_n - X_n$.
  - 8. The method of claim 1, wherein the artificial neural network is to control an autonomous vehicle.
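
The running-average score of claim 6 and the reward of claim 7 can be sketched in a few lines. The function name `reward_from_score`, the default `alpha=0.1`, and the initial running average of zero are illustrative assumptions; the claims do not fix any of them.

```python
def reward_from_score(score, running_avg, alpha=0.1):
    """Return (reward, updated running average) for one iteration.

    Equation 2: X_n = X_{n-1} * (1 - alpha) + S_n * alpha
    Equation 3: R_n = S_n - X_n
    `alpha` is the rate of adaptation; 0.1 is an arbitrary illustrative value.
    """
    new_avg = running_avg * (1.0 - alpha) + score * alpha  # Equation 2
    reward = score - new_avg                               # Equation 3
    return reward, new_avg


if __name__ == "__main__":
    avg = 0.0  # assumed starting value for the running average
    for step in range(5):
        reward, avg = reward_from_score(1.0, avg)
        print(step, round(reward, 4), round(avg, 4))
```

With a constant score the running average catches up and the reward decays toward zero, which is consistent with claim 5's requirement that the reward average to about zero over the run time: only scores that beat the recent average are rewarded.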

9. A system, comprising:
- a processor; and
  
  a non-transitory computer-readable storage medium operably coupled to the processor, the non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by the processor, cause the processor to:
  
  process input parameters with an artificial neural network stored in the processor;
  
  generate, from the artificial neural network, at least one output based on the input parameters;
  
  generate a reward based on a comparison between the output and a desired output; and
  
  modify connection weights between neurons in the artificial neural network based on the reward, wherein the modifying the connection weights comprises maintaining a sum of synaptic input weights to each neuron to be substantially constant and maintaining a sum of synaptic output weights from each neuron to be substantially constant.
- - 10. The system of claim 9, further comprising a plurality of sensors, the plurality of sensors configured to generate the input parameters.
  - 11. The system of claim 10, further comprising at least one vehicle component, wherein the processor is configured to control the at least one vehicle component based on the at least one output of the artificial neural network.
  - 12. The system of claim 9, wherein the software instructions, when executed by the processor, further cause the processor to average the reward over a time period to be substantially zero.
  - 13. The system of claim 9, wherein the artificial neural network comprises an input layer and an output layer, and wherein each neuron of the input layer is directly connected to each neuron of the output layer.
  - 14. The system of claim 9, wherein the artificial neural network comprises an input layer, at least one hidden layer, and an output layer.
  - 15. The system of claim 9, wherein the artificial neural network is to control an autonomous vehicle.

16. A method for controlling a vehicle component of a vehicle having a plurality of sensors and a processor in communication with the plurality of sensors, the method comprising:
- receiving input parameters from the plurality of sensors;
  
  processing the input parameters with an artificial neural network stored in the processor;
  
  controlling the vehicle component based on output parameters calculated by the artificial neural network;
  
  determining a reward based on a comparison between a desired behavior of the vehicle and a behavior of the vehicle resulting from the controlling of the vehicle component; and
  
  modifying connection weights between neurons in the artificial neural network based on the reward, wherein the modifying the connection weights comprises maintaining a sum of synaptic input weights to each neuron to be substantially constant and maintaining a sum of synaptic output weights from each neuron to be substantially constant.
- - 17. The method of claim 16, wherein the vehicle is an autonomous vehicle.
  - 18. The method of claim 17, wherein the autonomous vehicle is selected from the group consisting of an autonomous automobile and an autonomous aerial vehicle.
  - 19. The method of claim 16, wherein the reward is calculated when the controlling the vehicle causes the vehicle to get closer to a target.
  - 20. The method of claim 16, wherein a value of the reward is calculated in proportion to a decrease in distance between the vehicle and a target after the controlling of the vehicle.
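
Claim 20's reward, proportional to the decrease in distance between the vehicle and a target, might look like the following sketch. The function name `distance_reward`, the `gain` constant, and the use of Euclidean distance on coordinate tuples are assumptions for illustration; the claims do not specify a distance metric or a proportionality constant. Returning a negative value when the vehicle moves away is also an assumption, since claim 19 only describes the reward being calculated when the vehicle gets closer.

```python
import math


def distance_reward(prev_pos, new_pos, target, gain=1.0):
    """Reward proportional to how much closer the vehicle got to the target.

    Positions are coordinate tuples of equal length. `gain` is a hypothetical
    proportionality constant. A positive result means the control action
    reduced the distance to the target.
    """
    before = math.dist(prev_pos, target)  # distance before the control action
    after = math.dist(new_pos, target)    # distance after the control action
    return gain * (before - after)


if __name__ == "__main__":
    print(distance_reward((0.0, 0.0), (3.0, 0.0), (10.0, 0.0)))  # moved closer
    print(distance_reward((3.0, 0.0), (0.0, 0.0), (10.0, 0.0)))  # moved away
```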

Current Assignee: HRL Laboratories LLC (The Boeing Co.)
Original Assignee: HRL Laboratories LLC (The Boeing Co.)
Inventors: Skorheim, Steven W.; Stepp, Nigel D.; Scorcioni, Ruggero
Granted Patent: US 11,347,221 B2

CPC Class Codes

G05D 1/0088   characterized by the autono...

G05D 1/0221   involving a learning process

G05D 1/101   specially adapted for aircraft

G06N 3/04   Architecture, e.g. intercon...

G06N 3/049   Temporal neural networks, e...

G06N 3/08   Learning methods

G06N 3/088   Non-supervised learning, e....
