Solving the distal reward problem through linkage of STDP and dopamine signaling

US 8,103,602 B2
Filed: 12/21/2007
Issued: 01/24/2012
Est. Priority Date: 12/29/2006
Status: Expired due to Fees



Abstract

In Pavlovian and instrumental conditioning, rewards typically come seconds after reward-triggering actions, creating an explanatory conundrum known as the distal reward problem or the credit assignment problem. How does the brain know which firing patterns of which neurons are responsible for the reward if (1) the firing patterns are no longer there when the reward arrives and (2) most neurons and synapses are active during the waiting period before the reward? A model network and computer simulation of cortical spiking neurons with spike-timing-dependent plasticity (STDP) modulated by dopamine (DA) is disclosed to answer this question. STDP is triggered by nearly coincident firing of a presynaptic neuron and a postsynaptic neuron on a millisecond time scale, and the slow kinetics of the subsequent synaptic plasticity are sensitive to changes in the extracellular DA concentration during a critical period of a few seconds after the nearly coincident firing. Random neuronal firings during the waiting period leading to the reward do not affect STDP, which makes the neural network insensitive to this ongoing random firing activity. The model emphasizes the importance of precise firing patterns in brain dynamics and shows how a global diffusive reinforcement signal in the form of extracellular DA can selectively influence the right synapses at the right time.
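The millisecond-scale timing dependence described in the abstract can be illustrated with a minimal sketch of an STDP window. The exponential shape, the amplitudes `a_plus` and `a_minus`, and the 20 ms time constant `tau_ms` are assumptions chosen for illustration; the record above does not state the rule's functional form or parameters.

```python
import math

def stdp(delta_t_ms, a_plus=1.0, a_minus=1.5, tau_ms=20.0):
    """Illustrative STDP window; delta_t_ms = t_post - t_pre in milliseconds.

    All parameter values are hypothetical, not taken from the patent.
    """
    if delta_t_ms > 0:
        # Pre fires before post: potentiation that shrinks with the interval
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    if delta_t_ms < 0:
        # Post fires before pre: depression that shrinks with the interval
        return -a_minus * math.exp(delta_t_ms / tau_ms)
    return 0.0

# Nearly coincident spikes give a sizeable change; spikes far apart give almost none
near = stdp(10.0)
far = stdp(500.0)
```

Under these assumptions, spike pairs separated by hundreds of milliseconds contribute essentially nothing, which is one way to read the abstract's statement that random firing during the waiting period does not affect STDP.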

83 Citations


8 Claims

1. In a computer-implemented simulated nervous system network having a first pre-neuron and a second post-neuron, and a synaptic pathway between the first pre-neuron and the second post-neuron having synaptic strength (s) and an eligibility trace (c), a software-executable method for determining a firing pattern of the first pre-neuron and the second post-neuron, comprising:
- (a) firing the first pre-neuron and the second post-neuron to induce changes to the synaptic strength (s) according to a spike-timing-dependent plasticity (STDP) rule; and
- (b) providing extracellular dopamine to the synaptic pathway during a window of time after the firing and the eligibility trace (c) decays to zero.

2. A method according to claim 1, wherein the window is in the range of a few seconds.

3. A method according to claim 1, further comprising delivering to the synaptic pathway an increase of dopamine each time a post-synaptic firing of the post-neuron occurs within a certain time after a pre-synaptic firing of the pre-neuron to increase the synaptic strength (s).

4. A method according to claim 3, wherein the increase in dopamine is provided at a random delay of between 1-3 seconds after the firing of the pre-neuron and the post-neuron and each time the post-neuron firing occurs within about 10 ms after the pre-neuron firing to reinforce the firing of the first pre-neuron and the second post-neuron.

5. A method according to claim 4, wherein the increase in dopamine is provided until the synaptic pathway is potentiated up to a maximum allowable value.

6. A method according to claim 4, wherein the maximum allowable value is 4 mV.
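The reward rule in claims 3 to 6 can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names are hypothetical, the delay is assumed to be drawn uniformly and measured from the post-neuron spike, and the size of each potentiation step is left to the caller.

```python
import random

MAX_STRENGTH_MV = 4.0         # claim 6: maximum allowable value
COINCIDENCE_WINDOW_S = 0.010  # claim 4: post fires within about 10 ms after pre

def schedule_reward(t_pre_s, t_post_s, rng):
    """Return the time of the dopamine increase, or None if the pairing is not rewarded.

    Assumption: the 1-3 s delay is uniform and counted from the post-neuron spike.
    """
    interval = t_post_s - t_pre_s
    if not (0.0 < interval <= COINCIDENCE_WINDOW_S):
        return None
    return t_post_s + rng.uniform(1.0, 3.0)

def potentiate(strength_mv, delta_mv):
    """Increase synaptic strength, clipped at the maximum allowable value."""
    return min(strength_mv + delta_mv, MAX_STRENGTH_MV)

rng = random.Random(0)
reward_time = schedule_reward(0.100, 0.105, rng)  # pre-then-post within 10 ms
no_reward = schedule_reward(0.105, 0.100, rng)    # wrong order, so None
```

Once `potentiate` returns `MAX_STRENGTH_MV`, a simulation following claim 5 would stop delivering the dopamine increase for that pathway.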

7. In a computer-implemented simulated nervous system network having a first pre-neuron and a second post-neuron, and a synaptic pathway between the first pre-neuron and the second post-neuron having synaptic strength (s) and an eligibility trace (c), a software-executable method for implementing reinforcement learning in a spiking network based on spike timing dependent plasticity (STDP), comprising:
- (a) firing the first pre-neuron and the second post-neuron within a substantially coincident time of one another to induce changes to the synaptic strength (s) according to a spike-timing-dependent-plasticity (STDP) rule;
- (b) detecting an eligibility trace (c) over a time window commencing with the substantially coincident firings of the first pre-neuron and the second post-neuron, the eligibility trace (c) decaying towards zero over the time window; and
- (c) providing an extracellular global diffusive reinforcement signal to the synaptic pathway during the time window and providing an increase in the reinforcement signal at a time in the window occurring at about 1-3 seconds after the coincident firing and commencement of the time window.
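A minimal sketch of how a decaying eligibility trace can bridge the delay in claim 7 is given below. It assumes an exponentially decaying trace with time constant `tau_c_s`, a strength change proportional to the product of trace and reinforcement signal, and a brief rectangular reinforcement pulse; none of these specifics appear in the claim text, and all names are hypothetical.

```python
def weight_change(reward_time_s, tau_c_s=1.0, dt_s=0.001,
                  duration_s=5.0, pulse_s=0.1):
    """Euler-integrate one synapse after a coincident firing at t = 0.

    Assumptions (not from the patent text): dc/dt = -c / tau_c_s and
    ds/dt = c * d, where d is the reinforcement signal.
    """
    s_change = 0.0
    c = 1.0  # trace left by the coincident pre/post firing at t = 0
    for k in range(int(round(duration_s / dt_s))):
        t = k * dt_s
        rewarded = (reward_time_s is not None
                    and reward_time_s <= t < reward_time_s + pulse_s)
        d = 1.0 if rewarded else 0.0
        s_change += c * d * dt_s       # strength moves only while d is nonzero
        c -= (c / tau_c_s) * dt_s      # trace decays towards zero
    return s_change

early = weight_change(1.0)   # reinforcement 1 s after the coincident firing
late = weight_change(3.0)    # reinforcement 3 s after: smaller remaining trace
never = weight_change(None)  # no reinforcement: strength is unchanged
```

In this sketch a reinforcement pulse at 1 s produces a larger change than one at 3 s because less of the trace remains, and with no pulse the synapse is left untouched.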

8. In a computer-implemented simulated nervous system network having four random groups of neurons, representing, respectively, an unconditional stimulus (US), a first conditional stimulus (CS₁), a second conditional stimulus (CS₂), and cortical projections (VTA_p) that project to a ventral tegmental area (VTA) of a brain responsible for releasing dopamine, and in which there are synaptic connections from the unconditional stimulus (US) groups of neurons to the cortical projections (VTA_p) group of neurons, and from the first conditional stimulus (CS₁) group of neurons, and the second conditional stimulus (CS₂) group of neurons, a software-executable method of shifting the release of dopamine in response to the unconditional stimulus (US) to an earlier reward-predicting conditional stimulus (CS₁) and (CS₂), comprising:
- (a) initially setting the synaptic connections from the unconditional stimulus (US) groups of neurons to maximum values;
- (b) firing the neurons of the unconditional stimulus (US) groups of neurons to induce changes to the synaptic strength (S) according to a spike-timing-dependent plasticity (STDP) rule, and to cause a response in the neurons of the cortical projections (VTA_p);
- (c) firing the neurons of the first conditional stimulus (CS₁) prior to firing the unconditional stimulus (US) by about 1 ± 0.25 seconds to induce changes to the synaptic strength (S) in accordance with the STDP rule and to shift to and evoke a response by the cortical projections (VTA_p) to the first conditional stimulus (CS₁) of neurons; and
- (d) firing the neurons of the second conditional stimulus (CS₂) prior to firing the neurons of the first conditional stimulus (CS₁) by about 1 ± 0.25 seconds to induce changes to the synaptic strength (S) in accordance with the STDP rule and to shift to and evoke a response by the cortical projections (VTA_p) to the second conditional stimulus (CS₂) of neurons.
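The stimulus ordering in steps (c) and (d) of claim 8 can be sketched as a per-trial schedule. The helper name `trial_schedule` is hypothetical, and drawing each gap uniformly from 0.75 to 1.25 seconds is an assumption about how "about 1 ± 0.25 seconds" might be realized in a simulation.

```python
import random

def trial_schedule(t_us_s, rng, mean_gap_s=1.0, jitter_s=0.25):
    """Return firing times (seconds) for the CS2, CS1, and US neuron groups.

    Assumption: each gap is uniform in [mean_gap_s - jitter_s, mean_gap_s + jitter_s].
    """
    gap_cs1_to_us = rng.uniform(mean_gap_s - jitter_s, mean_gap_s + jitter_s)
    gap_cs2_to_cs1 = rng.uniform(mean_gap_s - jitter_s, mean_gap_s + jitter_s)
    t_cs1_s = t_us_s - gap_cs1_to_us     # CS1 fires before US, step (c)
    t_cs2_s = t_cs1_s - gap_cs2_to_cs1   # CS2 fires before CS1, step (d)
    return {"CS2": t_cs2_s, "CS1": t_cs1_s, "US": t_us_s}

schedule = trial_schedule(10.0, random.Random(0))
```

A full simulation would stimulate each group at its scheduled time and track which stimulus evokes the VTA_p response across trials; that part is outside this sketch.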

Current Assignee: Neurosciences Research Foundation, Inc.
Original Assignee: Neurosciences Research Foundation, Inc.
Inventors: Izhikevich, Eugene M.
Primary Examiner(s): Gaffin, Jeffrey A
Assistant Examiner(s): Gonzales, Vincent
Application Number: US11/963,403
Publication Number: US 20080162391A1
Time in Patent Office: 1,495 Days
Field of Search: 706/25, 706/15
US Class Current: 706/15
CPC Class Codes:
- G06N 20/00: Machine learning
- G06N 3/006: based on simulated virtual ...
- G06N 3/02: Neural networks
- G06N 3/049: Temporal neural networks, e...
- G06N 3/063: using electronic means
- G06N 3/065: Analogue means
- G06N 3/088: Non-supervised learning, e....
