Solving the distal reward problem through linkage of STDP and dopamine signaling
Abstract
In Pavlovian and instrumental conditioning, rewards typically arrive seconds after the actions that trigger them, creating an explanatory conundrum known as the distal reward problem or the credit assignment problem. How does the brain know which firing patterns of which neurons are responsible for the reward if (1) the firing patterns are no longer present when the reward arrives and (2) most neurons and synapses are active during the waiting period before the reward? A model network and computer simulation of cortical spiking neurons with spike-timing-dependent plasticity (STDP) modulated by dopamine (DA) is disclosed to answer this question. STDP is triggered by nearly coincident firing of a presynaptic neuron and a postsynaptic neuron on a millisecond time scale, while the slow kinetics of the subsequent synaptic plasticity make it sensitive to changes in the extracellular DA concentration during a critical period of a few seconds after the nearly coincident firing. Random neuronal firings during the waiting period leading to the reward do not affect STDP, which makes the network insensitive to this ongoing random firing activity. These results highlight the importance of precise firing patterns in brain dynamics and show how a global, diffusive reinforcement signal in the form of extracellular DA can selectively influence the right synapses at the right time.
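The mechanism summarized above can be sketched for a single synapse. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes an eligibility trace c that jumps by an STDP amount when a pre/post spike pair occurs and then decays exponentially, a dopamine level d that jumps on reward and decays exponentially, and a synaptic strength s that changes at rate c·d. The time constants, amplitudes, the 0.5 dopamine pulse, and the function names are all illustrative assumptions.

```python
import math

# Illustrative, assumed parameters (not stated in this document)
TAU_C = 1.0       # eligibility-trace decay time constant, seconds
TAU_D = 0.2       # dopamine uptake time constant, seconds
TAU_STDP = 0.020  # width of the STDP window, seconds
A_PLUS, A_MINUS = 1.0, 1.5  # potentiation / depression amplitudes
DT = 0.001        # Euler integration step, seconds


def stdp(delta_t):
    """STDP amount for one spike pair; delta_t = t_post - t_pre in seconds."""
    if delta_t > 0:   # pre fires before post: potentiation
        return A_PLUS * math.exp(-delta_t / TAU_STDP)
    if delta_t < 0:   # post fires before pre: depression
        return -A_MINUS * math.exp(delta_t / TAU_STDP)
    return 0.0


def simulate(t_pre, t_post, reward_time, t_end=5.0, da_pulse=0.5):
    """Return the change in s after one spike pair and an optional reward.

    reward_time=None means no dopamine is ever released.
    """
    c = d = s = 0.0
    steps = int(round(t_end / DT))
    pair_step = int(round(max(t_pre, t_post) / DT))
    reward_step = None if reward_time is None else int(round(reward_time / DT))
    for k in range(steps):
        if k == pair_step:
            c += stdp(t_post - t_pre)  # trace is tagged at the second spike of the pair
        if k == reward_step:
            d += da_pulse              # phasic dopamine release on reward
        s += c * d * DT                # s' = c * d
        c -= c / TAU_C * DT            # c' = -c / tau_c
        d -= d / TAU_D * DT            # d' = -d / tau_d
    return s
```

With these assumptions, a pre-before-post pair followed by a reward one second later strengthens the synapse, the same pair with no reward leaves it unchanged, and a reward that arrives later has a smaller effect because the trace has largely decayed.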
8 Claims
1. In a computer-implemented simulated nervous system network having a first pre-neuron and a second post-neuron, and a synaptic pathway between the first pre-neuron and the second post-neuron having synaptic strength (s) and an eligibility trace (c), a software-executable method for determining a firing pattern of the first pre-neuron and the second post-neuron, comprising:
(a) firing the first pre-neuron and the second post-neuron to induce changes to the synaptic strength (s) according to a spike-timing-dependent plasticity (STDP) rule; and
(b) providing extracellular dopamine to the synaptic pathway during a window of time after the firing and before the eligibility trace (c) decays to zero.
(Dependent claims 2, 3, 4, 5, 6)
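Claim 1 times the dopamine to a window that opens with the firing and closes as the eligibility trace (c) decays to zero. Because an exponentially decaying trace never reaches exactly zero in a simulation, one practical reading is to treat the trace as zero once its magnitude drops below a small cutoff. The sketch below assumes exponential decay with time constant tau_c; the cutoff eps and the helper names are hypothetical.

```python
import math


def eligibility(t, t_fire, c0, tau_c=1.0):
    """Eligibility trace at time t for a pair that set it to c0 at t_fire (assumed exponential)."""
    if t < t_fire:
        return 0.0
    return c0 * math.exp(-(t - t_fire) / tau_c)


def window_end(t_fire, c0, eps=1e-2, tau_c=1.0):
    """Time at which |c| falls below eps, a practical stand-in for 'decays to zero'."""
    if abs(c0) <= eps:
        return t_fire
    return t_fire + tau_c * math.log(abs(c0) / eps)


def dopamine_is_effective(t_da, t_fire, c0, eps=1e-2, tau_c=1.0):
    """True if dopamine at t_da falls after the firing and while the trace is still non-negligible."""
    return t_da >= t_fire and abs(eligibility(t_da, t_fire, c0, tau_c)) >= eps
```

Under these assumptions, a trace starting at 1.0 with tau_c = 1 s stays above a 0.01 cutoff for about 4.6 s, so dopamine one second after the firing falls inside the window, while dopamine six seconds later does not.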
7. In a computer-implemented simulated nervous system network having a first pre-neuron and a second post-neuron, and a synaptic pathway between the first pre-neuron and the second post-neuron having synaptic strength (s) and an eligibility trace (c), a software-executable method for implementing reinforcement learning in a spiking network based on spike-timing-dependent plasticity (STDP), comprising:
(a) firing the first pre-neuron and the second post-neuron within a substantially coincident time of one another to induce changes to the synaptic strength (s) according to a spike-timing-dependent plasticity (STDP) rule;
(b) detecting an eligibility trace (c) over a time window commencing with the substantially coincident firings of the first pre-neuron and the second post-neuron, the eligibility trace (c) decaying towards zero over the time window; and
(c) providing an extracellular global diffusive reinforcement signal to the synaptic pathway during the time window and providing an increase in the reinforcement signal at a time in the window occurring at about 1-3 seconds after the coincident firing and commencement of the time window.
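Claim 7 places the increase in the reinforcement signal about 1-3 seconds after the coincident firing. Assuming an exponentially decaying trace c(t) = c0·exp(-t/tau_c), a reinforcement pulse that decays as D·exp(-(t - delay)/tau_d) from the moment it arrives, and s' = c·d, the total weight change has a closed form: c0·D·exp(-delay/tau_c)·tau_c·tau_d/(tau_c + tau_d). The sketch below evaluates it; the time constants and the function name are illustrative assumptions.

```python
import math

TAU_C = 1.0  # assumed eligibility-trace time constant, seconds
TAU_D = 0.2  # assumed reinforcement-signal decay time constant, seconds


def weight_change(c0, da_pulse, delay):
    """Total change in s from one trace c0 and one reinforcement pulse arriving `delay` s later.

    Integrates s' = c*d from t = delay to infinity, with
    c(t) = c0*exp(-t/TAU_C) and d(t) = da_pulse*exp(-(t - delay)/TAU_D).
    """
    tau_eff = TAU_C * TAU_D / (TAU_C + TAU_D)  # combined decay of the product c*d
    return c0 * da_pulse * math.exp(-delay / TAU_C) * tau_eff


if __name__ == "__main__":
    for delay in (1.0, 2.0, 3.0):
        print(f"reward {delay:.0f} s after pairing -> delta s = {weight_change(1.0, 0.5, delay):.4f}")
```

Under these assumptions, each extra second of delay scales the credit assigned to the synapse by exp(-1/tau_c), so a reward at 1-3 s still reaches the tagged synapse, only more weakly at longer delays.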
8. In a computer-implemented simulated nervous system network having four random groups of neurons, representing, respectively, an unconditional stimulus (US), a first conditional stimulus (CS1), a second conditional stimulus (CS2), and cortical projections (VTAp) that project to a ventral tegmental area (VTA) of a brain responsible for releasing dopamine, and in which there are synaptic connections from the unconditional stimulus (US) groups of neurons to the cortical projections (VTAp) group of neurons, and from the first conditional stimulus (CS1) group of neurons and the second conditional stimulus (CS2) group of neurons to the cortical projections (VTAp) group of neurons, a software-executable method of shifting the release of dopamine in response to the unconditional stimulus (US) to the earlier reward-predicting conditional stimuli (CS1) and (CS2), comprising:
(a) initially setting the synaptic connections from the unconditional stimulus (US) groups of neurons to maximum values;
(b) firing the neurons of the unconditional stimulus (US) groups of neurons to induce changes to the synaptic strength (S) according to a spike-timing-dependent plasticity (STDP) rule, and to cause a response in the neurons of the cortical projections (VTAp);
(c) firing the neurons of the first conditional stimulus (CS1) prior to firing the unconditional stimulus (US) by about 1 ± 0.25 seconds to induce changes to the synaptic strength (S) in accordance with the STDP rule and to shift to and evoke a response by the cortical projections (VTAp) to the first conditional stimulus (CS1) group of neurons; and
(d) firing the neurons of the second conditional stimulus (CS2) prior to firing the neurons of the first conditional stimulus (CS1) by about 1 ± 0.25 seconds to induce changes to the synaptic strength (S) in accordance with the STDP rule and to shift to and evoke a response by the cortical projections (VTAp) to the second conditional stimulus (CS2) group of neurons.
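The stimulus timing in steps (b) through (d) of claim 8 can be sketched as a trial schedule. This sketch assumes the steps run as three successive training phases (US alone, then CS1 before US, then CS2 before CS1 before US) and that each lead of about 1 ± 0.25 seconds is drawn uniformly from that range; the phase numbering, the 10-second trial period, and the function names are hypothetical. It covers only stimulus onsets; the network itself and the initial weights of step (a) are not modeled.

```python
import random


def make_trial(phase, t_us, rng, interval=1.0, jitter=0.25):
    """Return stimulus onset times (s) for one trial of a hypothetical training phase.

    phase 1: US alone (step (b)); phase 2: CS1 then US (step (c));
    phase 3: CS2, then CS1, then US (step (d)). Each lead is drawn
    uniformly from interval +/- jitter seconds (an assumption).
    """
    events = {"US": t_us}
    if phase >= 2:
        events["CS1"] = t_us - rng.uniform(interval - jitter, interval + jitter)
    if phase >= 3:
        events["CS2"] = events["CS1"] - rng.uniform(interval - jitter, interval + jitter)
    return events


def make_session(trials_per_phase, trial_period=10.0, seed=0):
    """Build a list of (phase, events) pairs, running phases 1, 2 and 3 in order."""
    rng = random.Random(seed)
    session = []
    k = 0
    for phase in (1, 2, 3):
        for _ in range(trials_per_phase):
            # Place the US 5 s into each trial period, leaving room for CS2 and CS1 before it.
            session.append((phase, make_trial(phase, k * trial_period + 5.0, rng)))
            k += 1
    return session
```

Generating onsets from the US backwards keeps the US at a fixed point in each trial, so the only thing that changes between phases is which predictive stimuli precede it.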
Specification