Fast temporal neural learning using teacher forcing
2 Assignments
0 Petitions
Abstract
A neural network is trained to output a time-dependent target vector defined over a predetermined time interval in response to a time-dependent input vector defined over the same time interval by applying corresponding elements of the error vector, or difference between the target vector and the actual neuron output vector, to the inputs of corresponding output neurons of the network as corrective feedback. This feedback decreases the error and quickens the learning process, so that a much smaller number of training cycles is required to complete the learning process. A conventional gradient descent algorithm is employed to update the neural network parameters at the end of the predetermined time interval. The foregoing process is repeated in repetitive cycles until the actual output vector corresponds to the target vector. In the preferred embodiment, as the overall error of the neural network output decreases during successive training cycles, the portion of the error fed back to the output neurons is decreased accordingly, allowing the network to learn with greater freedom from teacher forcing as the network parameters converge to their optimum values. The invention may also be used to train a neural network with stationary training and target vectors.
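The training scheme summarized above can be pictured with a short discrete-time sketch. This is an illustrative simplification, not the patented apparatus: the single tanh output unit, the per-step gradient, the learning rate `eta`, and the rule that shrinks the forcing gain `lam` with the previous cycle's error norm are all assumptions made for the example.

```python
import numpy as np

def run_cycle(w, target, lam):
    """One pass over the predetermined time interval.

    At each step the output unit's input receives its own previous
    output plus lam times the current error element -- the corrective
    feedback ("teacher forcing").  Because the fed-back value is pulled
    toward the target, a simple per-step gradient of the summed squared
    error is used here instead of backpropagation through time.
    """
    y = target[0]
    grad, errs = 0.0, []
    for t in range(len(target) - 1):
        e = target[t] - y                  # error element at time t
        z = y + lam * e                    # partially teacher-forced output
        y = np.tanh(w * z)                 # illustrative network dynamics
        e_next = target[t + 1] - y
        errs.append(e_next)
        grad += -2.0 * e_next * (1.0 - y * y) * z
    return np.array(errs), grad

def train(target, cycles=60, eta=0.1, lam0=1.0, w0=0.0):
    """Repeat the cycle, updating w by gradient descent at the end of
    each interval and shrinking the forcing gain as the error falls."""
    w, lam = w0, lam0
    first_norm, history = None, []
    for _ in range(cycles):
        errs, grad = run_cycle(w, target, lam)
        norm = float(np.linalg.norm(errs))
        history.append(norm)
        w -= eta * grad                    # update at end of the interval
        if first_norm is None:
            first_norm = max(norm, 1e-12)
        # modulate next cycle's feedback by the error just measured
        lam = lam0 * min(1.0, norm / first_norm)
    return w, history

# Target trajectory generated by a known recurrence, so a perfect
# weight (w_true) exists for the learner to find.
w_true = 0.8
target = [0.5]
for _ in range(40):
    target.append(float(np.tanh(w_true * target[-1])))
target = np.array(target)

w_learned, history = train(target)
```

The decaying `lam` is what the abstract calls learning "with greater freedom from teacher forcing": early cycles are almost fully forced, late cycles run nearly free.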
42 Citations
39 Claims
1. Apparatus for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a time-varying target vector in response to a time-varying training vector applied to inputs of said input set of neurons, said time-varying training and target vectors being defined for a predetermined time interval, said apparatus comprising:
means for applying respective elements of said time-varying training vector to the inputs of respective ones of said input set of neurons during said predetermined time interval;
means for measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said time-varying target vector during said predetermined time interval;
means for determining a function of each individual element of said error vector during said predetermined time interval;
means for feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons during said predetermined time interval;
means responsive to said error vector and to current values of said neuron gains and synapse weights for changing at least one of (a) said neuron gains and (b) said synapse weights in accordance with a gradient descent algorithm at the end of said predetermined time interval to decrease the magnitude of said error vector; and
wherein said means for applying, said means for measuring, said means for determining, said means for feeding back and said means for changing all operate together in repetitive cycles, each of said cycles having a time duration equal to said predetermined time interval; and
said means for determining a function of each individual element of said error vector comprises means for modulating said function in accordance with measurements of said error vector by said means for measuring during a previous one of said repetitive cycles.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
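One way to visualize the feedback element of claim 1 is with simple additive neuron dynamics integrated over the interval. The model below (leaky units, a tanh activation, Euler integration, and the particular weights and gains) is an assumed illustration, not the claim language: each output neuron's input receives `lam` times its own error element for the whole interval, which holds the output near the target even before any learning has occurred.

```python
import numpy as np

def euler_interval(w, gain, x0, target, lam, dt=0.01, T=1.0):
    """Integrate output-neuron dynamics over the predetermined interval.

    Each output neuron n obeys the illustrative additive model
        dx_n/dt = -x_n + tanh(gain * sum_m w[n, m] * x_m) + lam * e_n,
    where e_n = target_n - x_n is that neuron's error element, fed back
    to the neuron's input throughout the interval (lam = 0 disables it).
    """
    x = x0.copy()
    e = target - x
    for _ in range(int(T / dt)):
        e = target - x                       # current error vector
        dx = -x + np.tanh(gain * (w @ x)) + lam * e
        x = x + dt * dx
    return x, e

w = np.array([[0.2, -0.1],
              [0.1,  0.3]])                  # illustrative synapse weights
target = np.array([0.3, -0.2])               # stationary target for clarity

x_free, e_free = euler_interval(w, 1.0, np.zeros(2), target, lam=0.0)
x_forced, e_forced = euler_interval(w, 1.0, np.zeros(2), target, lam=5.0)
```

Without feedback this untrained network never leaves its rest state, so the error keeps the full target magnitude; with the feedback on, the trajectory the gradient step sees is already close to the target.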
11. A method for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a time-varying target vector in response to a time-varying training vector applied to inputs of said input set of neurons, said time-varying training and target vectors being defined for a predetermined time interval, said method comprising:
applying respective elements of said time-varying training vector to the inputs of respective ones of said input set of neurons during said predetermined time interval;
measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said time-varying target vector during said predetermined time interval;
determining a function of each individual element of said error vector during said predetermined time interval;
feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons during said predetermined time interval;
changing at least one of (a) said neuron gains and (b) said synapse weights in response to said error vector and to current values of said neuron gains and synapse weights in accordance with a gradient descent algorithm at the end of said predetermined time interval to decrease the magnitude of said error vector; and
wherein said applying, said measuring, said determining, said feeding back and said changing are performed in repetitive cycles, each of said cycles having a time duration equal to said predetermined time interval; and
said determining a function of each individual element of said error vector comprises modulating said function in accordance with measurements of said error vector by said measuring during a previous one of said repetitive cycles.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
20. A method of training a neural network to output a target vector in response to a training vector, said method comprising:
feeding back to neuron inputs of said neural network a function of an error vector corresponding to a difference between said target vector and a current output vector of said neural network; and
stimulating said neural network with said training vector while feeding back said function of the error vector;
modulating said function in accordance with a factor dependent upon elements of said error vector;
wherein said feeding back is performed in repetitive cycles of a cyclic period; and
cyclically adjusting parameters of said neural network, wherein said adjusting is performed in accordance with a measurement of said error vector during a previous one of said repetitive cycles.
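The modulation step recited in claim 20 can be made concrete with one plausible rule, assumed purely for illustration (the claim itself does not fix a formula): scale the forcing gain for the coming cycle by the relative size of the error vector measured during the previous cycle.

```python
import numpy as np

def modulation_factor(prev_errors, first_norm, lam0=1.0):
    """Forcing gain for the next cycle, computed from the error vector
    measured during the previous cycle.  The specific rule -- lam0
    scaled by the previous cycle's relative error norm, capped at
    lam0 -- is an assumption for this sketch."""
    norm = float(np.linalg.norm(prev_errors))
    return lam0 * min(1.0, norm / first_norm)

# As training progresses and the measured error shrinks, the amount
# of teacher forcing applied in the next cycle shrinks with it.
first = np.array([0.4, -0.3, 0.2])      # error elements on the first cycle
later = np.array([0.04, -0.03, 0.02])   # error elements many cycles later
f0 = float(np.linalg.norm(first))

lam_early = modulation_factor(first, f0)
lam_late = modulation_factor(later, f0)
```

Any monotone function of the previous cycle's error measurements would serve the same purpose: full forcing while the error is large, nearly free-running dynamics as the parameters converge.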
21. Apparatus for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a target vector in response to a training vector applied to inputs of said input set of neurons, said apparatus comprising:
means for applying respective elements of said training vector to the inputs of respective ones of said input set of neurons;
means for measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said target vector;
means for determining a function of each individual element of said error vector;
means for feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons;
means responsive to said error vector and to current values of said neuron gains and synapse weights for changing at least one of (a) said neuron gains and (b) said synapse weights in accordance with a gradient descent algorithm to decrease the magnitude of said error vector; and
wherein said means for applying, said means for measuring, said means for determining, said means for feeding back and said means for changing all operate together in repetitive cycles, each of said cycles having a time duration equal to a predetermined time interval; and
said means for determining a function of each individual element of said error vector comprises means for modulating said function in accordance with measurements of said error vector by said means for measuring during a previous one of said repetitive cycles.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30.
31. A method for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a target vector in response to a training vector applied to inputs of said input set of neurons, said method comprising:
applying respective elements of said training vector to the inputs of respective ones of said input set of neurons;
measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said target vector;
determining a function of each individual element of said error vector;
feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons;
changing at least one of (a) said neuron gains and (b) said synapse weights in response to said error vector and to current values of said neuron gains and synapse weights in accordance with a gradient descent algorithm to decrease the magnitude of said error vector; and
wherein said applying, said measuring, said determining, said feeding back and said changing are performed in repetitive cycles, each of said cycles having a time duration equal to a predetermined time interval; and
said determining a function of each individual element of said error vector comprises modulating said function in accordance with measurements of said error vector by said measuring during a previous one of said repetitive cycles.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39.
Specification