Fast temporal neural learning using teacher forcing
2 Assignments
0 Petitions
Abstract
A neural network is trained to output a time-dependent target vector defined over a predetermined time interval in response to a time-dependent input vector defined over the same time interval by applying corresponding elements of the error vector, or difference between the target vector and the actual neuron output vector, to the inputs of corresponding output neurons of the network as corrective feedback. This feedback decreases the error and quickens the learning process, so that a much smaller number of training cycles is required to complete the learning process. A conventional gradient descent algorithm is employed to update the neural network parameters at the end of the predetermined time interval. The foregoing process is repeated in repetitive cycles until the actual output vector corresponds to the target vector. In the preferred embodiment, as the overall error of the neural network output decreases during successive training cycles, the portion of the error fed back to the output neurons is decreased accordingly, allowing the network to learn with greater freedom from teacher forcing as the network parameters converge to their optimum values. The invention may also be used to train a neural network with stationary training and target vectors.
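The training scheme summarized above can be pictured with a short discrete-time sketch. This is an illustrative simplification, not the patented apparatus: the single tanh output unit, the per-step gradient, the learning rate `eta`, and the rule that shrinks the forcing gain `lam` with the previous cycle's error norm are all assumptions made for the example.

```python
import numpy as np

def run_cycle(w, target, lam):
    """One pass over the predetermined time interval.

    At each step the output unit's input receives its own previous
    output plus lam times the current error element -- the corrective
    feedback ("teacher forcing").  Because the fed-back value is pulled
    toward the target, a simple per-step gradient of the summed squared
    error is used here instead of backpropagation through time.
    """
    y = target[0]
    grad, errs = 0.0, []
    for t in range(len(target) - 1):
        e = target[t] - y                  # error element at time t
        z = y + lam * e                    # partially teacher-forced output
        y = np.tanh(w * z)                 # illustrative network dynamics
        e_next = target[t + 1] - y
        errs.append(e_next)
        grad += -2.0 * e_next * (1.0 - y * y) * z
    return np.array(errs), grad

def train(target, cycles=60, eta=0.1, lam0=1.0, w0=0.0):
    """Repeat the cycle, updating w by gradient descent at the end of
    each interval and shrinking the forcing gain as the error falls."""
    w, lam = w0, lam0
    first_norm, history = None, []
    for _ in range(cycles):
        errs, grad = run_cycle(w, target, lam)
        norm = float(np.linalg.norm(errs))
        history.append(norm)
        w -= eta * grad                    # update at end of the interval
        if first_norm is None:
            first_norm = max(norm, 1e-12)
        # modulate next cycle's feedback by the error just measured
        lam = lam0 * min(1.0, norm / first_norm)
    return w, history

# Target trajectory generated by a known recurrence, so a perfect
# weight (w_true) exists for the learner to find.
w_true = 0.8
target = [0.5]
for _ in range(40):
    target.append(float(np.tanh(w_true * target[-1])))
target = np.array(target)

w_learned, history = train(target)
```

The decaying `lam` is what the abstract calls learning "with greater freedom from teacher forcing": early cycles are almost fully forced, late cycles run nearly free.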
42 Citations
39 Claims
1. Apparatus for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a time-varying target vector in response to a time-varying training vector applied to inputs of said input set of neurons, said time-varying training and target vectors being defined for a predetermined time interval, said apparatus comprising:
means for applying respective elements of said time-varying training vector to the inputs of respective ones of said input set of neurons during said predetermined time interval;
means for measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said time-varying target vector during said predetermined time interval;
means for determining a function of each individual element of said error vector during said predetermined time interval;
means for feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons during said predetermined time interval;
means responsive to said error vector and to current values of said neuron gains and synapse weights for changing at least one of (a) said neuron gains and (b) said synapse weights in accordance with a gradient descent algorithm at the end of said predetermined time interval to decrease the magnitude of said error vector; and
wherein said means for applying, said means for measuring, said means for determining, said means for feeding back and said means for changing all operate together in repetitive cycles, each of said cycles having a time duration equal to said predetermined time interval; and
said means for determining a function of each individual element of said error vector comprises means for modulating said function in accordance with measurements of said error vector by said means for measuring during a previous one of said repetitive cycles.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
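One way to visualize the feedback element of claim 1 is with simple additive neuron dynamics integrated over the interval. The model below (leaky units, a tanh activation, Euler integration, and the particular weights and gains) is an assumed illustration, not the claim language: each output neuron's input receives `lam` times its own error element for the whole interval, which holds the output near the target even before any learning has occurred.

```python
import numpy as np

def euler_interval(w, gain, x0, target, lam, dt=0.01, T=1.0):
    """Integrate output-neuron dynamics over the predetermined interval.

    Each output neuron n obeys the illustrative additive model
        dx_n/dt = -x_n + tanh(gain * sum_m w[n, m] * x_m) + lam * e_n,
    where e_n = target_n - x_n is that neuron's error element, fed back
    to the neuron's input throughout the interval (lam = 0 disables it).
    """
    x = x0.copy()
    e = target - x
    for _ in range(int(T / dt)):
        e = target - x                       # current error vector
        dx = -x + np.tanh(gain * (w @ x)) + lam * e
        x = x + dt * dx
    return x, e

w = np.array([[0.2, -0.1],
              [0.1,  0.3]])                  # illustrative synapse weights
target = np.array([0.3, -0.2])               # stationary target for clarity

x_free, e_free = euler_interval(w, 1.0, np.zeros(2), target, lam=0.0)
x_forced, e_forced = euler_interval(w, 1.0, np.zeros(2), target, lam=5.0)
```

Without feedback this untrained network never leaves its rest state, so the error keeps the full target magnitude; with the feedback on, the trajectory the gradient step sees is already close to the target.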
11. A method for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a time-varying target vector in response to a time-varying training vector applied to inputs of said input set of neurons, said time-varying training and target vectors being defined for a predetermined time interval, said method comprising:
applying respective elements of said time-varying training vector to the inputs of respective ones of said input set of neurons during said predetermined time interval;
measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said time-varying target vector during said predetermined time interval;
determining a function of each individual element of said error vector during said predetermined time interval;
feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons during said predetermined time interval;
changing at least one of (a) said neuron gains and (b) said synapse weights in response to said error vector and to current values of said neuron gains and synapse weights in accordance with a gradient descent algorithm at the end of said predetermined time interval to decrease the magnitude of said error vector; and
wherein said applying, said measuring, said determining, said feeding back and said changing are performed in repetitive cycles, each of said cycles having a time duration equal to said predetermined time interval; and
said determining a function of each individual element of said error vector comprises modulating said function in accordance with measurements of said error vector by said measuring during a previous one of said repetitive cycles.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
20. A method of training a neural network to output a target vector in response to a training vector, said method comprising:
feeding back to neuron inputs of said neural network a function of an error vector corresponding to a difference between said target vector and a current output vector of said neural network; and
stimulating said neural network with said training vector while feeding back said function of the error vector;
modulating said function in accordance with a factor dependent upon elements of said error vector;
wherein said feeding back is performed in repetitive cycles of a cyclic period; and
cyclically adjusting parameters of said neural network, wherein said adjusting is performed in accordance with a measurement of said error vector during a previous one of said repetitive cycles.
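The modulation step recited in claim 20 can be made concrete with one plausible rule, assumed purely for illustration (the claim itself does not fix a formula): scale the forcing gain for the coming cycle by the relative size of the error vector measured during the previous cycle.

```python
import numpy as np

def modulation_factor(prev_errors, first_norm, lam0=1.0):
    """Forcing gain for the next cycle, computed from the error vector
    measured during the previous cycle.  The specific rule -- lam0
    scaled by the previous cycle's relative error norm, capped at
    lam0 -- is an assumption for this sketch."""
    norm = float(np.linalg.norm(prev_errors))
    return lam0 * min(1.0, norm / first_norm)

# As training progresses and the measured error shrinks, the amount
# of teacher forcing applied in the next cycle shrinks with it.
first = np.array([0.4, -0.3, 0.2])      # error elements on the first cycle
later = np.array([0.04, -0.03, 0.02])   # error elements many cycles later
f0 = float(np.linalg.norm(first))

lam_early = modulation_factor(first, f0)
lam_late = modulation_factor(later, f0)
```

Any monotone function of the previous cycle's error measurements would serve the same purpose: full forcing while the error is large, nearly free-running dynamics as the parameters converge.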
21. Apparatus for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a target vector in response to a training vector applied to inputs of said input set of neurons, said apparatus comprising:
means for applying respective elements of said training vector to the inputs of respective ones of said input set of neurons;
means for measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said target vector;
means for determining a function of each individual element of said error vector;
means for feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons;
means responsive to said error vector and to current values of said neuron gains and synapse weights for changing at least one of (a) said neuron gains and (b) said synapse weights in accordance with a gradient descent algorithm to decrease the magnitude of said error vector; and
wherein said means for applying, said means for measuring, said means for determining, said means for feeding back and said means for changing all operate together in repetitive cycles, each of said cycles having a time duration equal to a predetermined time interval; and
said means for determining a function of each individual element of said error vector comprises means for modulating said function in accordance with measurements of said error vector by said means for measuring during a previous one of said repetitive cycles.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30.
31. A method for training a neural network comprising input, hidden and output sets of neurons having respective neuron gains interconnected by respective synapses having respective synapse weights to produce at outputs of said output set of neurons a target vector in response to a training vector applied to inputs of said input set of neurons, said method comprising:
applying respective elements of said training vector to the inputs of respective ones of said input set of neurons;
measuring an error vector constructed from the differences between the output values produced at the outputs of said output set of neurons and corresponding elements of said target vector;
determining a function of each individual element of said error vector;
feeding back said function of each individual element of said error vector to inputs of respective ones of said output set of neurons;
changing at least one of (a) said neuron gains and (b) said synapse weights in response to said error vector and to current values of said neuron gains and synapse weights in accordance with a gradient descent algorithm to decrease the magnitude of said error vector; and
wherein said applying, said measuring, said determining, said feeding back and said changing are performed in repetitive cycles, each of said cycles having a time duration equal to a predetermined time interval; and
said determining a function of each individual element of said error vector comprises modulating said function in accordance with measurements of said error vector by said measuring during a previous one of said repetitive cycles.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39.
Specification