Accelerated training apparatus for back propagation networks
Abstract
A supervised procedure for obtaining weight values for back-propagation neural networks is described. The method according to the invention performs a sequence of partial optimizations to determine values for the network connection weights. Each partial optimization relies on a constrained representation of the hidden weights, derived from a singular value decomposition of the input space, together with an Iterative Least Squares solution for the output weights.
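To make the sequence the abstract describes concrete, here is a minimal NumPy sketch of one way such alternating partial optimizations could be organized. It is an illustration, not the patented implementation: every identifier is hypothetical, the direction choices are initialized at random, and a crude finite-difference search stands in for whatever numerical optimizer an implementation would use for the scalar multipliers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_hidden, n_sweeps=20, step=0.05):
    """Alternate partial optimizations: fix hidden directions via an
    SVD of the inputs, solve the output weights by least squares,
    then nudge the per-node scalar multipliers."""
    rng = np.random.default_rng(0)
    # Right singular vectors of the (centered) input matrix: the
    # directions along which the projected inputs vary the most.
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    j = rng.integers(0, Vt.shape[0], size=n_hidden)  # chosen direction per node
    c = rng.normal(size=n_hidden)                    # scalar multiplier per node
    W_out = None
    for _ in range(n_sweeps):
        H = sigmoid(X @ (c[:, None] * Vt[j]).T)          # hidden outputs
        W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)    # output weights
        err = np.sqrt(np.mean((H @ W_out - Y) ** 2))     # RMS error
        # Finite-difference descent on each scalar c_i; a stand-in
        # for the patent's numerical optimization of the multipliers.
        for i in range(n_hidden):
            c_try = c.copy()
            c_try[i] += step
            H2 = sigmoid(X @ (c_try[:, None] * Vt[j]).T)
            err2 = np.sqrt(np.mean((H2 @ W_out - Y) ** 2))
            if err2 < err:
                c, err = c_try, err2
    return c[:, None] * Vt[j], W_out
```

The structural point is that the hidden weights never appear as free parameters: each hidden weight vector is a scalar multiple of a fixed singular direction, so the search per hidden node collapses from n dimensions to one.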
Claims
1. Apparatus for training a feed forward neural network having at least two layers of nodes, with a first, input layer having n1 nodes and a second, hidden layer having n2 nodes, each node i of said hidden layer having a weight vector W2i, where i = 1, ..., n2, said apparatus comprising:

(a) means for applying to the input layer successive ones of a plurality p of input vectors, for each of which the respective, desired output of the network is known, said input vectors forming an input matrix $X = X_{i,j}$, where i = 1, ..., p and j = 1, ..., n1;

(b) means for determining a set of r orthogonal singular vectors from said input matrix X such that the standard deviations of the projections of said input vectors along these singular vectors, as a set, are substantially maximized, said singular vectors each being denoted by a unit vector $V_1, \ldots, V_{n1}$, where

$$V_1^2 + V_2^2 + \cdots + V_{n1}^2 = 1,$$

and having an associated singular value which is a real number greater than or equal to zero, thereby to provide an optimal view of the input data; and

(c) means for changing the weight vector W2i of each hidden layer node to minimize the error of the actual network output with respect to the desired output, while requiring during the training process that each hidden layer weight vector only be allowed to change in a direction parallel to one of the singular vectors of X.

Dependent claims: 2, 3, 4, 5, 6.
7. Apparatus for training a neural network composed of nodes having differentiable one-to-one nonlinear transfer functions such that a plurality p of input vectors may be identified, for each of which the respective, desired output vector of the network is known, said input vectors being represented as an input matrix $X = X_{i,j}$, where i = 1, ..., p and j = 1, ..., n, n being the dimensionality of the input vectors, and said output vectors being represented as an output matrix $Y = Y_{i,j}$, where i = 1, ..., p and j = 1, ..., m, m being the dimensionality of the output vectors;

all nodes in the network to which input vectors are presented being identified as input nodes denoted as $I_1, \ldots, I_n$, where n is the dimensionality of the input vectors;

all nodes in the network from which output vectors are to be extracted being identified as output nodes denoted as $\omega_1, \ldots, \omega_m$, where m is the dimensionality of the output vectors; and

the remaining nodes in the network being identified as hidden nodes denoted as $\epsilon_1, \ldots, \epsilon_{t-(n+m)}$, where t is the total number of nodes comprising the neural network;

said apparatus comprising:

(a) means for associating with each hidden node $\epsilon_i$ a weight vector $u_i$ representing the strengths of all synaptic connections leading to said hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), and associating with every output node $\omega_i$ a weight vector $v_i$ representing the strengths of all synaptic connections leading to said output node $\omega_i$, where i = 1, ..., m;

each hidden node $\epsilon_i$ having identified therewith a set of optimal direction vectors denoted as $d_{i,j}$, where i = 1, ..., t-(n+m) and j = 1, ..., $r_i$, $r_i$ being the dimensionality of the weight vector $u_i$ associated with said hidden node $\epsilon_i$, and moreover being the number of nodes from which said hidden node $\epsilon_i$ receives inputs as well as being equal to the dimensionality of said direction vectors $d_{i,j}$; the optimality of said vectors $d_{i,j}$ being defined in terms of orthogonal directions along which the standard deviations of the projections of the inputs are essentially maximized, said vectors $d_{i,j}$ being obtained as singular vectors of the input space for the hidden node $\epsilon_i$;

(b) means for imposing a constraint on each weight vector $u_i$ which requires said weight vector to be aligned with a particular direction vector $d_{i,j(i)}$ and sized by a variable scalar multiplier $c_i$, said constraint being expressed by the equation

$$u_i = c_i \, d_{i,j(i)}, \quad i = 1, \ldots, t-(n+m),$$

the index j(i) being selected by processes which operate by choosing a direction vector $d_{i,j(i)}$ along which changes in the weight vector $u_i$ tend to most quickly decrease the deviations between the actual output vectors of the network, measured at the output nodes $\omega_k$, k = 1, ..., m, and the desired output vectors as represented by said output matrix Y, said deviation being measured by processes exemplified by, but not limited to, the root mean square measure of error, said root mean square error being defined by the equation

$$E = \sqrt{\frac{1}{pm} \sum_{i=1}^{p} \sum_{j=1}^{m} \left( a_{i,j} - Y_{i,j} \right)^2},$$

where $a_{i,j}$ is the result of propagating input vector i, applied to all input nodes simultaneously, through the network to each output node $\omega_j$, where i = 1, ..., p and j = 1, ..., m;

(c) means for performing the Iterative Least Squares solution for the weight vector $v_i$ identified with each output node $\omega_i$, where i = 1, ..., m;

(d) means for performing a numerical optimization of the scalar multipliers $c_i$ which determine the weights identified with each hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), said optimization being performed in such a manner as to adjust the totality of all said multipliers $c_i$ so as to reduce deviation between the output values generated by propagating all inputs through the network to the final output nodes denoted $\omega_j$, j = 1, ..., m, and the desired output values $Y_{k,j}$, k = 1, ..., p, j = 1, ..., m;

(e) means for evaluating the selection of the index j(i) associated with the direction vector $d_{i,j(i)}$ at each hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), so that said index may be replaced by a choice consistent with the conditions set forth in step (b), as effected by evolution of the network through the training process;

(f) means for reconstructing the entire set of direction vectors $d_{i,j}$ associated with hidden node $\epsilon_i$;

(g) means for performing a repetition of steps (a), ..., (f) in such a manner as to effectively minimize deviations between the actual output vectors of the network and the desired output vectors, said deviations being dependent upon a specific implementation but exemplified by the root mean square measure of error.

Dependent claims: 8, 9, 10.
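Below is a sketch tying together limitations (b), (c), and (e) of claim 7 for a single hidden layer. Two hedges: the patent specifies an Iterative Least Squares solution for the output weights, for which an ordinary one-shot np.linalg.lstsq solve is substituted here as a simplification, and the re-selection of the index j(i) is done by exhaustive trial, which is only one of the "processes" the claim contemplates. The scalar optimization of limitation (d) would follow the same pattern as the finite-difference update in the sketch after the abstract. All identifiers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rms_error(A, Y):
    """Root mean square measure of error from limitation (b):
    A[i, j] is the network output at node omega_j for input i."""
    return np.sqrt(np.mean((A - Y) ** 2))

def forward(X, dirs, j, c):
    """Hidden weights obey the constraint u_i = c_i * d_{i,j(i)}:
    row i of the weight matrix is c[i] times direction dirs[j[i]]."""
    return sigmoid(X @ (c[:, None] * dirs[j]).T)

def reselect_indices(X, Y, dirs, j, c, W_out):
    """Limitation (e): re-evaluate each index j(i), replacing it
    with the direction that most decreases the output deviation."""
    for i in range(len(j)):
        best, best_err = j[i], np.inf
        for cand in range(dirs.shape[0]):
            j_try = j.copy()
            j_try[i] = cand
            err = rms_error(forward(X, dirs, j_try, c) @ W_out, Y)
            if err < best_err:
                best, best_err = cand, err
        j[i] = best
    return j

def sweep(X, Y, dirs, j, c):
    """One pass over limitations (b), (c), and (e); limitation (g)
    repeats such sweeps until the deviation stops improving."""
    H = forward(X, dirs, j, c)
    # Limitation (c): least-squares solve for the output weights v_i
    # (a one-shot stand-in for the patent's Iterative Least Squares).
    W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)
    j = reselect_indices(X, Y, dirs, j, c, W_out)
    return j, W_out
```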
Specification