Accelerated training apparatus for back propagation networks
Abstract
A supervised procedure for obtaining weight values for back-propagation neural networks is described. The method according to the invention performs a sequence of partial optimizations to determine values for the network connection weights. Each partial optimization relies on a constrained representation of the hidden weights, derived from a singular value decomposition of the input space, together with an Iterative Least Squares solution for the output weights.
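To make the sequence the abstract describes concrete, here is a minimal NumPy sketch of one way such alternating partial optimizations could be organized. It is an illustration, not the patented implementation: every identifier is hypothetical, the direction choices are initialized at random, and a crude finite-difference search stands in for whatever numerical optimizer an implementation would use for the scalar multipliers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_hidden, n_sweeps=20, step=0.05):
    """Alternate partial optimizations: fix hidden directions via an
    SVD of the inputs, solve the output weights by least squares,
    then nudge the per-node scalar multipliers."""
    rng = np.random.default_rng(0)
    # Right singular vectors of the (centered) input matrix: the
    # directions along which the projected inputs vary the most.
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    j = rng.integers(0, Vt.shape[0], size=n_hidden)  # chosen direction per node
    c = rng.normal(size=n_hidden)                    # scalar multiplier per node
    W_out = None
    for _ in range(n_sweeps):
        H = sigmoid(X @ (c[:, None] * Vt[j]).T)          # hidden outputs
        W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)    # output weights
        err = np.sqrt(np.mean((H @ W_out - Y) ** 2))     # RMS error
        # Finite-difference descent on each scalar c_i; a stand-in
        # for the patent's numerical optimization of the multipliers.
        for i in range(n_hidden):
            c_try = c.copy()
            c_try[i] += step
            H2 = sigmoid(X @ (c_try[:, None] * Vt[j]).T)
            err2 = np.sqrt(np.mean((H2 @ W_out - Y) ** 2))
            if err2 < err:
                c, err = c_try, err2
    return c[:, None] * Vt[j], W_out
```

The structural point is that the hidden weights never appear as free parameters: each hidden weight vector is a scalar multiple of a fixed singular direction, so the search per hidden node collapses from n dimensions to one.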
Claims
1. Apparatus for training a feed forward neural network having at least two layers of nodes, with a first, input layer having n1 nodes and a second, hidden layer having n2 nodes, each node i of said hidden layer having a weight vector W2i, where i = 1, ..., n2, said apparatus comprising:

(a) means for applying to the input layer successive ones of a plurality p of input vectors, for each of which the respective, desired output of the network is known, said input vectors forming an input matrix $X = X_{i,j}$, where i = 1, ..., p and j = 1, ..., n1;

(b) means for determining a set of r orthogonal singular vectors from said input matrix X such that the standard deviations of the projections of said input vectors along these singular vectors, as a set, are substantially maximized, said singular vectors each being denoted by a unit vector $V_1, \ldots, V_{n1}$, where

$$V_1^2 + V_2^2 + \cdots + V_{n1}^2 = 1,$$

and having an associated singular value which is a real number greater than or equal to zero, thereby to provide an optimal view of the input data; and

(c) means for changing the weight vector W2i of each hidden layer node to minimize the error of the actual network output with respect to the desired output, while requiring during the training process that each hidden layer weight vector only be allowed to change in a direction parallel to one of the singular vectors of X.

Dependent claims: 2, 3, 4, 5, 6.
7. Apparatus for training a neural network composed of nodes having differentiable one-to-one nonlinear transfer functions such that a plurality p of input vectors may be identified, for each of which the respective, desired output vector of the network is known, said input vectors being represented as an input matrix $X = X_{i,j}$, where i = 1, ..., p and j = 1, ..., n, n being the dimensionality of the input vectors, and said output vectors being represented as an output matrix $Y = Y_{i,j}$, where i = 1, ..., p and j = 1, ..., m, m being the dimensionality of the output vectors;

all nodes in the network to which input vectors are presented being identified as input nodes denoted as $I_1, \ldots, I_n$, where n is the dimensionality of the input vectors;

all nodes in the network from which output vectors are to be extracted being identified as output nodes denoted as $\omega_1, \ldots, \omega_m$, where m is the dimensionality of the output vectors; and

the remaining nodes in the network being identified as hidden nodes denoted as $\epsilon_1, \ldots, \epsilon_{t-(n+m)}$, where t is the total number of nodes comprising the neural network;

said apparatus comprising:

(a) means for associating with each hidden node $\epsilon_i$ a weight vector $u_i$ representing the strengths of all synaptic connections leading to said hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), and associating with every output node $\omega_i$ a weight vector $v_i$ representing the strengths of all synaptic connections leading to said output node $\omega_i$, where i = 1, ..., m;

each hidden node $\epsilon_i$ having identified therewith a set of optimal direction vectors denoted as $d_{i,j}$, where i = 1, ..., t-(n+m) and j = 1, ..., $r_i$, $r_i$ being the dimensionality of the weight vector $u_i$ associated with said hidden node $\epsilon_i$, and moreover being the number of nodes from which said hidden node $\epsilon_i$ receives inputs as well as being equal to the dimensionality of said direction vectors $d_{i,j}$; the optimality of said vectors $d_{i,j}$ being defined in terms of orthogonal directions along which the standard deviations of the projections of the inputs are essentially maximized, said vectors $d_{i,j}$ being obtained as singular vectors of the input space for the hidden node $\epsilon_i$;

(b) means for imposing a constraint on each weight vector $u_i$ which requires said weight vector to be aligned with a particular direction vector $d_{i,j(i)}$ and sized by a variable scalar multiplier $c_i$, said constraint being expressed by the equation

$$u_i = c_i \, d_{i,j(i)}, \quad i = 1, \ldots, t-(n+m),$$

the index j(i) being selected by processes which operate by choosing a direction vector $d_{i,j(i)}$ along which changes in the weight vector $u_i$ tend to most quickly decrease the deviations between the actual output vectors of the network, measured at the output nodes $\omega_k$, k = 1, ..., m, and the desired output vectors as represented by said output matrix Y, said deviation being measured by processes exemplified by, but not limited to, the root mean square measure of error, said root mean square error being defined by the equation

$$E = \sqrt{\frac{1}{pm} \sum_{i=1}^{p} \sum_{j=1}^{m} \left( a_{i,j} - Y_{i,j} \right)^2},$$

where $a_{i,j}$ is the result of propagating input vector i, applied to all input nodes simultaneously, through the network to each output node $\omega_j$, where i = 1, ..., p and j = 1, ..., m;

(c) means for performing the Iterative Least Squares solution for the weight vector $v_i$ identified with each output node $\omega_i$, where i = 1, ..., m;

(d) means for performing a numerical optimization of the scalar multipliers $c_i$ which determine the weights identified with each hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), said optimization being performed in such a manner as to adjust the totality of all said multipliers $c_i$ so as to reduce deviation between the output values generated by propagating all inputs through the network to the final output nodes denoted $\omega_j$, j = 1, ..., m, and the desired output values $Y_{k,j}$, k = 1, ..., p, j = 1, ..., m;

(e) means for evaluating the selection of the index j(i) associated with the direction vector $d_{i,j(i)}$ at each hidden node $\epsilon_i$, where i = 1, ..., t-(n+m), so that said index may be replaced by a choice consistent with the conditions set forth in step (b), as effected by evolution of the network through the training process;

(f) means for reconstructing the entire set of direction vectors $d_{i,j}$ associated with hidden node $\epsilon_i$;

(g) means for performing a repetition of steps (a), ..., (f) in such a manner as to effectively minimize deviations between the actual output vectors of the network and the desired output vectors, said deviations being dependent upon a specific implementation but exemplified by the root mean square measure of error.

Dependent claims: 8, 9, 10.
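Below is a sketch tying together limitations (b), (c), and (e) of claim 7 for a single hidden layer. Two hedges: the patent specifies an Iterative Least Squares solution for the output weights, for which an ordinary one-shot np.linalg.lstsq solve is substituted here as a simplification, and the re-selection of the index j(i) is done by exhaustive trial, which is only one of the "processes" the claim contemplates. The scalar optimization of limitation (d) would follow the same pattern as the finite-difference update in the sketch after the abstract. All identifiers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rms_error(A, Y):
    """Root mean square measure of error from limitation (b):
    A[i, j] is the network output at node omega_j for input i."""
    return np.sqrt(np.mean((A - Y) ** 2))

def forward(X, dirs, j, c):
    """Hidden weights obey the constraint u_i = c_i * d_{i,j(i)}:
    row i of the weight matrix is c[i] times direction dirs[j[i]]."""
    return sigmoid(X @ (c[:, None] * dirs[j]).T)

def reselect_indices(X, Y, dirs, j, c, W_out):
    """Limitation (e): re-evaluate each index j(i), replacing it
    with the direction that most decreases the output deviation."""
    for i in range(len(j)):
        best, best_err = j[i], np.inf
        for cand in range(dirs.shape[0]):
            j_try = j.copy()
            j_try[i] = cand
            err = rms_error(forward(X, dirs, j_try, c) @ W_out, Y)
            if err < best_err:
                best, best_err = cand, err
        j[i] = best
    return j

def sweep(X, Y, dirs, j, c):
    """One pass over limitations (b), (c), and (e); limitation (g)
    repeats such sweeps until the deviation stops improving."""
    H = forward(X, dirs, j, c)
    # Limitation (c): least-squares solve for the output weights v_i
    # (a one-shot stand-in for the patent's Iterative Least Squares).
    W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)
    j = reselect_indices(X, Y, dirs, j, c, W_out)
    return j, W_out
```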
Specification