Method and system for training an artificial neural network
Abstract
A method and system for training an artificial neural network (“ANN”) are disclosed. One embodiment of the method of the present invention initializes an artificial neural network by assigning values to one or more weights. An adaptive learning rate is set to an initial value, and training patterns for an input layer and an output layer are stored. The input layer training pattern is processed in the ANN to obtain an output pattern. An error is calculated between the output layer training pattern and the output pattern and used to calculate an error ratio, which is used to adjust the value of the adaptive learning rate. If the error ratio is less than a threshold value, the adaptive learning rate can be multiplied by a step-up factor to increase the learning rate. If the error ratio is greater than the threshold value, the adaptive learning rate can be multiplied by a step-down factor to reduce the learning rate. The values of the weights used to initialize the ANN are adjusted based on the calculated error and the adaptive learning rate. The training method of the present invention is repeated until the ANN achieves a final trained state.
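Expressed procedurally, the abstract's learning-rate rule compares the error from one iteration to the error from the previous one and rescales the rate accordingly. The following Python sketch illustrates that rule only; the threshold, step-up, and step-down values are assumptions made for illustration, as the abstract does not fix specific numbers.

```python
def adjust_learning_rate(prev_error, curr_error, lr,
                         threshold=1.04, step_up=1.05, step_down=0.7):
    """Rescale an adaptive learning rate from the change in error between
    two training iterations (constants here are illustrative only)."""
    error_ratio = curr_error / prev_error
    if error_ratio < threshold:
        lr *= step_up      # error fell (or grew only slightly): learn faster
    elif error_ratio > threshold:
        lr *= step_down    # error grew past the threshold: learn slower
    return lr
```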
Claims
1. In a training system for a neural network, wherein the neural network has an input layer and an output layer, the training system training the neural network by performing a method comprising the steps of:
initializing the neural network with one or more weights of random value;
setting an adaptive learning rate to an initial value;
storing an input layer training pattern and an output layer training pattern;
processing the input layer training pattern in the neural network to obtain an output pattern;
calculating an error between the output layer training pattern and the output pattern;
if at least two output patterns have been obtained, calculating a new value for the adaptive learning rate, further comprising:
calculating an error ratio to determine the change in error between training iterations;
if the error ratio is less than a threshold value, multiplying the adaptive learning rate by a step-up factor; and
if the error ratio is greater than the threshold value, multiplying the adaptive learning rate by a step-down factor; and
if a final trained state is achieved, deploying the neural network, otherwise, repeating steps (d)-(f) for as many iterations as necessary to reach the final trained state.
(Dependent claims 2-7 not reproduced.)
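Taken as a whole, claim 1 reads as a gradient-descent training loop in which the learning rate is rescaled after every iteration once two output patterns exist. The following self-contained Python/NumPy sketch is one illustrative reading of those steps; the single-layer network, mean-squared error, weight-update rule, and all numeric constants are assumptions made for the example and are not recited in the claim (the weight adjustment itself is described in the abstract and dependent claims).

```python
import numpy as np

rng = np.random.default_rng(0)

# initialize the neural network with one or more weights of random value
W = rng.normal(scale=0.1, size=(3, 2))      # 3 inputs -> 2 outputs

# set an adaptive learning rate to an initial value
lr = 0.01

# store an input layer training pattern and an output layer training pattern
x = np.array([0.2, 0.5, 0.9])               # input layer training pattern
t = np.array([0.1, 0.8])                    # output layer training pattern

prev_error = None
for iteration in range(10_000):
    # process the input pattern in the network to obtain an output pattern
    y = x @ W

    # calculate an error between the output training pattern and the output
    error = float(np.mean((t - y) ** 2))

    # once at least two output patterns exist, adjust the adaptive learning rate
    if prev_error is not None:
        ratio = error / prev_error           # error ratio between iterations
        if ratio < 1.04:                     # threshold value (assumed)
            lr *= 1.05                       # step-up factor (assumed)
        elif ratio > 1.04:
            lr *= 0.7                        # step-down factor (assumed)
    prev_error = error

    # gradient-descent weight update using the adaptive learning rate
    grad = 2.0 / t.size * np.outer(x, -(t - y))
    W -= lr * grad

    # stop once a final trained state (an assumed error target) is reached
    if error < 1e-6:
        break
```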
8. In a deployed neural network having an input layer and an output layer, a method for training the deployed neural network prior to deployment comprising the steps of:
initializing the neural network with one or more weights of random value;
setting an adaptive learning rate to an initial value;
storing an input layer training pattern and an output layer training pattern;
processing the input layer training pattern in the neural network using a gradient descent training algorithm with the adaptive learning rate to obtain an output pattern;
calculating an error between the output layer training pattern and the output pattern;
if at least two output patterns have been obtained, adjusting the adaptive learning rate based on the change in error between iterations;
if the adaptive learning rate equals a reset threshold, reprocessing the input layer training pattern as in step (d), further comprising:
setting the adaptive learning rate to a new initial value;
generating a plurality of prospective neural networks;
initializing each of the plurality of prospective neural networks with a different set of weights from a plurality of sets of weights; and
processing the training pattern in each of the plurality of prospective neural networks for a preset number of iterations using the gradient descent training algorithm with the new value for the adaptive learning rate to determine if the error decreases;
replacing the neural network with a most accurate one of said plurality of prospective neural networks;
if the adaptive learning rate again equals the reset threshold, then, if steps (g) and (h) have occurred a preset number of times, assigning new random values to the one or more weights to reinitialize the neural network and repeating steps (b)-(i), otherwise, increasing the number of said prospective neural networks and repeating steps (g) and (h); and
if a final trained state is achieved, deploying the neural network, otherwise, repeating steps (d)-(i) for as many iterations as necessary to reach the final trained state.
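Steps (f) through (i) of claim 8 add an escape mechanism to the basic loop: when the adaptive learning rate decays to a reset threshold, several prospective copies of the network are created with different weight sets, each is trained briefly with a fresh learning rate, and the most accurate copy replaces the original. The sketch below is one illustrative reading of that mechanism; the helper callables train_briefly and prediction_error, the perturbation scheme, and all constants are assumptions, not text from the claim.

```python
import numpy as np

rng = np.random.default_rng(1)

def prospective_restart(W, train_briefly, prediction_error,
                        n_candidates=5, sigma=0.01, new_lr=0.01,
                        preset_iterations=50):
    """On hitting the reset threshold: build prospective networks with
    different weight sets, train each for a preset number of iterations
    with a new learning rate, and keep the most accurate one."""
    candidates = []
    for _ in range(n_candidates):
        # each prospective network gets a different set of weights,
        # here a small random change to the current weights (cf. claim 10)
        W_prospective = W + rng.normal(0.0, sigma, size=W.shape)
        W_prospective = train_briefly(W_prospective, new_lr, preset_iterations)
        candidates.append((prediction_error(W_prospective), W_prospective))
    # replace the network with the most accurate prospective network;
    # if the reset recurs, the claim escalates by adding more candidates
    # or, after a preset number of resets, reinitializing the weights
    return min(candidates, key=lambda pair: pair[0])[1]
```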
9. The method of claim 8, wherein adjusting the adaptive learning rate further comprises the steps of:
calculating an error ratio to determine the change in error between training iterations;
if the error ratio is less than a threshold value, multiplying the adaptive learning rate by a step-up factor; and
if the error ratio is greater than the threshold value, multiplying the adaptive learning rate by a step-down factor.
10. The method of claim 8, wherein the plurality of sets of weights in step (g) is created by adding a small random change to the current values of the one or more weights such that each set of weights in said plurality of sets of weights is different.
11. The method of claim 10, wherein the adaptive learning rate and the one or more weights are adjusted in real-time.
12. The method of claim 10, wherein said small random change is sampled from a zero-mean, small-variance, multi-variate Gaussian distribution.
13. The method of claim 12, wherein said zero-mean, small-variance, multi-variate Gaussian distribution is a wider distribution for each subsequent increase in the number of said prospective neural networks in step (i).
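Claims 10, 12, and 13 together specify how those prospective weight sets are generated: each is the current weight vector plus a small zero-mean Gaussian change, and the distribution is widened every time the number of prospective networks is increased. A minimal sketch follows, assuming an illustrative base standard deviation and widening factor (neither is given numerically in the claims).

```python
import numpy as np

rng = np.random.default_rng(2)

def prospective_weight_sets(W, n_sets, widen_count,
                            base_sigma=0.01, widen_factor=2.0):
    """Build a plurality of weight sets by adding a small zero-mean
    Gaussian change to the current weights (claims 10 and 12), using a
    wider distribution for each subsequent widening step (claim 13)."""
    sigma = base_sigma * widen_factor ** widen_count
    return [W + rng.normal(0.0, sigma, size=W.shape) for _ in range(n_sets)]
```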
14. A method for optimizing, during training, the architecture of a deployed neural network prior to deployment, the deployed neural network having an input layer, an output layer, and at least one intermediate hidden layer, and wherein each layer contains at least one node, comprising the steps of:
initializing the neural network with one or more weights of random value;
training the neural network with a training algorithm;
if the training algorithm cannot achieve a preset error goal, increasing the size of the artificial neural network to decrease prediction error, further comprising:
adding a plurality of new intermediate hidden layers if a threshold number of new nodes have already been added to one or more of the at least one intermediate hidden layers;
if a threshold number of new nodes have not been added to the at least one intermediate hidden layers, adding a plurality of new nodes to one of the at least one intermediate hidden layers;
repeating step (b) from the pre-addition state after each plurality of new nodes or the plurality of new intermediate hidden layers is added to determine which of the plurality of new nodes or which of the plurality of new intermediate hidden layers provided the greatest error decrease;
eliminating all others of the plurality of new nodes or the plurality of new intermediate hidden layers added except for the one new node or new intermediate hidden layer that provided the greatest error decrease;
if a final error goal is not achieved, repeating steps (b)-(e) from the pre-elimination stage for as many iterations as necessary to achieve the final error goal; and
deploying the neural network.
(Dependent claims 15-20 not reproduced.)
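Claim 14 describes a constructive architecture search: when training cannot reach a preset error goal, candidate additions are made (new hidden nodes, or new hidden layers once a node threshold has been reached), each candidate is retrained from the pre-addition state, and only the single addition that produced the greatest error decrease is kept. The Python sketch below is one illustrative reading of that procedure; the helper callables clone, train, error_of, add_node, and add_layer and all constants are assumptions, not material from the claim or specification.

```python
def grow_network(net, clone, train, error_of, add_node, add_layer,
                 preset_error_goal=1e-2, final_error_goal=1e-3,
                 node_threshold=8, n_candidates=3, max_rounds=20):
    """Grow a network in the spirit of claim 14: train, and while the
    error goal is unmet, try several single additions, retrain each from
    the pre-addition state, and keep only the best-performing addition."""
    nodes_added = 0
    for _ in range(max_rounds):
        net = train(net)                       # train with the training algorithm
        if error_of(net) <= final_error_goal:  # final error goal achieved
            return net                         # ready to deploy
        if error_of(net) <= preset_error_goal:
            continue                           # no growth needed this round
        # propose a plurality of additions intended to decrease prediction error
        if nodes_added >= node_threshold:
            candidates = [add_layer(clone(net)) for _ in range(n_candidates)]
        else:
            candidates = [add_node(clone(net)) for _ in range(n_candidates)]
            nodes_added += 1
        # retrain each candidate from the pre-addition state and keep only
        # the addition that provided the greatest error decrease
        candidates = [train(c) for c in candidates]
        net = min(candidates, key=error_of)
    return net
```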