System and method for addressing overfitting in a neural network
Abstract
A system for training a neural network. A switch is linked to feature detectors in at least some of the layers of the neural network. For each training case, the switch randomly and selectively disables each of the feature detectors in accordance with a preconfigured probability. The weights from each training case are then normalized for applying the neural network to test data.
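The two phases in the abstract (random disabling during training, weight normalization for test data) can be illustrated with a short sketch. It is not the patented implementation: the function names are hypothetical, a single shared disabling probability is assumed, and "normalized" is assumed to mean multiplying each outgoing weight by the probability that its detector was enabled during training (1 - p).

```python
import random
from typing import List


def dropout_switch(num_detectors: int, p_disable: float, rng: random.Random) -> List[bool]:
    """One enabled/disabled flag per feature detector for a single training case.

    Each detector is disabled independently with probability p_disable
    (the "preconfigured probability"). True means enabled.
    """
    return [rng.random() >= p_disable for _ in range(num_detectors)]


def masked_output(h: List[float], w: List[float], enabled: List[bool]) -> float:
    """Total weighted input to one downstream unit; disabled detectors contribute nothing."""
    return sum(wi * hi for wi, hi, on in zip(w, h, enabled) if on)


def normalize_outgoing_weights(w: List[float], p_disable: float) -> List[float]:
    """Test-time weights. ASSUMPTION: scale by the keep probability 1 - p_disable."""
    return [wi * (1.0 - p_disable) for wi in w]


if __name__ == "__main__":
    rng = random.Random(0)
    h = [1.0, 2.0, 3.0]   # activities of three feature detectors (made-up values)
    w = [0.5, -1.0, 2.0]  # their outgoing weights to one downstream unit
    p = 0.5

    # Training: average over many cases, each with a fresh random switch setting.
    n = 20000
    train_avg = sum(masked_output(h, w, dropout_switch(3, p, rng)) for _ in range(n)) / n

    # Test: one deterministic pass, all detectors enabled, normalized weights.
    test_out = masked_output(h, normalize_outgoing_weights(w, p), [True] * 3)
    print(f"training average {train_avg:.3f} vs test-time output {test_out:.3f}")
```

With these made-up values the test-time output is exactly 2.25, and the training average should land close to it: scaling by 1 - p makes the deterministic test-time input to the downstream unit match its expected value under random disabling.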
24 Claims
1. A computer-implemented method comprising:
obtaining a plurality of training cases; and
training a neural network having a plurality of layers on the plurality of training cases, each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights, and a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases, wherein training the neural network on the plurality of training cases comprises, for each of the training cases respectively:
determining one or more feature detectors to disable during processing of the training case, comprising determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector,
disabling the one or more feature detectors in accordance with the determining, and
processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case.
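The per-training-case steps of claim 1 (determine which detectors to disable, disable them, process the case) can be sketched in Python. Everything here is an illustrative assumption rather than the claimed implementation: detectors are addressed by hypothetical (layer, unit) pairs; only detectors listed in `drop_probs` belong to the droppable subset, each with its own probability; hidden detectors are logistic units and the last layer is linear; and the weight update that would normally follow each prediction is omitted because the claim does not recite it.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical addressing scheme: (layer index, unit index within that layer).
DetectorId = Tuple[int, int]


def choose_disabled(drop_probs: Dict[DetectorId, float], rng: random.Random) -> Set[DetectorId]:
    """Decide, independently for each detector in the droppable subset, whether
    to disable it for this training case, using that detector's own probability.
    Detectors absent from drop_probs are never disabled."""
    return {d for d, p in drop_probs.items() if rng.random() < p}


def forward(x: Sequence[float], layers: List[List[List[float]]],
            disabled: Set[DetectorId]) -> List[float]:
    """Process one case with the given detectors disabled (forced to output 0).

    layers[l][j] holds the incoming weights of detector j in layer l, bias last.
    Hidden detectors are logistic; the final layer is linear (an assumption).
    """
    h = list(x)
    last = len(layers) - 1
    for l, layer in enumerate(layers):
        out = []
        for j, w in enumerate(layer):
            if (l, j) in disabled:
                out.append(0.0)  # disabled: contributes nothing downstream
                continue
            # zip stops at len(h), so the trailing bias w[-1] is added separately.
            z = sum(wi * hi for wi, hi in zip(w, h)) + w[-1]
            out.append(z if l == last else 1.0 / (1.0 + math.exp(-z)))
        h = out
    return h


def predict_each_case(cases: List[Sequence[float]], layers: List[List[List[float]]],
                      drop_probs: Dict[DetectorId, float], seed: int = 0) -> List[List[float]]:
    """For each training case: determine, disable, then process."""
    rng = random.Random(seed)
    predictions = []
    for x in cases:
        disabled = choose_disabled(drop_probs, rng)  # fresh draw per training case
        predictions.append(forward(x, layers, disabled))
        # A weight update would normally follow here; omitted in this sketch.
    return predictions


# Tiny example network: 2 inputs -> 2 logistic detectors -> 1 linear output.
example_layers = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[2.0, 3.0, 0.5]],
]

if __name__ == "__main__":
    # Detector (0, 0) is disabled with probability 0.5; all others stay enabled.
    print(predict_each_case([[0.0, 0.0]] * 4, example_layers, {(0, 0): 0.5}))
```

Keeping the disabling decision in `choose_disabled`, separate from `forward`, mirrors the claim's ordering: the set of disabled detectors is fixed before the training case is processed, and a new set is drawn for every case.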
9. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
obtaining a plurality of training cases; and
training a neural network having a plurality of layers on the plurality of training cases, each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights, and a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases, wherein training the neural network on the plurality of training cases comprises, for each of the training cases respectively:
determining one or more feature detectors to disable during processing of the training case, comprising determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector,
disabling the one or more feature detectors in accordance with the determining, and
processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case.
17. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
obtaining a plurality of training cases; and
training a neural network having a plurality of layers on the plurality of training cases, each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights, and a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases, wherein training the neural network on the plurality of training cases comprises, for each of the training cases respectively:
determining one or more feature detectors to disable during processing of the training case, comprising determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector,
disabling the one or more feature detectors in accordance with the determining, and
processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case.
Specification