Batch normalization layers
First Claim
1. A neural network system implemented by one or more computers, the neural network system comprising:
- instructions for implementing a batch normalization layer between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the instructions cause the one or more computers to perform operations comprising:
during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches:
receiving a respective first layer output for each of the plurality of training examples in the batch;
computing a plurality of normalization statistics for the batch from the first layer outputs, comprising:
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset, and
determining, for each of the plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset;
normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising:
for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and the standard deviation for the respective subset;
generating a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and
providing the batch normalization layer output as an input to the second neural network layer.
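The training-time operations recited in the claim above can be sketched in a few lines of Python. This is only an illustrative reading under stated assumptions, not the patented implementation: the function name `normalize_batch`, the representation of each first layer output as a list of floats, the representation of each subset as a list of component indices, the use of the population (divide-by-count) standard deviation, and the small `eps` stabilizer are all hypothetical choices that the claim text does not specify. The sketch also passes the normalized outputs through unchanged, since the claim as quoted here does not say how the batch normalization layer output is generated from them.

```python
import math


def normalize_batch(first_layer_outputs, subsets, eps=1e-5):
    """Normalize a batch of first layer outputs subset by subset.

    first_layer_outputs: list of examples, each a list of float components.
    subsets: list of index lists; each index list names the components
             that share one mean and one standard deviation.
    eps: hypothetical stabilizer (an assumption, not in the claim) that
         avoids division by zero when a subset has zero variance.
    """
    normalized = [list(output) for output in first_layer_outputs]
    for subset in subsets:
        # Pool the subset's components across every example in the batch.
        values = [output[i] for output in first_layer_outputs for i in subset]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(variance + eps)
        # Normalize each component in the subset with the shared statistics.
        for row, output in zip(normalized, first_layer_outputs):
            for i in subset:
                row[i] = (output[i] - mean) / std
    return normalized


if __name__ == "__main__":
    # Two training examples with three components each; components 0 and 1
    # form one subset and component 2 forms another.
    batch = [[1.0, 3.0, 10.0], [5.0, 7.0, 30.0]]
    subsets = [[0, 1], [2]]
    for row in normalize_batch(batch, subsets):
        print(row)
```

In this reading, choosing one subset per component gives per-component statistics, while grouping several components into one subset makes them share a single mean and standard deviation.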
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using a neural network system that includes a batch normalization layer. One of the methods includes receiving a respective first layer output for each training example in the batch; computing a plurality of normalization statistics for the batch from the first layer outputs; normalizing each component of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch; generating a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and providing the batch normalization layer output as an input to the second neural network layer.
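As a worked illustration of the normalization step summarized in the abstract, the per-component case can be written as follows. The notation is assumed for illustration only: $m$ is the number of training examples in the batch, $x_{i,k}$ is component $k$ of the first layer output for example $i$, and the statistics use the divide-by-$m$ form, which the abstract does not specify.

```latex
\mu_k = \frac{1}{m} \sum_{i=1}^{m} x_{i,k},
\qquad
\sigma_k = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( x_{i,k} - \mu_k \right)^2},
\qquad
\hat{x}_{i,k} = \frac{x_{i,k} - \mu_k}{\sigma_k}
```

Here $\hat{x}_{i,k}$ plays the role of a component of the normalized layer output. Practical implementations commonly add a small constant under the square root for numerical stability; that detail is an assumption and is not stated in the abstract.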
34 Claims
1. A neural network system implemented by one or more computers, the neural network system comprising:
instructions for implementing a batch normalization layer between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the instructions cause the one or more computers to perform operations comprising:
during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches:
receiving a respective first layer output for each of the plurality of training examples in the batch;
computing a plurality of normalization statistics for the batch from the first layer outputs, comprising:
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset, and
determining, for each of the plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset;
normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising:
for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and the standard deviation for the respective subset;
generating a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and
providing the batch normalization layer output as an input to the second neural network layer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A method performed by one or more computers implementing a batch normalization layer that is between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the method comprises:
during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches:
receiving a respective first layer output for each of the plurality of training examples in the batch;
computing a plurality of normalization statistics for the batch from the first layer outputs, comprising:
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset, and
determining, for each of the plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset;
normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising:
for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and the standard deviation for the respective subset;
generating a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and
providing the batch normalization layer output as an input to the second neural network layer.
Dependent claims: 29, 30, 31, 32, 33, 34
22. One or more non-transitory computer-readable storage media encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to implement a neural network system, the neural network system comprising:
batch normalization instructions for implementing a batch normalization layer between a first neural network layer and a second neural network layer in a neural network, wherein the first neural network layer generates first layer outputs having a plurality of components, and wherein the batch normalization instructions cause the one or more computers to perform operations comprising:
during training of the neural network on a plurality of batches of training data, each batch comprising a respective plurality of training examples and for each of the batches:
receiving a respective first layer output for each of the plurality of training examples in the batch;
computing a plurality of normalization statistics for the batch from the first layer outputs, comprising:
determining, for each of a plurality of subsets of the plurality of the components of the first layer outputs, a mean of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset, and
determining, for each of the plurality of subsets of the plurality of the components of the first layer outputs, a standard deviation of the components of the first layer outputs for each of the plurality of training examples in the batch that are in the respective subset;
normalizing each of the plurality of the components of each first layer output using the normalization statistics to generate a respective normalized layer output for each training example in the batch, comprising:
for each first layer output and for each of the plurality of subsets, normalizing the components of the first layer output that are in the respective subset using the mean for the respective subset and the standard deviation for the respective subset;
generating a respective batch normalization layer output for each of the training examples from the normalized layer outputs; and
providing the batch normalization layer output as an input to the second neural network layer.
Dependent claims: 23, 24, 25, 26, 27, 28
Specification