Exploiting sparseness in training deep neural networks
Abstract
Deep Neural Network (DNN) training technique embodiments are presented that train a DNN while exploiting the sparseness of non-zero hidden layer interconnection weight values. Generally, a fully connected DNN is initially trained by sweeping through a full training set a number of times. Then, for the most part, only the interconnections whose weight magnitudes exceed a minimum weight threshold are considered in further training. This minimum weight threshold can be established as a value that results in only a prescribed maximum number of interconnections being considered when setting interconnection weight values via an error back-propagation procedure during the training. It is noted that the continued DNN training tends to converge much faster than the initial training.
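The thresholding rule in the abstract can be made concrete with a small sketch. The snippet below is a minimal NumPy illustration, not the patented implementation: it picks the smallest threshold that leaves at most a prescribed maximum number of interconnections non-zero, then zeroes the rest. The helper names and the NumPy setting are assumptions made for this example.

```python
# Minimal sketch (not the patented implementation) of the pruning rule in
# the abstract: keep only interconnections whose weight magnitude exceeds
# a minimum threshold, chosen so that at most a prescribed maximum number
# of interconnections survive. Helper names are assumptions.
import numpy as np

def threshold_for_max_connections(W, max_connections):
    """Smallest threshold leaving at most `max_connections` non-zeros."""
    mags = np.sort(np.abs(W).ravel())[::-1]   # descending magnitudes
    if max_connections >= mags.size:
        return 0.0                            # nothing needs pruning
    return mags[max_connections]              # survivors must exceed this

def prune(W, min_weight):
    """Zero every weight whose magnitude does not exceed the threshold."""
    mask = np.abs(W) > min_weight
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))                   # a toy weight matrix
t = threshold_for_max_connections(W, max_connections=6)
W_sparse, mask = prune(W, t)
print(int(mask.sum()), "surviving interconnections")  # at most 6
```

During the continued training the abstract describes, weight updates would then be confined to the interconnections where `mask` is True, so the later sweeps adjust a much smaller set of parameters.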
20 Claims
1. A computer-implemented process for training a deep neural network (DNN), comprising:

using a computer to perform the following process actions:

(a) initially training a fully interconnected DNN comprising an input layer into which training data is input, an output layer from which an output is generated, and a plurality of hidden layers, wherein said training comprises:
(i) accessing a set of training data entries,
(ii) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an interim trained DNN, such that after the inputting of each data entry, the value of each weight associated with each interconnection of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches a label assigned to the training data entry,
(iii) repeating actions (i) and (ii) a number of times to establish an initially trained DNN;

(b) identifying each interconnection associated with each layer of the initially trained DNN whose interconnection weight value does not exceed a first weight threshold;

(c) setting the value of each identified interconnection to zero;

(d) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a current refined DNN, such that after the inputting of each data entry, the values of the weights associated with the interconnections of each hidden layer are set via an error back-propagation procedure so that the output from the output layer matches the label assigned to the training data entry;

(e) identifying those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold;

(f) setting the value of each of the identified interconnections whose interconnection weight value does not exceed the second weight threshold to zero; and

(g) repeating actions (d) through (f) a number of times to produce said trained DNN. (Dependent claims 2 through 11 not shown.)
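Read as an algorithm, claim 1 is an initial dense training phase followed by a prune-and-refine loop. The sketch below walks through actions (a) through (g) on a toy NumPy multilayer perceptron; the layer sizes, tanh activation, squared-error objective, learning rate, and threshold values are all illustrative assumptions, not details taken from the claim.

```python
# Minimal sketch of the claim-1 procedure on a toy NumPy MLP. All
# hyperparameters and the synthetic data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]                       # input, two hidden layers, output
Ws = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes, sizes[1:])]
masks = [np.ones_like(W, dtype=bool) for W in Ws]

X = rng.normal(size=(64, 8))                 # toy training entries
labels = rng.integers(0, 3, size=64)         # toy label per entry
Y = np.eye(3)[labels]                        # one-hot targets

def sweep(lr=0.05):
    """Actions (ii)/(d): one pass over the set, entry by entry, adjusting
    the unmasked weights by error back-propagation after each entry."""
    for x, y in zip(X, Y):
        acts = [x]
        for W in Ws:                          # forward propagation
            acts.append(np.tanh(acts[-1] @ W))
        delta = (acts[-1] - y) * (1 - acts[-1] ** 2)   # output-layer error
        for i in range(len(Ws) - 1, -1, -1):  # back-propagation
            grad = np.outer(acts[i], delta)
            if i > 0:
                delta = (delta @ Ws[i].T) * (1 - acts[i] ** 2)
            Ws[i] -= lr * grad * masks[i]     # pruned interconnections stay 0

def prune(threshold):
    """Actions (b)/(c) and (e)/(f): zero weights not exceeding threshold."""
    for i, W in enumerate(Ws):
        masks[i] &= np.abs(W) > threshold
        Ws[i] = W * masks[i]

for _ in range(10):      # (a): initial dense training, repeated sweeps
    sweep()
prune(0.05)              # (b)-(c): first weight threshold
for _ in range(5):       # (g): repeat (d) through (f) a number of times
    sweep()              # (d): refine the remaining interconnections
    prune(0.02)          # (e)-(f): second weight threshold
print(sum(int(m.sum()) for m in masks), "non-zero interconnections remain")
```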
12. A computer-implemented process for training a deep neural network (DNN), comprising:

using a computer to perform the following process actions:

(a) initially training a fully interconnected DNN comprising an input layer into which training data is input, an output layer from which an output is generated, and a plurality of hidden layers, wherein said training comprises:
(i) accessing a set of training data entries,
(ii) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an interim trained DNN, such that after the inputting of each data entry, the value of each weight associated with each interconnection of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches a label assigned to the training data entry,
(iii) repeating actions (i) and (ii) a number of times to establish an initially trained DNN;

(b) identifying those interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold;

(c) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a refined DNN, such that after the inputting of each data entry, the value of each weight associated with each of the identified interconnections of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches the label assigned to the training data entry; and

(d) repeating action (c) a number of times to produce said trained DNN. (Dependent claims 13 through 19 not shown.)
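Claim 12 differs from claim 1 in that the surviving interconnections are identified once, after the initial training, and every subsequent back-propagation update touches only that fixed set; the other weights keep their current values. Below is a minimal sketch of that refinement update under those assumptions, with a stand-in gradient and illustrative names.

```python
# Minimal sketch of the claim-12 refinement step: the trainable set is
# fixed once (action (b)), and each update (action (c)) adjusts only the
# identified interconnections. The gradient is a stand-in; the threshold
# and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))            # weights after initial training
trainable = np.abs(W) > 0.5            # (b): identify once, then keep fixed

def refine_step(grad, lr=0.1):
    """(c): back-propagation update applied only to identified weights."""
    W[trainable] -= lr * grad[trainable]

grad = rng.normal(size=W.shape)        # stand-in for a real BP gradient
refine_step(grad)
print(int(trainable.sum()), "of", W.size, "weights were updated")
```

Because the identified set never changes, the refinement sweeps of action (d) can store and process only the surviving weights, which is where the data structure of claim 20 comes in.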
20. A computer storage medium for storing data for access by a deep neural network (DNN) training application program being executed on a computer, comprising:

a data structure stored in said storage medium, said data structure comprising information used by said DNN training application program, said information representing a weight matrix having a plurality of columns and rows of weight values associated with interconnections between a pair of layers of the DNN, said data structure comprising:

a header data structure element comprising, a total columns number representing the number of columns of said weight matrix, followed by, a series of column index numbers each of which identifies a location in the data structure where information corresponding to a different one of the plurality of weight matrix columns begins; and

a plurality of column data structure elements each of which comprises information corresponding to a different one of the plurality of weight matrix columns, each of said column data structure elements comprising, a total non-zero weight value number representing the number of non-zero weight values in the column data structure element, followed by, a series of row identification numbers each of which identifies a row of the column of the weight matrix corresponding to the column data structure element that is associated with a non-zero weight value, followed by, a series of non-zero weight values each of which is assigned to a different one of the rows of the column of the weight matrix corresponding to the column data structure element that is associated with a non-zero weight value; and

wherein said computer storage medium consists of at least one of DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices.
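The claim-20 layout is essentially a column-compressed sparse matrix: a header giving the column count and the start location of each column's data, then, per column, the non-zero count, the row identifiers, and the non-zero weights. The round-trip below is a minimal sketch under the assumption that every field is stored in one flat array; the patent's actual field widths and encoding are not specified here.

```python
# Minimal sketch (an illustrative assumption, not the patent's exact byte
# layout) of the claim-20 structure: header = [total columns, per-column
# start indices]; each column element = [non-zero count, row ids, values].
import numpy as np

def pack(W):
    """Serialize weight matrix W column by column into one flat list."""
    n_rows, n_cols = W.shape
    columns = []
    for j in range(n_cols):
        rows = np.flatnonzero(W[:, j])        # rows with non-zero weights
        col = [float(len(rows))] + [float(r) for r in rows] \
              + [float(W[r, j]) for r in rows]
        columns.append(col)
    header = [float(n_cols)]
    offset = 1 + n_cols                       # data begins after the header
    for col in columns:
        header.append(float(offset))          # where this column starts
        offset += len(col)
    return header + [v for col in columns for v in col]

def unpack(data, n_rows):
    """Rebuild the dense matrix from the flat representation."""
    n_cols = int(data[0])
    W = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        start = int(data[1 + j])              # column's start index
        nnz = int(data[start])                # its non-zero count
        rows = [int(r) for r in data[start + 1 : start + 1 + nnz]]
        vals = data[start + 1 + nnz : start + 1 + 2 * nnz]
        W[rows, j] = vals
    return W

W = np.array([[0.0, 1.5], [2.0, 0.0], [0.0, -0.5]])
flat = pack(W)
assert np.allclose(unpack(flat, n_rows=3), W)  # round-trips losslessly
```

Storing only the non-zero weights this way is what makes the sparse refinement sweeps cheap: a column's contribution to forward and backward propagation can be computed by touching just its listed rows.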