Discriminative pretraining of deep neural networks
First Claim
1. A computer-implemented process for pretraining a deep neural network (DNN), comprising:
- using a computer to perform the following process actions:
(a) training a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto; and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation (BP) procedure so that the output generated from the multi-neuron output layer matches the label associated with the training data entry;
(b) discarding a current multi-neuron output layer and adding a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons;
(c) inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error BP procedure to produce an output from the new multi-neuron output layer that matches the label associated with the training data entry;
(d) repeating actions (b) and (c) until a prescribed number of hidden layers have been added;
(e) designating the last produced revised multiple layer DNN to be said pretrained DNN; and
(f) iteratively training the pretrained DNN to produce a trained DNN.
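Read as an algorithm, actions (a) through (f) describe greedy, layer-by-layer discriminative training: each growth step discards the current output layer, stacks one fresh randomly initialized hidden layer on the trained ones, reattaches a new output layer, and retrains the entire stack with back-propagation. The claim is framework-agnostic; the sketch below uses PyTorch, and every name and hyperparameter in it (make_net, train_one_pass, discriminative_pretrain, the sigmoid activations, hidden_units=512, lr=0.1) is an illustrative assumption rather than language from the patent.

```python
# A minimal sketch of the claimed pretraining process; all identifiers
# and hyperparameter values are illustrative assumptions.
import torch
import torch.nn as nn


def make_net(input_dim, hidden_dims, num_classes):
    """Build a feed-forward NN with the given hidden layer sizes.
    New layers receive PyTorch's default random weight initialization."""
    layers, prev = [], input_dim
    for h in hidden_dims:
        layers += [nn.Linear(prev, h), nn.Sigmoid()]
        prev = h
    layers.append(nn.Linear(prev, num_classes))  # multi-neuron output layer
    return nn.Sequential(*layers)


def train_one_pass(net, data, labels, lr):
    """One sweep over the training set, one entry at a time, updating
    all weights by error back-propagation (actions (a) and (c))."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in zip(data, labels):
        opt.zero_grad()
        out = net(x.unsqueeze(0))            # input one data entry
        loss = loss_fn(out, y.unsqueeze(0))  # mismatch vs. its label
        loss.backward()                      # error BP
        opt.step()
    return net


def discriminative_pretrain(data, labels, input_dim, num_classes,
                            hidden_units=512, num_hidden_layers=5, lr=0.1):
    # (a) train a single-hidden-layer NN discriminatively with BP
    net = make_net(input_dim, [hidden_units], num_classes)
    train_one_pass(net, data, labels, lr)
    for _ in range(num_hidden_layers - 1):   # (d) repeat (b) and (c)
        # (b) discard the current output layer ...
        trained_stack = list(net.children())[:-1]
        # ... and add a new randomly initialized hidden layer + output layer
        top_in = trained_stack[-2].out_features  # width of last trained layer
        new_top = [nn.Linear(top_in, hidden_units), nn.Sigmoid(),
                   nn.Linear(hidden_units, num_classes)]
        net = nn.Sequential(*(trained_stack + new_top))
        # (c) retrain all layers, new and previously trained, with BP
        train_one_pass(net, data, labels, lr)
    return net  # (e) the pretrained DNN; action (f) fine-tunes it further
```

A hypothetical call would pass data as an iterable of 1-D float tensors and labels as scalar long tensors, e.g. net = discriminative_pretrain(data, labels, input_dim=39, num_classes=10); action (f) then continues training the returned network with further train_one_pass sweeps until convergence.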
Abstract
Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is first trained discriminatively on labeled data with error back-propagation (BP). Then, after the output layer of this one-hidden-layer network is discarded, another randomly initialized hidden layer is added on top of the previously trained hidden layer, along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on, until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum while still leaving them in a range with a high gradient, so that they can be fine-tuned effectively.
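For concreteness, the error back-propagation step invoked throughout adjusts each weight against the gradient of the output error. In its plain gradient-descent form (the text does not commit to a particular variant or learning-rate schedule):

$$ w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}} $$

where $w_{ij}$ is a layer weight, $\eta$ the learning rate, and $E$ the error between the network's output and the training label. The closing remark about a "high gradient" means that pretraining stops each layer while $\partial E / \partial w_{ij}$ is still appreciable, so the subsequent fine-tuning updates remain effective.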
20 Claims
1. (Independent claim; the full text is reproduced above as the First Claim.) - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A computer storage device having computer-executable instructions stored thereon for training a deep neural network (DNN), said computer-executable instructions comprising:
(a) training a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto; and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation procedure to produce an output from the multi-neuron output layer that matches the label associated with the training data entry;
(b) discarding a current multi-neuron output layer and adding a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons;
(c) training a last produced new multiple hidden layer deep neural network, wherein said training comprises inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error back-propagation procedure, which employs a prescribed learning rate, so that the output generated from the multi-neuron output layer matches the label associated with the training data entry;
(d) repeating instructions (b) and (c) until a prescribed number of hidden layers have been added;
(e) designating the last produced revised multiple layer DNN to be a pretrained DNN; and
(f) iteratively training the pretrained DNN to produce a trained DNN. - View Dependent Claims (9, 10, 11, 12, 13)
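Claim 8 differs from claim 1 chiefly in that each sweep presents every data entry exactly once and that retraining employs a prescribed (fixed) learning rate. A minimal sketch of that detail, reusing the hypothetical train_one_pass from the earlier snippet (PRESCRIBED_LR and grow_by_one_hidden_layer are likewise illustrative assumptions, not names from the patent):

```python
PRESCRIBED_LR = 0.1  # the "prescribed learning rate"; the value is illustrative

# Each growth stage retrains the entire stack, old and new layers alike,
# with the same fixed learning rate and a single presentation of each
# training entry, per claim 8.
for _ in range(num_hidden_layers - 1):
    net = grow_by_one_hidden_layer(net)                  # instruction (b), hypothetical helper
    train_one_pass(net, data, labels, lr=PRESCRIBED_LR)  # instruction (c)
```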
14. A system for pretraining a deep neural network (DNN), comprising:
one or more computing devices, said computing devices being in communication with each other whenever there is a plurality of computing devices, and a computer program having a plurality of sub-programs executable by the one or more computing devices, the one or more computing devices being directed by the sub-programs of the computer program to:
(a) train a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto, and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation (BP) procedure so that the output generated from the multi-neuron output layer matches the label associated with the training data entry,
(b) discard a current multi-neuron output layer and add a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons,
(c) input each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error BP procedure to produce an output from the new multi-neuron output layer that matches the label associated with the training data entry,
(d) repeat (b) and (c) until a prescribed number of hidden layers have been added,
(e) designate the last produced revised multiple layer DNN to be the pretrained DNN, and
(f) iteratively train the pretrained DNN to produce a trained DNN. - View Dependent Claims (15, 16, 17, 18, 19, 20)
Specification