Discriminative pretraining of deep neural networks
First Claim
1. A computer-implemented process for pretraining a deep neural network (DNN), comprising:
- using a computer to perform the following process actions:
(a) training a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto; and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation (BP) procedure so that the output generated from the multi-neuron output layer matches the label associated with the training data entry;
(b) discarding a current multi-neuron output layer and adding a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons;
(c) inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error BP procedure to produce an output from the new multi-neuron output layer that matches the label associated with the training data entry;
(d) repeating actions (b) and (c) until a prescribed number of hidden layers have been added;
(e) designating the last produced revised multiple layer DNN to be said pretrained DNN; and
(f) iteratively training the pretrained DNN to produce a trained DNN.
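Read as an algorithm, actions (a) through (f) describe greedy, layer-by-layer discriminative training: each growth step discards the current output layer, stacks one fresh randomly initialized hidden layer on the trained ones, reattaches a new output layer, and retrains the entire stack with back-propagation. The claim is framework-agnostic; the sketch below uses PyTorch, and every name and hyperparameter in it (make_net, train_one_pass, discriminative_pretrain, the sigmoid activations, hidden_units=512, lr=0.1) is an illustrative assumption rather than language from the patent.

```python
# A minimal sketch of the claimed pretraining process; all identifiers
# and hyperparameter values are illustrative assumptions.
import torch
import torch.nn as nn


def make_net(input_dim, hidden_dims, num_classes):
    """Build a feed-forward NN with the given hidden layer sizes.
    New layers receive PyTorch's default random weight initialization."""
    layers, prev = [], input_dim
    for h in hidden_dims:
        layers += [nn.Linear(prev, h), nn.Sigmoid()]
        prev = h
    layers.append(nn.Linear(prev, num_classes))  # multi-neuron output layer
    return nn.Sequential(*layers)


def train_one_pass(net, data, labels, lr):
    """One sweep over the training set, one entry at a time, updating
    all weights by error back-propagation (actions (a) and (c))."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in zip(data, labels):
        opt.zero_grad()
        out = net(x.unsqueeze(0))            # input one data entry
        loss = loss_fn(out, y.unsqueeze(0))  # mismatch vs. its label
        loss.backward()                      # error BP
        opt.step()
    return net


def discriminative_pretrain(data, labels, input_dim, num_classes,
                            hidden_units=512, num_hidden_layers=5, lr=0.1):
    # (a) train a single-hidden-layer NN discriminatively with BP
    net = make_net(input_dim, [hidden_units], num_classes)
    train_one_pass(net, data, labels, lr)
    for _ in range(num_hidden_layers - 1):   # (d) repeat (b) and (c)
        # (b) discard the current output layer ...
        trained_stack = list(net.children())[:-1]
        # ... and add a new randomly initialized hidden layer + output layer
        top_in = trained_stack[-2].out_features  # width of last trained layer
        new_top = [nn.Linear(top_in, hidden_units), nn.Sigmoid(),
                   nn.Linear(hidden_units, num_classes)]
        net = nn.Sequential(*(trained_stack + new_top))
        # (c) retrain all layers, new and previously trained, with BP
        train_one_pass(net, data, labels, lr)
    return net  # (e) the pretrained DNN; action (f) fine-tunes it further
```

A hypothetical call would pass data as an iterable of 1-D float tensors and labels as scalar long tensors, e.g. net = discriminative_pretrain(data, labels, input_dim=39, num_classes=10); action (f) then continues training the returned network with further train_one_pass sweeps until convergence.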
Abstract
Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is first trained discriminatively on labeled data with error back-propagation (BP). Then, after the output layer of this one-hidden-layer network is discarded, another randomly initialized hidden layer is added on top of the previously trained hidden layer, along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on, until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum while still leaving them in a range with a high gradient, so that they can be fine-tuned effectively.
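For concreteness, the error back-propagation step invoked throughout adjusts each weight against the gradient of the output error. In its plain gradient-descent form (the text does not commit to a particular variant or learning-rate schedule):

$$ w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}} $$

where $w_{ij}$ is a layer weight, $\eta$ the learning rate, and $E$ the error between the network's output and the training label. The closing remark about a "high gradient" means that pretraining stops each layer while $\partial E / \partial w_{ij}$ is still appreciable, so the subsequent fine-tuning updates remain effective.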
20 Claims
1. (Independent claim; the full text is reproduced above as the First Claim.) - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A computer storage device having computer-executable instructions stored thereon for training a deep neural network (DNN), said computer-executable instructions comprising:
(a) training a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto; and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation procedure to produce an output from the multi-neuron output layer that matches the label associated with the training data entry;
(b) discarding a current multi-neuron output layer and adding a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons;
(c) training a last produced new multiple hidden layer deep neural network, wherein said training comprises inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error back-propagation procedure, which employs a prescribed learning rate, so that the output generated from the multi-neuron output layer matches the label associated with the training data entry;
(d) repeating instructions (b) and (c) until a prescribed number of hidden layers have been added;
(e) designating the last produced revised multiple layer DNN to be a pretrained DNN; and
(f) iteratively training the pretrained DNN to produce a trained DNN. - View Dependent Claims (9, 10, 11, 12, 13)
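Claim 8 differs from claim 1 chiefly in that each sweep presents every data entry exactly once and that retraining employs a prescribed (fixed) learning rate. A minimal sketch of that detail, reusing the hypothetical train_one_pass from the earlier snippet (PRESCRIBED_LR and grow_by_one_hidden_layer are likewise illustrative assumptions, not names from the patent):

```python
PRESCRIBED_LR = 0.1  # the "prescribed learning rate"; the value is illustrative

# Each growth stage retrains the entire stack, old and new layers alike,
# with the same fixed learning rate and a single presentation of each
# training entry, per claim 8.
for _ in range(num_hidden_layers - 1):
    net = grow_by_one_hidden_layer(net)                  # instruction (b), hypothetical helper
    train_one_pass(net, data, labels, lr=PRESCRIBED_LR)  # instruction (c)
```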
14. A system for pretraining a deep neural network (DNN), comprising:
one or more computing devices, said computing devices being in communication with each other whenever there is a plurality of computing devices, and a computer program having a plurality of sub-programs executable by the one or more computing devices, the one or more computing devices being directed by the sub-programs of the computer program to:
(a) train a single hidden layer neural network (NN) comprising an input layer into which training data is input, a multi-neuron output layer from which an output is generated, and a first multi-neuron hidden layer which is interconnected with the input and output layers with randomly initialized weights, said hidden layer having a fixed number of neurons, wherein said training comprises:
- accessing a set of training data entries, each data entry of which has a corresponding label assigned thereto, and
- inputting each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce an initial NN, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation (BP) procedure so that the output generated from the multi-neuron output layer matches the label associated with the training data entry,
(b) discard a current multi-neuron output layer and add a new multi-neuron hidden layer which is interconnected with a last previously trained hidden layer and a new multi-neuron output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network, said new hidden layer having a fixed number of neurons,
(c) input each data entry of said set one by one into the input layer until all the data entries have been input at least once to produce a revised multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the new hidden layer and each previously trained hidden layer are set via the error BP procedure to produce an output from the new multi-neuron output layer that matches the label associated with the training data entry,
(d) repeat (b) and (c) until a prescribed number of hidden layers have been added,
(e) designate the last produced revised multiple layer DNN to be the pretrained DNN, and
(f) iteratively train the pretrained DNN to produce a trained DNN. - View Dependent Claims (15, 16, 17, 18, 19, 20)
Specification