Discriminative pretraining of deep neural networks
First Claim
1. A system for training a context-dependent deep neural network (CD-DNN), comprising:
- a computing device;
- a computer program comprising program modules executable by the computing device, comprising:
- a hidden layer generator program module wherein the computing device is directed by the hidden layer generator program module to:
  - initially generate a single hidden layer neural network comprising an input layer into which training data is input, an output layer from which an output is generated, and a first hidden layer which is interconnected with the input and output layers with randomly initialized weights;
  - whenever a pretrained version of the single hidden layer neural network is produced, discard the current output layer and add a new hidden layer which is interconnected with the first hidden layer and a new output layer with randomly initialized weights to produce a multiple hidden layer deep neural network; and
  - whenever a pretrained version of a last produced multiple hidden layer deep neural network is produced and is designated as lacking a prescribed number of hidden layers, discard the current output layer and add a new hidden layer which is interconnected with the last previously added hidden layer and a new output layer with randomly initialized weights to produce a new multiple hidden layer deep neural network;
- a pretraining program module wherein the computing device is directed by the pretraining program module to:
  - access a set of training data entries, each data entry of which has a corresponding label assigned thereto;
  - access the single hidden layer neural network once it is generated;
  - input each data entry of said set one by one into the input layer of the single hidden layer neural network until all the data entries have been input at least once to produce the pretrained version of the single hidden layer neural network, such that after the inputting of each data entry, said weights associated with the first hidden layer are set via an error back-propagation procedure to produce an output from the output layer that matches the label associated with the training data entry;
  - access each multiple hidden layer deep neural network at the time it is produced; and
  - for each multiple hidden layer deep neural network accessed, input each data entry of said set of training data entries one by one into the input layer until all the data entries have been input at least once to produce a pretrained version of the accessed multiple hidden layer deep neural network, such that after the inputting of each data entry, said weights associated with the last added hidden layer and each previously trained hidden layer are set via the error back-propagation (BP) procedure to produce an output from the output layer that matches the label associated with the training data entry; and
- a DNN module wherein the computing device is directed by the DNN module to:
  - each time a pretrained version of a multiple hidden layer DNN is produced, determine whether it includes said prescribed number of hidden layers;
  - whenever it is determined the last produced pretrained multiple hidden layer deep neural network does not include the prescribed number of hidden layers, designate it as lacking the prescribed number of hidden layers; and
  - whenever it is determined the last produced pretrained multiple hidden layer deep neural network does include the prescribed number of hidden layers, designate it to be a pretrained DNN.
Abstract
Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is first trained discriminatively, using the labels, with error back-propagation (BP). Then, after discarding the output layer of this one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer, along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on, until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.
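The grow-then-retrain loop described in the abstract can be sketched in NumPy. This is an illustrative reading of the procedure, not code from the patent: the layer sizes, learning rate, sigmoid/softmax choices, single-sweep schedule, and the `GrowingDNN` / `discriminative_pretrain` names are all assumptions made for the example.

```python
# Minimal NumPy sketch of discriminative pretraining: train a one-hidden-layer
# network with BP, discard its output layer, stack a new random hidden layer
# plus a new output layer, retrain, and repeat until the prescribed depth.
# All hyperparameters here are illustrative assumptions, not from the patent.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class GrowingDNN:
    """A network that grows one hidden layer at a time."""

    def __init__(self, n_in, n_hidden, n_out):
        self.n_hidden, self.n_out = n_hidden, n_out
        # single-hidden-layer network with randomly initialized weights
        self.hidden = [rng.normal(0.0, 0.1, (n_hidden, n_in))]
        self.out = rng.normal(0.0, 0.1, (n_out, n_hidden))

    def add_hidden_layer(self):
        # discard the old output layer; add a new random hidden layer
        # and a new random output layer on top of the trained stack
        self.hidden.append(rng.normal(0.0, 0.1, (self.n_hidden, self.n_hidden)))
        self.out = rng.normal(0.0, 0.1, (self.n_out, self.n_hidden))

    def forward(self, x):
        acts = [x]
        for W in self.hidden:
            acts.append(sigmoid(W @ acts[-1]))
        return acts, softmax(self.out @ acts[-1])

    def bp_step(self, x, label, lr=0.5):
        # one BP update toward the target label; the newly added layer and
        # every previously trained hidden layer are all updated
        acts, p = self.forward(x)
        target = np.zeros(self.n_out)
        target[label] = 1.0
        delta = p - target                      # softmax + cross-entropy gradient
        grad_out = np.outer(delta, acts[-1])
        delta = (self.out.T @ delta) * acts[-1] * (1.0 - acts[-1])
        grads = []
        for i in range(len(self.hidden) - 1, -1, -1):
            grads.append(np.outer(delta, acts[i]))
            if i > 0:
                delta = (self.hidden[i].T @ delta) * acts[i] * (1.0 - acts[i])
        self.out -= lr * grad_out
        for W, g in zip(self.hidden, reversed(grads)):
            W -= lr * g

def discriminative_pretrain(data, labels, n_layers, n_hidden, n_out):
    net = GrowingDNN(data.shape[1], n_hidden, n_out)
    while True:
        for x, y in zip(data, labels):          # one sweep over every labeled entry
            net.bp_step(x, y)
        if len(net.hidden) == n_layers:         # prescribed number of layers reached
            return net
        net.add_hidden_layer()

# toy two-class problem: clusters around (+2, +2) and (-2, -2)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
net = discriminative_pretrain(X, y, n_layers=3, n_hidden=8, n_out=2)
print(len(net.hidden))  # -> 3
```

The returned weights would then serve as the initialization for a subsequent fine-tuning pass, which the abstract notes is effective because the layers remain in a high-gradient range.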
7 Claims (claim 1 independent; claims 2-7 dependent on claim 1)