Exploiting sparseness in training deep neural networks
Abstract
Deep Neural Network (DNN) training technique embodiments are presented that train a DNN while exploiting the sparseness of non-zero hidden layer interconnection weight values. Generally, a fully connected DNN is initially trained by sweeping through a full training set a number of times. Then, for the most part, only the interconnections whose weight magnitudes exceed a minimum weight threshold are considered in further training. This minimum weight threshold can be established as a value that results in only a prescribed maximum number of interconnections being considered when setting interconnection weight values via an error back-propagation procedure during the training. It is noted that the continued DNN training tends to converge much faster than the initial training.
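The thresholding rule in the abstract can be made concrete with a small sketch. The snippet below is a minimal NumPy illustration, not the patented implementation: it picks the smallest threshold that leaves at most a prescribed maximum number of interconnections non-zero, then zeroes the rest. The helper names and the NumPy setting are assumptions made for this example.

```python
# Minimal sketch (not the patented implementation) of the pruning rule in
# the abstract: keep only interconnections whose weight magnitude exceeds
# a minimum threshold, chosen so that at most a prescribed maximum number
# of interconnections survive. Helper names are assumptions.
import numpy as np

def threshold_for_max_connections(W, max_connections):
    """Smallest threshold leaving at most `max_connections` non-zeros."""
    mags = np.sort(np.abs(W).ravel())[::-1]   # descending magnitudes
    if max_connections >= mags.size:
        return 0.0                            # nothing needs pruning
    return mags[max_connections]              # survivors must exceed this

def prune(W, min_weight):
    """Zero every weight whose magnitude does not exceed the threshold."""
    mask = np.abs(W) > min_weight
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))                   # a toy weight matrix
t = threshold_for_max_connections(W, max_connections=6)
W_sparse, mask = prune(W, t)
print(int(mask.sum()), "surviving interconnections")  # at most 6
```

During the continued training the abstract describes, weight updates would then be confined to the interconnections where `mask` is True, so the later sweeps adjust a much smaller set of parameters.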
20 Claims
1. A computer-implemented process for training a deep neural network (DNN), comprising:

using a computer to perform the following process actions:

(a) initially training a fully interconnected DNN comprising an input layer into which training data is input, an output layer from which an output is generated, and a plurality of hidden layers, wherein said training comprises:
(i) accessing a set of training data entries,
(ii) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an interim trained DNN, such that after the inputting of each data entry, the value of each weight associated with each interconnection of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches a label assigned to the training data entry,
(iii) repeating actions (i) and (ii) a number of times to establish an initially trained DNN;

(b) identifying each interconnection associated with each layer of the initially trained DNN whose interconnection weight value does not exceed a first weight threshold;

(c) setting the value of each identified interconnection to zero;

(d) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a current refined DNN, such that after the inputting of each data entry, the values of the weights associated with the interconnections of each hidden layer are set via an error back-propagation procedure so that the output from the output layer matches the label assigned to the training data entry;

(e) identifying those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold;

(f) setting the value of each of the identified interconnections whose interconnection weight value does not exceed the second weight threshold to zero; and

(g) repeating actions (d) through (f) a number of times to produce said trained DNN. (Dependent claims 2 through 11 not shown.)
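Read as an algorithm, claim 1 is an initial dense training phase followed by a prune-and-refine loop. The sketch below walks through actions (a) through (g) on a toy NumPy multilayer perceptron; the layer sizes, tanh activation, squared-error objective, learning rate, and threshold values are all illustrative assumptions, not details taken from the claim.

```python
# Minimal sketch of the claim-1 procedure on a toy NumPy MLP. All
# hyperparameters and the synthetic data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]                       # input, two hidden layers, output
Ws = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes, sizes[1:])]
masks = [np.ones_like(W, dtype=bool) for W in Ws]

X = rng.normal(size=(64, 8))                 # toy training entries
labels = rng.integers(0, 3, size=64)         # toy label per entry
Y = np.eye(3)[labels]                        # one-hot targets

def sweep(lr=0.05):
    """Actions (ii)/(d): one pass over the set, entry by entry, adjusting
    the unmasked weights by error back-propagation after each entry."""
    for x, y in zip(X, Y):
        acts = [x]
        for W in Ws:                          # forward propagation
            acts.append(np.tanh(acts[-1] @ W))
        delta = (acts[-1] - y) * (1 - acts[-1] ** 2)   # output-layer error
        for i in range(len(Ws) - 1, -1, -1):  # back-propagation
            grad = np.outer(acts[i], delta)
            if i > 0:
                delta = (delta @ Ws[i].T) * (1 - acts[i] ** 2)
            Ws[i] -= lr * grad * masks[i]     # pruned interconnections stay 0

def prune(threshold):
    """Actions (b)/(c) and (e)/(f): zero weights not exceeding threshold."""
    for i, W in enumerate(Ws):
        masks[i] &= np.abs(W) > threshold
        Ws[i] = W * masks[i]

for _ in range(10):      # (a): initial dense training, repeated sweeps
    sweep()
prune(0.05)              # (b)-(c): first weight threshold
for _ in range(5):       # (g): repeat (d) through (f) a number of times
    sweep()              # (d): refine the remaining interconnections
    prune(0.02)          # (e)-(f): second weight threshold
print(sum(int(m.sum()) for m in masks), "non-zero interconnections remain")
```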
12. A computer-implemented process for training a deep neural network (DNN), comprising:

using a computer to perform the following process actions:

(a) initially training a fully interconnected DNN comprising an input layer into which training data is input, an output layer from which an output is generated, and a plurality of hidden layers, wherein said training comprises:
(i) accessing a set of training data entries,
(ii) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce an interim trained DNN, such that after the inputting of each data entry, the value of each weight associated with each interconnection of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches a label assigned to the training data entry,
(iii) repeating actions (i) and (ii) a number of times to establish an initially trained DNN;

(b) identifying those interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold;

(c) inputting each data entry of said set one by one into the input layer until all the data entries have been input once to produce a refined DNN, such that after the inputting of each data entry, the value of each weight associated with each of the identified interconnections of each hidden layer is set via an error back-propagation procedure so that the output from the output layer matches the label assigned to the training data entry; and

(d) repeating action (c) a number of times to produce said trained DNN. (Dependent claims 13 through 19 not shown.)
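Claim 12 differs from claim 1 in that the surviving interconnections are identified once, after the initial training, and every subsequent back-propagation update touches only that fixed set; the other weights keep their current values. Below is a minimal sketch of that refinement update under those assumptions, with a stand-in gradient and illustrative names.

```python
# Minimal sketch of the claim-12 refinement step: the trainable set is
# fixed once (action (b)), and each update (action (c)) adjusts only the
# identified interconnections. The gradient is a stand-in; the threshold
# and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))            # weights after initial training
trainable = np.abs(W) > 0.5            # (b): identify once, then keep fixed

def refine_step(grad, lr=0.1):
    """(c): back-propagation update applied only to identified weights."""
    W[trainable] -= lr * grad[trainable]

grad = rng.normal(size=W.shape)        # stand-in for a real BP gradient
refine_step(grad)
print(int(trainable.sum()), "of", W.size, "weights were updated")
```

Because the identified set never changes, the refinement sweeps of action (d) can store and process only the surviving weights, which is where the data structure of claim 20 comes in.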
20. A computer storage medium for storing data for access by a deep neural network (DNN) training application program being executed on a computer, comprising:

a data structure stored in said storage medium, said data structure comprising information used by said DNN training application program, said information representing a weight matrix having a plurality of columns and rows of weight values associated with interconnections between a pair of layers of the DNN, said data structure comprising:

a header data structure element comprising, a total columns number representing the number of columns of said weight matrix, followed by, a series of column index numbers each of which identifies a location in the data structure where information corresponding to a different one of the plurality of weight matrix columns begins; and

a plurality of column data structure elements each of which comprises information corresponding to a different one of the plurality of weight matrix columns, each of said column data structure elements comprising, a total non-zero weight value number representing the number of non-zero weight values in the column data structure element, followed by, a series of row identification numbers each of which identifies a row of the column of the weight matrix corresponding to the column data structure element that is associated with a non-zero weight value, followed by, a series of non-zero weight values each of which is assigned to a different one of the rows of the column of the weight matrix corresponding to the column data structure element that is associated with a non-zero weight value; and

wherein said computer storage medium consists of at least one of DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices.
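The claim-20 layout is essentially a column-compressed sparse matrix: a header giving the column count and the start location of each column's data, then, per column, the non-zero count, the row identifiers, and the non-zero weights. The round-trip below is a minimal sketch under the assumption that every field is stored in one flat array; the patent's actual field widths and encoding are not specified here.

```python
# Minimal sketch (an illustrative assumption, not the patent's exact byte
# layout) of the claim-20 structure: header = [total columns, per-column
# start indices]; each column element = [non-zero count, row ids, values].
import numpy as np

def pack(W):
    """Serialize weight matrix W column by column into one flat list."""
    n_rows, n_cols = W.shape
    columns = []
    for j in range(n_cols):
        rows = np.flatnonzero(W[:, j])        # rows with non-zero weights
        col = [float(len(rows))] + [float(r) for r in rows] \
              + [float(W[r, j]) for r in rows]
        columns.append(col)
    header = [float(n_cols)]
    offset = 1 + n_cols                       # data begins after the header
    for col in columns:
        header.append(float(offset))          # where this column starts
        offset += len(col)
    return header + [v for col in columns for v in col]

def unpack(data, n_rows):
    """Rebuild the dense matrix from the flat representation."""
    n_cols = int(data[0])
    W = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        start = int(data[1 + j])              # column's start index
        nnz = int(data[start])                # its non-zero count
        rows = [int(r) for r in data[start + 1 : start + 1 + nnz]]
        vals = data[start + 1 + nnz : start + 1 + 2 * nnz]
        W[rows, j] = vals
    return W

W = np.array([[0.0, 1.5], [2.0, 0.0], [0.0, -0.5]])
flat = pack(W)
assert np.allclose(unpack(flat, n_rows=3), W)  # round-trips losslessly
```

Storing only the non-zero weights this way is what makes the sparse refinement sweeps cheap: a column's contribution to forward and backward propagation can be computed by touching just its listed rows.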