Restructuring deep neural network acoustic models

US 9,728,184 B2
Filed: 06/18/2013
Issued: 08/08/2017
Est. Priority Date: 06/18/2013
Status: Active Grant

First Claim

1. A method comprising:

accessing a Deep Neural Network (DNN) model that includes a weight matrix and layers comprising:

an input layer;

a first hidden layer;

a second hidden layer, wherein the first and second hidden layers are coupled by the weight matrix comprising a plurality of values; and

an output layer;

determining whether the weight matrix is a weight matrix having at least as many parameters as a weight matrix immediately preceding the output layer;

upon determining that the weight matrix has at least as many parameters as a weight matrix immediately preceding the output layer, reducing a sparseness of the weight matrix in the DNN model, wherein reducing the sparseness comprises executing decomposition processing of the weight matrix to generate two smaller matrices from the weight matrix, wherein the decomposition processing comprises applying Singular Value Decomposition (SVD) to the weight matrix;

restructuring the DNN model based on the executed decomposition processing, wherein the restructuring further comprises modifying the plurality of values coupling the first and second hidden layers of the DNN model by replacing the weight matrix with the two smaller matrices;

providing the restructured DNN model;

receiving an utterance; and

processing the received utterance using the restructured DNN model.
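
The decomposition and replacement steps recited above can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the function names `svd_factorize` and `maybe_restructure`, the matrix sizes, the retained rank `k`, and the choice to fold the singular values into the second factor. The extra check that the two factors really hold fewer parameters than the original matrix is also an added guard, not claim language.

```python
import numpy as np

def svd_factorize(W, k):
    """Return (A, B) such that A @ B is the rank-k truncated-SVD approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k]                     # shape (m, k)
    B = s[:k, None] * Vt[:k, :]      # shape (k, n), singular values folded in (assumed split)
    return A, B

def maybe_restructure(W, W_last, k):
    """Factorize W only if it has at least as many parameters as W_last,
    the matrix immediately preceding the output layer."""
    m, n = W.shape
    # Second condition is an added guard: factorizing must actually save parameters.
    if W.size >= W_last.size and k * (m + n) < W.size:
        return svd_factorize(W, k)
    return None

rng = np.random.default_rng(0)

# Hypothetical sizes; W is built to be nearly low rank so truncation loses little.
m, n, k = 64, 64, 8
W = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
W += 0.01 * rng.standard_normal((m, n))
W_last = rng.standard_normal((32, 64))   # stand-in for the matrix before the output layer

A, B = maybe_restructure(W, W_last, k)

# The single layer y = W x becomes two smaller layers: y = A (B x).
x = rng.standard_normal(n)
y_original = W @ x
y_restructured = A @ (B @ x)

rel_err = np.linalg.norm(y_original - y_restructured) / np.linalg.norm(y_original)
params_before = W.size
params_after = A.size + B.size
print(params_before, params_after, rel_err)
```

In this sketch the replaced layer computes `A @ (B @ x)` rather than `(A @ B) @ x`, since forming the product `A @ B` first would rebuild a full-size matrix and lose the savings.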

Abstract

A Deep Neural Network (DNN) model used in an Automatic Speech Recognition (ASR) system is restructured. A restructured DNN model may include fewer parameters compared to the original DNN model. The restructured DNN model may include a monophone state output layer in addition to the senone output layer of the original DNN model. Singular value decomposition (SVD) can be applied to one or more weight matrices of the DNN model to reduce the size of the DNN model. The output layer of the DNN model may be restructured to include monophone states in addition to the senones (tied triphone states) which are included in the original DNN model. When the monophone states are included in the restructured DNN model, the posteriors of monophone states are used to select a small part of senones to be evaluated.
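
The last sentence of the abstract, where monophone state posteriors select a small subset of senones to evaluate, can be sketched as follows. This is an illustrative reading only. The top-N selection rule, the `senone_to_mono` mapping from each senone to a monophone state, the layer sizes, and all function names are assumptions; the abstract does not specify how the selection is made.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def select_senones(mono_logits, senone_to_mono, top_n):
    """Return indices of senones tied to the top_n most probable monophone states."""
    mono_post = softmax(mono_logits)
    top_mono = np.argsort(mono_post)[::-1][:top_n]
    return np.flatnonzero(np.isin(senone_to_mono, top_mono))

def selected_senone_scores(h, W_senone, b_senone, selected):
    """Evaluate only the selected rows of the senone output layer."""
    return W_senone[selected] @ h + b_senone[selected]

rng = np.random.default_rng(1)

# Hypothetical sizes: 5 monophone states, 40 senones, 16 hidden units.
hidden_dim, n_mono, n_senone = 16, 5, 40
h = rng.standard_normal(hidden_dim)                 # last hidden layer activation
W_mono = rng.standard_normal((n_mono, hidden_dim))
W_senone = rng.standard_normal((n_senone, hidden_dim))
b_senone = np.zeros(n_senone)

# Assumed tying: senone i belongs to monophone state i % n_mono (8 senones each).
senone_to_mono = np.arange(n_senone) % n_mono

selected = select_senones(W_mono @ h, senone_to_mono, top_n=2)
scores = selected_senone_scores(h, W_senone, b_senone, selected)
print(len(selected), "of", n_senone, "senones evaluated")
```

The saving comes from multiplying only the selected rows of the senone weight matrix; the monophone layer is small, so evaluating it in full is cheap by comparison.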

20 Claims

1. A method comprising:
- accessing a Deep Neural Network (DNN) model that includes a weight matrix and layers comprising:
  
  an input layer;
  
  a first hidden layer;
  
  a second hidden layer, wherein the first and second hidden layers are coupled by the weight matrix comprising a plurality of values; and
  
  an output layer;
  
  determining whether the weight matrix is a weight matrix having at least as many parameters as a weight matrix immediately preceding the output layer;
  
  upon determining that the weight matrix has at least as many parameters as a weight matrix immediately preceding the output layer, reducing a sparseness of the weight matrix in the DNN model, wherein reducing the sparseness comprises executing decomposition processing of the weight matrix to generate two smaller matrices from the weight matrix, wherein the decomposition processing comprises applying Singular Value Decomposition (SVD) to the weight matrix;
  
  restructuring the DNN model based on the executed decomposition processing, wherein the restructuring further comprises modifying the plurality of values coupling the first and second hidden layers of the DNN model by replacing the weight matrix with the two smaller matrices;
  
  providing the restructured DNN model;
  
  receiving an utterance; and
  
  processing the received utterance using the restructured DNN model.
- Dependent claims (2, 3, 4, 5, 6, 17)
  - 2. The method of claim 1, wherein restructuring the DNN model with the weight matrix reduced in sparseness comprises splitting a layer in the DNN model into at least two smaller layers.
  - 3. The method of claim 1, wherein the restructuring further comprises replacing, in at least one layer of the DNN model, the weight matrix with the at least two smaller matrices.
  - 4. The method of claim 1, wherein the output layer comprises a senone output layer and a monophone state output layer.
  - 5. The method of claim 1, further comprising training the output layer of the DNN to use a monophone state.
  - 6. The method of claim 1, further comprising tuning the restructured model using a back-propagation method.
  - 17. The method of claim 1, wherein the weight matrix is automatically reduced if it is a weight matrix immediately preceding the output layer.
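
As a worked illustration of why replacing one weight matrix with two smaller matrices can shrink the model, the parameter counts can be compared directly. The symbols $m$, $n$, $k$ and the example sizes are assumptions introduced here; the claims do not recite specific dimensions or a retained rank.

$$
W \in \mathbb{R}^{m \times n} \;\approx\; A\,B, \qquad A \in \mathbb{R}^{m \times k},\quad B \in \mathbb{R}^{k \times n}
$$

$$
\underbrace{mn}_{\text{parameters in } W} \;>\; \underbrace{k(m+n)}_{\text{parameters in } A \text{ and } B}
\quad\Longleftrightarrow\quad k < \frac{mn}{m+n}
$$

For example, with a hypothetical $m = n = 2048$ and $k = 256$, the original matrix holds $2048 \times 2048 = 4{,}194{,}304$ parameters, while the two factors hold $256 \times (2048 + 2048) = 1{,}048{,}576$, a fourfold reduction.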

7. A computer storage device storing computer-executable instructions that, when executed by at least one processor, perform a method comprising:
- creating a restructured Deep Neural Network (DNN) model from an original DNN model, wherein the creating further comprises:
  
  accessing the original DNN model, the original DNN model including a weight matrix and layers comprising:
  
  an input layer;
  
  a first hidden layer;
  
  a second hidden layer, wherein the first and second hidden layers are coupled by the weight matrix comprising a plurality of values; and
  
  an output layer;
  
  determining whether the weight matrix is a weight matrix having at least as many parameters as a weight matrix immediately preceding the output layer;
  
  upon determining that the weight matrix has at least as many parameters as a weight matrix immediately preceding the output layer, executing decomposition processing of the weight matrix of the original DNN model to generate two smaller matrices from the weight matrix, wherein the decomposition processing comprises applying Singular Value Decomposition (SVD) to the weight matrix; and
  
  restructuring the original DNN model based on the executed decomposition processing, wherein the restructuring further comprises modifying the plurality of values coupling the first and second hidden layers of the DNN model by replacing the weight matrix with the two smaller matrices;
  
  receiving an utterance; and
  
  using the restructured DNN model to recognize the received utterance.
- Dependent claims (8, 9, 10, 11, 12, 19)
  - 8. The computer storage device of claim 7, wherein a sparseness of the weight matrix in the original DNN is reduced in the restructured DNN model.
  - 9. The computer storage device of claim 7, wherein the output layer of the restructured DNN comprises a monophone state output layer and a senone output layer.
  - 10. The computer storage device of claim 9, further comprising using posteriors of monophone states to select senones to be evaluated to reduce the number of calculations in the senone output layer.
  - 11. The computer storage device of claim 7, wherein the restructured DNN model comprises at least one additional layer as compared with the original DNN model.
  - 12. The computer storage device of claim 7, further comprising tuning the restructured DNN model by executing a back-propagation method before using the restructured DNN model.
  - 19. The computer storage device of claim 7, wherein the instructions are further executable by the at least one processor for automatically reducing the weight matrix if it is a weight matrix immediately preceding the output layer.

13. A system comprising:
- a processor and memory;
  
  an operating environment executing using the processor; and
  
  a model manager that is configured to perform actions comprising:
  
  accessing a Deep Neural Network (DNN) model that includes a weight matrix and layers comprising:
  
  an input layer;
  
  a first hidden layer;
  
  a second hidden layer, wherein the first and second hidden layers are coupled by the weight matrix comprising a plurality of values; and
  
  an output layer;
  
  determining whether the weight matrix is a weight matrix having at least as many parameters as a weight matrix immediately preceding the output layer;
  
  upon determining that the weight matrix has at least as many parameters as a weight matrix immediately preceding the output layer, reducing a sparseness of the weight matrix in the DNN model, wherein reducing the sparseness comprises executing decomposition processing of the weight matrix to generate two smaller matrices from the weight matrix, wherein the decomposition processing comprises applying Singular Value Decomposition (SVD) to the weight matrix;
  
  restructuring the DNN model based on the executed decomposition processing, wherein the restructuring further comprises modifying the plurality of values coupling the first and second hidden layers of the DNN model by replacing the weight matrix with the two smaller matrices;
  
  providing the restructured DNN model;
  
  receiving an utterance; and
  
  processing the received utterance using the restructured DNN model.
- Dependent claims (14, 15, 16, 18, 20)
  - 14. The system of claim 13, wherein restructuring the DNN model with the weight matrix reduced in sparseness comprises splitting one of the layers in the DNN model into at least two smaller layers.
  - 15. The system of claim 13, wherein the output layer comprises a senone output layer and a monophone state output layer.
  - 16. The system of claim 13, further comprising training the output layer of the DNN to use a monophone state.
  - 18. The system of claim 13, wherein the weight matrix is automatically reduced if it is a weight matrix immediately preceding the output layer.
  - 20. The system of claim 13, wherein the model manager is further configured to tune the restructured DNN model by executing a back-propagation method before using the restructured DNN model.
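
Claims 6, 12, and 20 recite tuning the restructured model with a back-propagation method. A toy sketch of that idea, on a single factorized layer, is given below. It is heavily simplified and rests on assumptions not found in the claims: a squared-error objective that matches the original layer's outputs (chosen for brevity, not the training criterion of an actual acoustic model), plain gradient descent, and hypothetical sizes, learning rate, and step count.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: an 8x6 layer truncated to rank 3, tuned on 32 input vectors.
m, n, k, batch = 8, 6, 3, 32
W = rng.standard_normal((m, n))
X = rng.standard_normal((n, batch))
T = W @ X                              # targets: outputs of the original layer (assumed objective)

# Start from the truncated-SVD factors of W.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k].copy()                    # shape (m, k)
B = s[:k, None] * Vt[:k, :]            # shape (k, n)

def loss(A, B):
    """Mean squared-error loss of the factorized layer against the targets."""
    E = A @ (B @ X) - T
    return 0.5 * np.sum(E ** 2) / batch

initial_loss = loss(A, B)

lr = 1e-3                              # assumed small step size
for _ in range(200):
    H = B @ X                          # forward pass through the first small layer
    E = A @ H - T                      # output error
    grad_A = E @ H.T / batch           # gradient with respect to A
    grad_B = A.T @ E @ X.T / batch     # error propagated back through A to B
    A -= lr * grad_A
    B -= lr * grad_B

final_loss = loss(A, B)
print(initial_loss, final_loss)
```

The point of the sketch is only that both factor matrices receive gradients and can recover some of the accuracy lost to truncation; a real system would tune the whole restructured network on labeled speech data.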

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Gong, Yifan; Li, Jinyu; Xue, Jian; Stoimenov, Emilian
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Tzeng, Forrest F

Application Number: US13/920,323
Publication Number: US 20140372112A1
Time in Patent Office: 1,512 Days
Field of Search: 704202, 704232, 704246, 704251, 704256, 704E15036
CPC Class Codes:

G06N 3/045   Combinations of networks

G06N 3/084   Backpropagation, e.g. using...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/16   using artificial neural net...
