LOW-FOOTPRINT ADAPTATION AND PERSONALIZATION FOR A DEEP NEURAL NETWORK

US 20150255061A1
Filed: 03/07/2014
Published: 09/10/2015
Est. Priority Date: 03/07/2014
Status: Active Grant

Abstract

The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition (ASR) is provided. An utterance that includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices, each smaller than the original matrix. A square matrix may then be added to the new matrices, and speaker-specific parameters may be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to each of the original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
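As a rough illustration of why the footprint shrinks, the parameter counts can be written out. The notation here is an assumption, since the abstract fixes none: the original matrix W is m × n, the decomposition keeps rank k, U and N are the two smaller matrices, and S is the added square matrix.

```latex
W_{m \times n} \;\approx\; U_{m \times k}\, N_{k \times n}
\quad\longrightarrow\quad
W_{m \times n} \;\approx\; U_{m \times k}\, S_{k \times k}\, N_{k \times n},
\qquad S_{k \times k} = I \ \text{before adaptation}

\text{parameters:}\quad
\underbrace{mn}_{\text{original}}
\;>\;
\underbrace{k(m+n)}_{\text{decomposed, when } k < \frac{mn}{m+n}}
\;>\;
\underbrace{k^{2}}_{\text{per speaker}}
```

With illustrative sizes that are not taken from the patent, m = n = 2048 and k = 256, the original matrix holds 4,194,304 parameters, the two smaller matrices together hold 1,048,576, and the per-speaker square matrix holds 65,536, about 1.6% of the original.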

20 Claims

1. A method of adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), comprising:
- receiving, by a computing device, at least one utterance comprising a plurality of speech features for one or more speakers from one or more ASR tasks;
  
  applying, by the computing device, a decomposition approach to an original matrix in the DNN model;
  
  in response to applying the decomposition approach, converting the original matrix into a plurality of new matrices, each of the plurality of new matrices being smaller than the original matrix;
  
  adding, by the computing device, another matrix to the plurality of new matrices; and
  
  adapting, by the computing device, the DNN model by updating the added matrix, the adapted DNN model comprising a reduction in a number of parameters in the DNN model.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, further comprising replacing an original layer in the DNN model with a plurality of new layers.
  - 3. The method of claim 2, wherein at least one of the plurality of new layers comprises a non-linear layer.
  - 4. The method of claim 1, wherein applying, by the computing device, a decomposition approach to an original matrix in the DNN model comprises applying singular value decomposition (SVD) to the original matrix in the DNN model.
  - 5. The method of claim 1, wherein adding, by the computing device, another matrix to the plurality of new matrices comprises adding a small square matrix.
  - 6. The method of claim 5, wherein adapting, by the computing device, the DNN model by updating the added matrix, the adapted DNN model comprising a reduction in a number of parameters in the DNN model, comprises only updating the small square matrix for each of the one or more speakers.
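The steps of claims 1 and 4 to 6 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the sizes `m`, `n`, `k`, the random stand-in data, the squared-error objective, and the step-size rule are all illustrative choices that the claims do not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an m x n weight matrix reduced to rank k.
m, n, k, batch = 64, 48, 8, 32
W = rng.standard_normal((m, n))   # stand-in for an original matrix in the DNN

# Step 1: decompose the original matrix and keep the k largest singular values.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :k]                 # m x k
N = s[:k, None] * Vt[:k, :]       # k x n (singular values folded into N)

# Step 2: add a small square matrix between the factors. Starting from the
# identity leaves the decomposed model's output unchanged.
S = np.eye(k)


def layer(x, S):
    """Linear part of the decomposed layer, U @ S @ N @ x."""
    return U @ (S @ (N @ x))


# Toy adaptation data: random stand-ins for speech features and targets.
X = rng.standard_normal((n, batch))
Y = layer(X, np.eye(k)) + 0.1 * rng.standard_normal((m, batch))


def loss(S):
    """Mean squared error over the batch (an illustrative objective only)."""
    return 0.5 * np.mean(np.sum((layer(X, S) - Y) ** 2, axis=0))


# Step 3: adapt by gradient descent on S only; U and N stay frozen.
H = N @ X                                         # k x batch bottleneck activations
step = 1.0 / np.linalg.norm(H @ H.T / batch, 2)   # safe step for this quadratic loss
loss_before = loss(S)
for _ in range(100):
    err = layer(X, S) - Y
    S -= step * (U.T @ err @ H.T) / batch
loss_after = loss(S)

print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
print(f"per-speaker parameters: {S.size} of {W.size} in the original matrix")
```

Only `S` changes during adaptation, so a deployment along these lines would need to keep just the k × k matrix for each speaker while `U` and `N` are shared.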

7. A system for adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), comprising:
- a memory for storing executable program code; and
  
  a processor, functionally coupled to the memory, the processor being responsive to computer-executable instructions contained in the program code and operative to:
  
  receive at least one utterance comprising a plurality of speech features for one or more speakers from one or more ASR tasks;
  
  determine an adapted DNN model from the DNN model, the DNN model comprising a plurality of unadapted matrices and the adapted DNN model comprising a plurality of adapted matrices;
  
  calculate a difference between the plurality of adapted matrices and the plurality of unadapted matrices to determine a plurality of delta matrices;
  
  apply a decomposition approach to each of the plurality of delta matrices;
  
  convert each of the plurality of delta matrices into a subset of small matrices; and
  
  store the subset of small matrices, the subset of small matrices comprising a small percentage of a plurality of parameters in the DNN model.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the processor, in applying a decomposition approach to each of the plurality of delta matrices, is operative to apply singular value decomposition (SVD) to each of the plurality of delta matrices.
  - 9. The system of claim 7, wherein the processor, in converting each of the plurality of delta matrices into a subset of small matrices, is operative to convert the product of two low-rank matrices.
  - 10. The system of claim 7, wherein the processor, in storing the subset of small matrices, the subset of small matrices comprising a small percentage of a plurality of parameters in the DNN model, is operative to only store the subset of small matrices for each of the one or more speakers.
  - 11. The system of claim 7, wherein the at least one utterance comprises a short message dictation.
  - 12. The system of claim 7, wherein the at least one utterance comprises a voice search query.
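The delta-matrix approach of claims 7 to 10 can be sketched as below. The helper names `compress_delta` and `restore` are hypothetical, and the toy adapted weights are constructed with an exactly low-rank change so that the round trip is lossless; for real adapted models the difference would typically be only approximately low rank, and truncation would then be lossy.

```python
import numpy as np


def compress_delta(W_unadapted, W_adapted, rank):
    """Return small factors (A, B) with A @ B approximating the delta matrix."""
    delta = W_adapted - W_unadapted
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * s[:rank]    # m x rank, singular values folded in
    B = Vt[:rank, :]              # rank x n
    return A, B


def restore(W_unadapted, A, B):
    """Rebuild the adapted matrix from the shared weights and the stored factors."""
    return W_unadapted + A @ B


rng = np.random.default_rng(0)
m, n, rank = 64, 48, 4            # illustrative sizes, not from the patent

W0 = rng.standard_normal((m, n))  # unadapted (speaker-independent) matrix
# Assumption for the demo: adaptation changed the weights by a rank-4 update.
true_delta = 0.01 * rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
W1 = W0 + true_delta              # adapted matrix

A, B = compress_delta(W0, W1, rank)
W1_restored = restore(W0, A, B)

stored = A.size + B.size          # what would be kept per speaker
fraction = stored / W0.size
print(f"stored {stored} of {W0.size} parameters ({fraction:.1%})")
print("max reconstruction error:", np.abs(W1_restored - W1).max())
```

Storing `A` and `B` instead of the full adapted matrix is what keeps the per-speaker footprint to a small percentage of the model's parameters; the rank controls the trade-off between size and fidelity.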

13. A computer-readable storage medium storing computer-executable instructions which, when executed by a computer, will cause the computer to perform a method of adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), the method comprising:
- receiving a plurality of utterances, each of the plurality of utterances comprising a plurality of speech features for a plurality of speakers from one or more ASR tasks;
  
  applying a decomposition approach to an original matrix in the DNN model;
  
  in response to applying the decomposition approach, converting the original matrix into a plurality of new matrices, each of the plurality of new matrices being smaller than the original matrix;
  
  adding a square matrix to the plurality of new matrices; and
  
  adapting the DNN model by only updating the square matrix, the adapted DNN model comprising a reduction in a number of the plurality of parameters in the DNN model.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The computer-readable storage medium of claim 13, further comprising replacing an original layer in the DNN model with a plurality of new layers.
  - 15. The computer-readable storage medium of claim 14, wherein at least one of the plurality of new layers comprises a non-linear layer, the non-linear layer comprising a non-linear function.
  - 16. The computer-readable storage medium of claim 14, wherein at least one of the plurality of new layers comprises a linear layer, the linear layer comprising a linear function.
  - 17. The computer-readable storage medium of claim 13, wherein applying a decomposition approach to an original matrix in the DNN model comprises applying singular value decomposition (SVD) to the original matrix in the DNN model.
  - 18. The computer-readable storage medium of claim 13, wherein adapting the DNN model by only updating the square matrix, the adapted DNN model comprising a reduction in a number of the plurality of parameters in the DNN model, comprises only updating the square matrix for each of the plurality of speakers.
  - 19. The computer-readable storage medium of claim 13, wherein at least one of the plurality of utterances comprises a short message dictation.
  - 20. The computer-readable storage medium of claim 13, wherein at least one of the plurality of utterances comprises a voice search query.
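Claims 14 to 16 and 18 describe replacing one layer with several new layers and updating only the square matrix for each speaker. One way to picture this is a layer object that shares its two factors across all speakers and looks up a per-speaker square matrix at inference time. The class `DecomposedLayer`, the sigmoid non-linearity, the random factors, and the speaker key below are all hypothetical choices made for illustration.

```python
import numpy as np


class DecomposedLayer:
    """One original layer replaced by a linear layer followed by a non-linear layer."""

    def __init__(self, U, N, bias):
        self.U = U                # m x k, shared by all speakers
        self.N = N                # k x n, shared by all speakers
        self.bias = bias          # length m, shared by all speakers
        self.speaker_S = {}       # speaker id -> k x k square matrix

    def forward(self, x, speaker=None):
        k = self.N.shape[0]
        # Unknown speakers fall back to the identity, i.e. the unadapted model.
        S = self.speaker_S.get(speaker, np.eye(k))
        h = S @ (self.N @ x)                  # new linear (bottleneck) layer
        z = self.U @ h + self.bias
        return 1.0 / (1.0 + np.exp(-z))       # new non-linear layer (sigmoid assumed)


rng = np.random.default_rng(0)
m, n, k = 32, 24, 4               # illustrative sizes

# U and N stand in for factors that a decomposition would have produced.
layer = DecomposedLayer(
    U=0.2 * rng.standard_normal((m, k)),
    N=rng.standard_normal((k, n)),
    bias=np.zeros(m),
)

# Register one adapted speaker; only this k x k matrix is speaker-specific.
layer.speaker_S["speaker_a"] = np.eye(k) + 0.05 * rng.standard_normal((k, k))

x = rng.standard_normal(n)
y_generic = layer.forward(x)
y_adapted = layer.forward(x, speaker="speaker_a")

shared_params = layer.U.size + layer.N.size + layer.bias.size
per_speaker_params = layer.speaker_S["speaker_a"].size
print(f"shared: {shared_params}, per speaker: {per_speaker_params}")
```

Keeping the speaker-specific state in a small dictionary of k × k matrices means adding a speaker costs k² parameters regardless of how large the shared factors are.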

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Li, Jinyu; Yu, Dong; Seltzer, Michael L.; Gong, Yifan; Xue, Jian

Granted Patent

US 9,324,321 B2
US Class Current

1/1
CPC Class Codes

G06N 3/082   modifying the architecture,...

G10L 15/075   supervised, i.e. under mach...

G10L 15/16   using artificial neural net...
