LOW-FOOTPRINT ADAPTATION AND PERSONALIZATION FOR A DEEP NEURAL NETWORK
Abstract
The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition (ASR) is provided. An utterance that includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices, each smaller than the original matrix. A square matrix may then be added to the new matrices, and speaker-specific parameters may be stored in that square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to each of the original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
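The parameter reduction described in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the layer size (2048 x 2048) and the retained rank k = 256 are hypothetical values, not figures from this document. It counts the parameters of an original matrix W (m x n), of the two smaller matrices produced by the decomposition (m x k and k x n), and of the added k x k square matrix that holds the speaker-specific parameters.

```python
# Hypothetical layer size and retained rank; the source gives no dimensions.
M, N = 2048, 2048   # original weight matrix W is M x N
K = 256             # number of components kept by the decomposition (assumed)

def param_counts(m: int, n: int, k: int) -> dict:
    """Parameter counts for W (m x n) restructured as (m x k) @ (k x k) @ (k x n)."""
    original = m * n
    factored = m * k + k * n   # the two smaller matrices from the decomposition
    per_speaker = k * k        # the added square matrix, the only part adapted
    return {"original": original, "factored": factored, "per_speaker": per_speaker}

counts = param_counts(M, N, K)
print(counts)
print(f"per-speaker share of the original layer: {counts['per_speaker'] / counts['original']:.4%}")
```

With these assumed sizes, each speaker needs 65,536 adapted values for the layer instead of 4,194,304 (1.5625%). The two smaller matrices together are smaller than W only while k(m + n) < mn, so k must stay well below min(m, n) for the decomposition itself to save space.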
20 Claims
1. A method of adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), comprising:
receiving, by a computing device, at least one utterance comprising a plurality of speech features for one or more speakers from one or more ASR tasks;
applying, by the computing device, a decomposition approach to an original matrix in the DNN model;
in response to applying the decomposition approach, converting the original matrix into a plurality of new matrices, each of the plurality of new matrices being smaller than the original matrix;
adding, by the computing device, another matrix to the plurality of new matrices; and
adapting, by the computing device, the DNN model by updating the added matrix, the adapted DNN model comprising a reduction in a number of parameters in the DNN model.
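Claim 1 does not name the decomposition approach. A minimal sketch, assuming singular value decomposition (SVD) and NumPy, is shown below: a weight matrix W (m x n) is replaced by two smaller matrices, and a k x k square matrix S initialised to the identity is inserted between them, so the layer computes the same output before any adaptation. The function name `restructure`, the sizes, and the choice to fold the singular values into the left factor are all hypothetical. In a real model truncating to rank k is an approximation; the demo builds an exactly rank-k W so the reconstruction check is exact.

```python
import numpy as np

def restructure(W: np.ndarray, k: int):
    """Split W (m x n) into U_k (m x k) and V_k (k x n) by truncated SVD,
    then insert a k x k square matrix S initialised to the identity."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # fold the k largest singular values into the left factor
    V_k = Vt[:k, :]
    S = np.eye(k)            # the added square matrix; identity leaves the layer unchanged
    return U_k, S, V_k

rng = np.random.default_rng(0)
m, n, k = 64, 48, 16         # hypothetical sizes
# Build a W of rank k so that truncating to k components loses nothing in this demo.
W = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
U_k, S, V_k = restructure(W, k)

x = rng.standard_normal(n)   # one input vector to the layer
print(np.allclose(W @ x, U_k @ (S @ (V_k @ x))))
```

Initialising S to the identity is a design choice that makes adaptation start exactly from the speaker-independent model; only S is then changed per speaker.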
7. A system for adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), comprising:
a memory for storing executable program code; and
a processor, functionally coupled to the memory, the processor being responsive to computer-executable instructions contained in the program code and operative to:
receive at least one utterance comprising a plurality of speech features for one or more speakers from one or more ASR tasks;
determine an adapted DNN model from the DNN model, the DNN model comprising a plurality of unadapted matrices and the adapted DNN model comprising a plurality of adapted matrices;
calculate a difference between the plurality of adapted matrices and the plurality of unadapted matrices to determine a plurality of delta matrices;
apply a decomposition approach to each of the plurality of delta matrices;
convert each of the plurality of delta matrices into a subset of small matrices; and
store the subset of small matrices, the subset of small matrices comprising a small percentage of a plurality of parameters in the DNN model.
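The delta-matrix steps of claim 7 can be sketched as follows, again assuming SVD as the decomposition approach and NumPy as the toolkit. For each layer, the delta matrix is the adapted matrix minus the unadapted one; its truncated SVD yields two small factors, and only those factors are stored per speaker. The layer names, sizes, rank k, and the synthetic low-rank "adapted" model are hypothetical and exist only to make the example self-contained.

```python
import numpy as np

def compress_delta(W_adapted, W_unadapted, k):
    """Truncated-SVD factors (A: m x k, B: k x n) of the delta matrix W_adapted - W_unadapted."""
    delta = W_adapted - W_unadapted
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

def compress_all(adapted, unadapted, k):
    """Apply compress_delta to every layer; only these small factors are stored per speaker."""
    return {name: compress_delta(adapted[name], unadapted[name], k) for name in unadapted}

rng = np.random.default_rng(1)
k = 8                                        # hypothetical retained rank
unadapted = {"layer1": rng.standard_normal((256, 256)),
             "layer2": rng.standard_normal((256, 128))}
# Synthetic "adapted" model: each layer shifted by a small rank-k change (demo assumption).
adapted = {name: W + 0.01 * rng.standard_normal((W.shape[0], k)) @ rng.standard_normal((k, W.shape[1]))
           for name, W in unadapted.items()}

factors = compress_all(adapted, unadapted, k)
stored = sum(A.size + B.size for A, B in factors.values())
total = sum(W.size for W in unadapted.values())
print(stored, total, stored / total)
# Rebuild each adapted layer from the speaker-independent weights plus the stored factors.
print(all(np.allclose(unadapted[name] + A @ B, adapted[name]) for name, (A, B) in factors.items()))
```

Here the stored factors hold 7,168 values against 98,304 in the two layers. Real adaptation deltas are not exactly low rank, so in practice the reconstruction is approximate and k trades storage against accuracy.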
13. A computer-readable storage medium storing computer executable instructions which, when executed by a computer, will cause the computer to perform a method of adapting and personalizing a deep neural network (DNN) model for automatic speech recognition (ASR), the method comprising:
receiving a plurality of utterances, each of the plurality of utterances comprising a plurality of speech features for a plurality of speakers from one or more ASR tasks;
applying a decomposition approach to an original matrix in the DNN model;
in response to applying the decomposition approach, converting the original matrix into a plurality of new matrices, each of the plurality of new matrices being smaller than the original matrix;
adding a square matrix to the plurality of new matrices; and
adapting the DNN model by only updating the square matrix, the adapted DNN model comprising a reduction in a number of the plurality of parameters in the DNN model.
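Adapting "by only updating the square matrix" means the two decomposed matrices stay frozen while the k x k matrix is trained on a speaker's data. The sketch below illustrates this on a single linear layer y = U @ S @ V @ x with a squared-error loss and plain gradient descent; the real system would back-propagate its own training criterion through the whole DNN, so the loss, learning rate, step count, function names, and synthetic targets here are all assumptions chosen for a small, self-contained demo.

```python
import numpy as np

def mse(U, S, H, T):
    """Half the mean squared error of the layer output U @ S @ H against targets T."""
    R = U @ S @ H - T
    return 0.5 * np.mean(np.sum(R ** 2, axis=0))

def adapt_square_matrix(U, S, V, X, T, lr=0.2, steps=500):
    """Gradient descent on the square matrix S only; U and V stay frozen."""
    S = S.copy()
    H = V @ X                      # k x N bottleneck activations; fixed because V is frozen
    n_samples = X.shape[1]
    for _ in range(steps):
        E = U @ S @ H - T          # m x N residual
        grad_S = U.T @ E @ H.T / n_samples
        S -= lr * grad_S
    return S

rng = np.random.default_rng(2)
m, n, k, N = 40, 32, 8, 200        # hypothetical sizes
U, _ = np.linalg.qr(rng.standard_normal((m, k)))   # m x k with orthonormal columns
V = rng.standard_normal((k, n)) / np.sqrt(n)
X = rng.standard_normal((n, N))                    # synthetic speaker features
S_true = np.eye(k) + 0.1 * rng.standard_normal((k, k))
T = U @ S_true @ V @ X                             # synthetic speaker-specific targets
U0, V0 = U.copy(), V.copy()

S0 = np.eye(k)                     # unadapted square matrix
S_adapted = adapt_square_matrix(U, S0, V, X, T)
H = V @ X
print(mse(U, S0, H, T), mse(U, S_adapted, H, T))
```

Only the k x k entries of S change, which is what makes per-speaker storage small; U and V remain shared by all speakers.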
Specification