Compressed recurrent neural network models
First Claim
1. A system comprising:
data processing hardware; and
memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
training an uncompressed version of a recurrent neural network (RNN) on training data to learn a respective recurrent weight matrix, Wh, and a respective inter-layer weight matrix, Wx, for each of a plurality of uncompressed recurrent layers of the uncompressed version of the RNN, each recurrent layer of the plurality of uncompressed recurrent layers configured to, for each of a plurality of time steps:
receive a respective layer input for the time step; and
process the respective layer input for the time step to generate a respective layer output for the time step;
re-configuring the trained RNN by, for at least one recurrent layer of the plurality of uncompressed recurrent layers of the uncompressed version of the trained RNN, compressing the recurrent layer by:
determining a respective singular value decomposition (SVD) of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a first compressed weight matrix, Zhl, and a projection matrix, Pl, based on the respective SVD of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a second compressed weight matrix, Zxl, based on the first compressed weight matrix, Zhl, and the projection matrix, Pl;
replacing the respective recurrent weight matrix, Wh, with the product of the first compressed weight matrix, Zhl, and the projection matrix, Pl; and
replacing the respective inter-layer weight matrix, Wx, with the product of the second compressed weight matrix, Zxl, and the projection matrix, Pl; and
transmitting the re-configured trained RNN having the at least one compressed recurrent layer to a mobile device in communication with the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer configured to receive a respective neural network input at each of multiple time steps and generate a respective neural network output at each of the multiple time steps.
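The compression steps recited above can be sketched in NumPy. This is a hedged illustration, not the claimed implementation: it assumes square n-by-n weight matrices, and the particular split of the truncated SVD into Zhl and Pl (and the least-squares choice of Zxl) is one plausible factorization — the claim only requires that the replaced matrices be products of the compressed matrices with the shared projection matrix.

```python
import numpy as np

def compress_recurrent_layer(W_h, W_x, rank):
    """Sketch of the claimed per-layer compression (assumptions noted above).

    W_h: recurrent weight matrix, shape (n, n)
    W_x: inter-layer weight matrix, assumed here to also be (n, n)
    rank: retained rank r; the claim does not fix how r is chosen
    """
    # Step 1: singular value decomposition of the recurrent matrix,
    # W_h = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(W_h, full_matrices=False)

    # Step 2: rank-r truncation. The singular values are folded into the
    # first compressed matrix; the right singular vectors become the
    # shared projection matrix, so W_h ~ Z_h @ P.
    Z_h = U[:, :rank] * s[:rank]   # shape (n, r)
    P = Vt[:rank, :]               # shape (r, n), orthonormal rows

    # Step 3: second compressed matrix, chosen so that Z_x @ P is the
    # least-squares best approximation of W_x for this fixed P
    # (valid because P @ P.T == I_r).
    Z_x = W_x @ P.T                # shape (n, r)

    # Steps 4-5: replace each weight matrix with the stated product.
    W_h_compressed = Z_h @ P
    W_x_compressed = Z_x @ P
    return Z_h, Z_x, P, W_h_compressed, W_x_compressed
```

With rank equal to n the reconstruction is exact; smaller ranks trade approximation error for fewer parameters.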
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix, and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
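The benefit of defining both factorizations through one shared projection matrix shows up in a quick parameter count. For illustration only, assume every weight matrix is n-by-n, the retained rank is r, and the projection matrix is stored once per layer:

```python
# Per-layer parameter count under the assumptions stated above:
#   uncompressed: Wh (n*n) + Wx (n*n)           = 2 * n**2
#   compressed:   Zh (n*r) + Zx (n*r) + P (r*n) = 3 * n * r
n, r = 1024, 128  # hypothetical hidden size and rank, not from the patent
uncompressed = 2 * n * n
compressed = 3 * n * r
print(compressed / uncompressed)  # 0.1875, i.e. roughly 5.3x fewer parameters
```

The ratio 3r/2n shrinks linearly with the rank, which is what makes the re-configured network small enough to ship to a mobile device.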
15 Claims
1. A system comprising: (independent claim; full text reproduced above as the First Claim) - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method for compressing a recurrent neural network (RNN), the method comprising:
training, by data processing hardware, an uncompressed version of a recurrent neural network (RNN) on training data to learn a respective recurrent weight matrix, Wh, and a respective inter-layer weight matrix, Wx, for each of a plurality of uncompressed recurrent layers of the uncompressed version of the RNN, each recurrent layer of the plurality of uncompressed recurrent layers configured to, for each of a plurality of time steps:
receive a respective layer input for the time step; and
process the respective layer input for the time step to generate a respective layer output for the time step;
re-configuring the trained RNN by, for at least one recurrent layer of the plurality of uncompressed recurrent layers of the uncompressed version of the trained RNN, compressing, by the data processing hardware, the recurrent layer by:
determining a respective singular value decomposition (SVD) of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a first compressed weight matrix, Zhl, and a projection matrix, Pl, based on the respective SVD of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a second compressed weight matrix, Zxl, based on the first compressed weight matrix, Zhl, and the projection matrix, Pl;
replacing the respective recurrent weight matrix, Wh, with the product of the first compressed weight matrix, Zhl, and the projection matrix, Pl; and
replacing the respective inter-layer weight matrix, Wx, with the product of the second compressed weight matrix, Zxl, and the projection matrix, Pl; and
transmitting, by the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer to a mobile device in communication with the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer configured to receive a respective neural network input at each of multiple time steps and generate a respective neural network output at each of the multiple time steps. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
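On the receiving mobile device, a compressed layer never needs to materialize the full-size products Zhl·Pl and Zxl·Pl. A sketch of one time step, assuming a plain tanh RNN cell (the claims fix neither the cell type nor the nonlinearity):

```python
import numpy as np

def compressed_rnn_step(h_prev, x_t, Z_h, Z_x, P, b):
    """One time step of a compressed simple-RNN layer (illustrative).

    Applying the shared projection P to the vectors first keeps the
    per-step cost at O(n*r) rather than the O(n**2) cost of multiplying
    by the reconstructed full matrices Z_h @ P and Z_x @ P.
    """
    return np.tanh(Z_h @ (P @ h_prev) + Z_x @ (P @ x_t) + b)
```

Because matrix multiplication is associative, this factored step produces exactly the same output as using the reconstructed matrices directly.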
Specification