COMPRESSED RECURRENT NEURAL NETWORK MODELS
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
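The gate computation described in the abstract can be sketched as follows. This is a generic illustration in numpy, not the patent's implementation; the parameter names (`W_i`, `W_f`, `W_o`, `W_g`), the hidden size, and the bias-free form are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step. Each gate generates its intermediate output
    vector by multiplying the gate input vector [x; h_prev] by a gate
    parameter matrix, then applying a nonlinearity."""
    z = np.concatenate([x, h_prev])   # gate input vector
    i = sigmoid(p["W_i"] @ z)         # input gate
    f = sigmoid(p["W_f"] @ z)         # forget gate
    o = sigmoid(p["W_o"] @ z)         # output gate
    g = np.tanh(p["W_g"] @ z)         # candidate cell update
    c_new = f * c_prev + i * g        # new layer state
    h_new = o * np.tanh(c_new)        # new layer output
    return h_new, c_new

# Run a few time steps with random inputs (illustrative dimensions).
rng = np.random.default_rng(0)
d = 4                                 # hidden size (hypothetical)
p = {k: 0.1 * rng.standard_normal((d, 2 * d))
     for k in ("W_i", "W_f", "W_o", "W_g")}
h, c = np.zeros(d), np.zeros(d)
for _ in range(3):
    h, c = lstm_step(rng.standard_normal(d), h, c, p)
```

The compression claimed below targets the `W_*` matrices in this sketch, which dominate the parameter count of an LSTM layer.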
38 Claims
1-19. (canceled)
20. A method of generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the method comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive the respective neural network input at each of the plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is a Toeplitz-like structured matrix.
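A plain Toeplitz matrix-vector product illustrates the structure claim 20 relies on. The sketch below is a minimal numpy assumption, not the patent's method: the function name and example values are hypothetical, and the claimed "Toeplitz-like" class is broader than plain Toeplitz (it covers matrices of low displacement rank).

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, z):
    """Multiply a Toeplitz matrix by z. The matrix is constant along
    each diagonal, so an n x n matrix is defined by only 2n - 1
    parameters (first column plus first row) instead of n^2."""
    n = len(first_col)
    T = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            T[i, j] = first_col[i - j] if i >= j else first_row[j - i]
    return T @ z

# Illustrative gate: intermediate gate output = Toeplitz weight times
# gate input vector (row[0] must equal col[0], the shared diagonal).
col = np.array([1.0, 2.0, 3.0])
row = np.array([1.0, 4.0, 5.0])
z = np.array([1.0, 0.0, 0.0])
out = toeplitz_matvec(col, row, z)
```

A practical implementation would avoid materializing `T` (e.g. via an FFT-based circulant embedding), but the parameter saving is the same either way.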
29. A method of generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the method comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive a respective neural network input at each of a plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is defined by a compressed parameter matrix and a projection matrix.
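The factorization in claim 29 can be illustrated with a low-rank product: the gate parameter matrix is defined by a projection matrix times a compressed parameter matrix, so the gate multiply becomes two small products and the parameter count shrinks. The dimensions, rank, and variable names below are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 64, 8                       # full dims and rank (hypothetical)
proj = rng.standard_normal((n, r))        # projection matrix
comp = rng.standard_normal((r, m))        # compressed parameter matrix
z = rng.standard_normal(m)                # gate input vector

# The gate parameter matrix is defined as the product proj @ comp, but
# it never needs to be materialized: the intermediate gate output is
# computed with two small matmuls instead of one large one.
gate_out = proj @ (comp @ z)

# Parameter count drops from n*m to r*(n + m).
full_params = n * m
compressed_params = r * (n + m)
```

Because `proj` can be shared across gates (or layers), the saving compounds; that sharing is the point of separating a per-gate compressed matrix from a common projection.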
38. One or more non-transitory computer storage media encoded with a computer program product, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the operations comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive the respective neural network input at each of the plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is a Toeplitz-like structured matrix.
Specification