COMPRESSED RECURRENT NEURAL NETWORK MODELS
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
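The gate computation described in the abstract can be sketched as follows. This is a generic illustration in numpy, not the patent's implementation; the parameter names (`W_i`, `W_f`, `W_o`, `W_g`), the hidden size, and the bias-free form are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step. Each gate generates its intermediate output
    vector by multiplying the gate input vector [x; h_prev] by a gate
    parameter matrix, then applying a nonlinearity."""
    z = np.concatenate([x, h_prev])   # gate input vector
    i = sigmoid(p["W_i"] @ z)         # input gate
    f = sigmoid(p["W_f"] @ z)         # forget gate
    o = sigmoid(p["W_o"] @ z)         # output gate
    g = np.tanh(p["W_g"] @ z)         # candidate cell update
    c_new = f * c_prev + i * g        # new layer state
    h_new = o * np.tanh(c_new)        # new layer output
    return h_new, c_new

# Run a few time steps with random inputs (illustrative dimensions).
rng = np.random.default_rng(0)
d = 4                                 # hidden size (hypothetical)
p = {k: 0.1 * rng.standard_normal((d, 2 * d))
     for k in ("W_i", "W_f", "W_o", "W_g")}
h, c = np.zeros(d), np.zeros(d)
for _ in range(3):
    h, c = lstm_step(rng.standard_normal(d), h, c, p)
```

The compression claimed below targets the `W_*` matrices in this sketch, which dominate the parameter count of an LSTM layer.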
38 Claims
1-19. (canceled)
20. A method of generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the method comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive the respective neural network input at each of the plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is a Toeplitz-like structured matrix.
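A plain Toeplitz matrix-vector product illustrates the structure claim 20 relies on. The sketch below is a minimal numpy assumption, not the patent's method: the function name and example values are hypothetical, and the claimed "Toeplitz-like" class is broader than plain Toeplitz (it covers matrices of low displacement rank).

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, z):
    """Multiply a Toeplitz matrix by z. The matrix is constant along
    each diagonal, so an n x n matrix is defined by only 2n - 1
    parameters (first column plus first row) instead of n^2."""
    n = len(first_col)
    T = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            T[i, j] = first_col[i - j] if i >= j else first_row[j - i]
    return T @ z

# Illustrative gate: intermediate gate output = Toeplitz weight times
# gate input vector (row[0] must equal col[0], the shared diagonal).
col = np.array([1.0, 2.0, 3.0])
row = np.array([1.0, 4.0, 5.0])
z = np.array([1.0, 0.0, 0.0])
out = toeplitz_matvec(col, row, z)
```

A practical implementation would avoid materializing `T` (e.g. via an FFT-based circulant embedding), but the parameter saving is the same either way.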
29. A method of generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the method comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive a respective neural network input at each of a plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is defined by a compressed parameter matrix and a projection matrix.
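The factorization in claim 29 can be illustrated with a low-rank product: the gate parameter matrix is defined by a projection matrix times a compressed parameter matrix, so the gate multiply becomes two small products and the parameter count shrinks. The dimensions, rank, and variable names below are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 64, 8                       # full dims and rank (hypothetical)
proj = rng.standard_normal((n, r))        # projection matrix
comp = rng.standard_normal((r, m))        # compressed parameter matrix
z = rng.standard_normal(m)                # gate input vector

# The gate parameter matrix is defined as the product proj @ comp, but
# it never needs to be materialized: the intermediate gate output is
# computed with two small matmuls instead of one large one.
gate_out = proj @ (comp @ z)

# Parameter count drops from n*m to r*(n + m).
full_params = n * m
compressed_params = r * (n + m)
```

Because `proj` can be shared across gates (or layers), the saving compounds; that sharing is the point of separating a per-gate compressed matrix from a common projection.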
38. One or more non-transitory computer storage media encoded with a computer program product, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for generating an output sequence comprising a neural network output at each of a plurality of time steps from an input sequence comprising a respective neural network input at each of the plurality of time steps, the operations comprising:
processing the input sequence using a recurrent neural network implemented by one or more computers, wherein the recurrent neural network is configured to receive the respective neural network input at each of the plurality of time steps and to generate a respective neural network output at each of the plurality of time steps, and wherein the recurrent neural network comprises: a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix, and wherein the gate parameter matrix for at least one of the plurality of gates is a Toeplitz-like structured matrix.
Specification