Low Latency Long Short-Term Memory Inference with Sequence Interleaving
First Claim
1. A system comprising:
a processing unit;
a machine learning engine comprising a plurality of matrix multiplication units and one or more long short-term memory (LSTM) layers; and
a memory coupled to the processing unit and the machine learning engine;
wherein the processing unit is configured to:
detect, in the memory, a plurality of sequences that will be processed by the machine learning engine; and
interleave the plurality of sequences together into data blocks, wherein each data block comprises samples from the plurality of sequences;
wherein the machine learning engine is configured to:
receive a given data block;
perform, in parallel, a plurality of matrix multiplication operations on a plurality of sequences from the given data block and a plurality of coefficients; and
convey outputs from the plurality of matrix multiplication units to the one or more LSTM layers.
Abstract
Systems, apparatuses, and methods for implementing a low latency long short-term memory (LSTM) machine learning engine using sequence interleaving techniques are disclosed. A computing system includes at least a host processing unit, a machine learning engine, and a memory. The host processing unit detects a plurality of sequences which will be processed by the machine learning engine. The host processing unit interleaves the sequences into data blocks and stores the data blocks in the memory. When the machine learning engine receives a given data block, the machine learning engine performs, in parallel, a plurality of matrix multiplication operations on the plurality of sequences in the given data block and a plurality of coefficients. Then, the outputs of the matrix multiplication operations are coupled to one or more LSTM layers.
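The interleaving and parallel multiplication described in the abstract can be pictured with a minimal sketch. The block layout used here (one data block per time step, one column per sequence) is an assumption made for illustration, since the abstract does not fix a layout. The names `interleave`, `matmul`, and `weights` are hypothetical, and pure Python lists stand in for the hardware matrix multiplication units.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def interleave(sequences: List[List[Vector]]) -> List[Matrix]:
    """Build one data block per time step (assumed layout, not from the patent).

    sequences[s][t] is the feature vector of sequence s at step t.
    Block t holds sample t of every sequence, one column per sequence.
    Assumes all sequences share the same length and feature size.
    """
    num_steps = len(sequences[0])
    num_features = len(sequences[0][0])
    blocks = []
    for t in range(num_steps):
        # Row f, column s of the block is feature f of sequence s at step t.
        block = [[seq[t][f] for seq in sequences] for f in range(num_features)]
        blocks.append(block)
    return blocks


def matmul(weights: Matrix, block: Matrix) -> Matrix:
    """Multiply coefficients (rows x features) by a block (features x sequences)."""
    cols = list(zip(*block))
    return [[sum(w * x for w, x in zip(row, col)) for col in cols] for row in weights]


# Two sequences, two time steps, two features per sample.
seq_a = [[1.0, 2.0], [3.0, 4.0]]
seq_b = [[5.0, 6.0], [7.0, 8.0]]
blocks = interleave([seq_a, seq_b])

# Hypothetical coefficients: 3 output rows, 2 input features.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One product per block covers every interleaved sequence at once:
# column s of each output belongs to sequence s.
outputs = [matmul(weights, block) for block in blocks]
```

Because each block carries one sample from every sequence, a single multiplication against the shared coefficients yields a result column for each sequence, which is the property the abstract relies on for low latency.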
20 Claims
1. A system comprising:
a processing unit;
a machine learning engine comprising a plurality of matrix multiplication units and one or more long short-term memory (LSTM) layers; and
a memory coupled to the processing unit and the machine learning engine;
wherein the processing unit is configured to:
detect, in the memory, a plurality of sequences that will be processed by the machine learning engine; and
interleave the plurality of sequences together into data blocks, wherein each data block comprises samples from the plurality of sequences;
wherein the machine learning engine is configured to:
receive a given data block;
perform, in parallel, a plurality of matrix multiplication operations on a plurality of sequences from the given data block and a plurality of coefficients; and
convey outputs from the plurality of matrix multiplication units to the one or more LSTM layers.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
detecting, by a processing unit, a plurality of sequences that will be processed by a machine learning engine;
interleaving, by the processing unit, the plurality of sequences together into data blocks, wherein each data block comprises samples from the plurality of sequences;
receiving, by the machine learning engine, a given data block;
performing, by the machine learning engine, a plurality of matrix multiplication operations in parallel on a plurality of sequences from the given data block and a plurality of coefficients; and
conveying, by the machine learning engine, outputs from the plurality of matrix multiplication units to the one or more long short-term memory (LSTM) layers.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus comprising:
a machine learning engine; and
a memory coupled to the machine learning engine;
wherein the machine learning engine is configured to:
receive a given data block with a plurality of sequences interleaved together;
perform, in parallel, a plurality of matrix multiplication operations on the plurality of sequences from the given data block and a plurality of coefficients; and
convey outputs from the plurality of matrix multiplication units to the one or more long short-term memory (LSTM) layers.
Dependent claims: 16, 17, 18, 19, 20
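The final step of each independent claim conveys the matrix multiplication outputs to the LSTM layers. The sketch below shows what a layer might then do with those outputs, using the standard textbook LSTM gate equations rather than anything stated in the claims. It assumes the four rows of `gates` are the input, forget, candidate, and output pre-activations for one hidden unit, with one column per interleaved sequence. The function name `lstm_pointwise` and the sample values are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_pointwise(
    gates: List[List[float]], cell: List[float]
) -> Tuple[List[float], List[float]]:
    """Apply textbook LSTM gate math to one hidden unit across sequences.

    gates: four rows (input, forget, candidate, output pre-activations),
           one column per interleaved sequence (assumed layout).
    cell:  previous cell state of each sequence.
    Returns (new cell states, hidden outputs), one entry per sequence.
    """
    new_cell, hidden = [], []
    for s, c_prev in enumerate(cell):
        i = sigmoid(gates[0][s])    # input gate
        f = sigmoid(gates[1][s])    # forget gate
        g = math.tanh(gates[2][s])  # candidate value
        o = sigmoid(gates[3][s])    # output gate
        c = f * c_prev + i * g
        new_cell.append(c)
        hidden.append(o * math.tanh(c))
    return new_cell, hidden


# Hypothetical pre-activations for two interleaved sequences.
gates = [[0.0, 1.0], [0.0, -1.0], [0.0, 0.5], [0.0, 2.0]]
new_cell, hidden = lstm_pointwise(gates, cell=[0.0, 1.0])
```

Each sequence keeps its own cell state, so interleaving changes only how the multiplications are batched, not the per-sequence recurrence.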
Specification