Low Latency Long Short-Term Memory Inference with Sequence Interleaving

US 20200134432A1
Filed: 10/31/2018
Published: 04/30/2020
Est. Priority Date: 10/31/2018
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Systems, apparatuses, and methods for implementing a low latency long short-term memory (LSTM) machine learning engine using sequence interleaving techniques are disclosed. A computing system includes at least a host processing unit, a machine learning engine, and a memory. The host processing unit detects a plurality of sequences which will be processed by the machine learning engine. The host processing unit interleaves the sequences into data blocks and stores the data blocks in the memory. When the machine learning engine receives a given data block, the machine learning engine performs, in parallel, a plurality of matrix multiplication operations on the plurality of sequences in the given data block and a plurality of coefficients. Then, the outputs of the matrix multiplication operations are coupled to one or more LSTM layers.
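
The host-side interleaving step described in the abstract can be sketched as follows. This is a minimal illustration only, assuming equal-length sequences and a layout in which data block t holds sample t of every sequence; the text here does not fix the exact block layout, and the function names `interleave` and `deinterleave` are hypothetical.

```python
from typing import List, Sequence


def interleave(sequences: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pack K equal-length sequences into per-time-step data blocks.

    Assumed layout (not specified in the patent text shown here):
    block t = [seq0[t], seq1[t], ..., seq(K-1)[t]].
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("this sketch assumes equal-length sequences")
    return [[s[t] for s in sequences] for t in range(length)]


def deinterleave(blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Inverse of interleave(): recover the original sequences."""
    if not blocks:
        return []
    return [[block[k] for block in blocks] for k in range(len(blocks[0]))]


if __name__ == "__main__":
    seqs = [[1, 2, 3], [10, 20, 30]]
    blocks = interleave(seqs)
    print(blocks)  # [[1, 10], [2, 20], [3, 30]]
    assert deinterleave(blocks) == seqs
```

With this layout every block carries one sample from each sequence, so a single block hands the engine work for all sequences at once.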

7 Citations

20 Claims

1. A system comprising:

   a processing unit;

   a machine learning engine comprising a plurality of matrix multiplication units and one or more long short-term memory (LSTM) layers; and

   a memory coupled to the processing unit and the machine learning engine;

   wherein the processing unit is configured to:

   detect, in the memory, a plurality of sequences that will be processed by the machine learning engine; and

   interleave the plurality of sequences together into data blocks, wherein each data block comprises samples from the plurality of sequences;

   wherein the machine learning engine is configured to:

   receive a given data block;

   perform, in parallel, a plurality of matrix multiplication operations on a plurality of sequences from the given data block and a plurality of coefficients; and

   convey outputs from the plurality of matrix multiplication units to the one or more LSTM layers.

2. The system as recited in claim 1, wherein each sequence comprises a plurality of samples.

3. The system as recited in claim 1, wherein the plurality of coefficients are stored in an N×(N+M) matrix, wherein N and M are positive integers greater than one.

4. The system as recited in claim 3, wherein the given data block is stored in an N-sample array.

5. The system as recited in claim 4, wherein N is scalable based on a local memory bus width, a number of multiplier-accumulator units, and an availability of LSTM cells.

6. The system as recited in claim 1, wherein the plurality of matrix multiplication operations comprise a same set of N coefficients being multiplied by different sequences of the plurality of sequences, wherein N is a positive integer greater than one.

7. The system as recited in claim 6, wherein the machine learning engine implements a recurrent neural network.
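
Claims 3, 4 and 6 describe an N×(N+M) coefficient matrix whose coefficients are reused across different sequences. The sketch below is one possible reading, not the claimed implementation: it assumes each sequence contributes an (N+M)-element vector (for instance M input features concatenated with an N-element recurrent state, which is an assumption) and walks the matrix one N-coefficient column at a time. The name `shared_coeff_matvec` is hypothetical.

```python
from typing import List, Sequence


def shared_coeff_matvec(w: Sequence[Sequence[float]],
                        vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return [w @ v for v in vectors], walking w one column at a time.

    w is an N x (N+M) coefficient matrix; each v has N+M entries.
    Column j of w (N coefficients) is read once and multiplied by element j
    of every sequence's vector, so the same N coefficients serve all
    sequences in the block before the next column is loaded.
    """
    n = len(w)
    if n == 0:
        return [[] for _ in vectors]
    width = len(w[0])
    if any(len(row) != width for row in w):
        raise ValueError("coefficient rows must all have N+M entries")
    if any(len(v) != width for v in vectors):
        raise ValueError("each sequence vector must have N+M entries")
    acc = [[0.0] * n for _ in vectors]        # one N-element accumulator per sequence
    for j in range(width):
        column = [w[i][j] for i in range(n)]  # load N coefficients once
        for k, v in enumerate(vectors):       # reuse them for every sequence
            x = v[j]
            for i in range(n):
                acc[k][i] += column[i] * x    # multiply-accumulate (MAC)
    return acc


if __name__ == "__main__":
    # N = 2 outputs, N + M = 4 inputs (M = 2), two interleaved sequences.
    w = [[1, 0, 2, 0],
         [0, 1, 0, 3]]
    vectors = [[1, 1, 1, 1], [2, 0, 1, 1]]
    print(shared_coeff_matvec(w, vectors))  # [[3.0, 4.0], [4.0, 3.0]]
```

In this sketch each coefficient column is loaded once per block rather than once per sequence; the result is still an ordinary matrix-vector product for every sequence.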

8. A method comprising:

   detecting, by a processing unit, a plurality of sequences that will be processed by a machine learning engine;

   interleaving, by the processing unit, the plurality of sequences together into data blocks, wherein each data block comprises samples from the plurality of sequences;

   receiving, by the machine learning engine, a given data block;

   performing, by the machine learning engine, a plurality of matrix multiplication operations in parallel on a plurality of sequences from the given data block and a plurality of coefficients; and

   conveying, by the machine learning engine, outputs from the plurality of matrix multiplication units to the one or more long short-term memory (LSTM) layers.

9. The method as recited in claim 8, wherein each sequence comprises a plurality of samples.

10. The method as recited in claim 8, wherein the plurality of coefficients are stored in an N×(N+M) matrix, wherein N and M are positive integers greater than one.

11. The method as recited in claim 10, wherein the given data block is stored in an N-sample array.

12. The method as recited in claim 11, wherein N is scalable based on a local memory bus width, a number of multiplier-accumulator units, and an availability of LSTM cells.

13. The method as recited in claim 8, wherein the plurality of matrix multiplication operations comprise a same set of N coefficients being multiplied by different sequences of the plurality of sequences, wherein N is a positive integer greater than one.

14. The method as recited in claim 13, wherein the machine learning engine implements a recurrent neural network.
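
Claim 12 (like claims 5 and 19) says only that N scales with the local memory bus width, the number of multiplier-accumulator (MAC) units, and the availability of LSTM cells; no formula is given. Purely as an illustration, one hypothetical sizing rule takes N as the tightest of the three limits, where the bus limit is the number of samples that fit in one bus transfer. The function name, its parameters, and the `min` rule are all assumptions.

```python
def choose_block_size(bus_width_bits: int, sample_bits: int,
                      mac_units: int, free_lstm_cells: int) -> int:
    """Hypothetical rule for N: the tightest of three hardware limits.

    - bus limit: samples that fit in one local-memory bus transfer
    - compute limit: multiplier-accumulator (MAC) units available
    - cell limit: LSTM cells currently free to receive outputs
    """
    for name, value in (("bus_width_bits", bus_width_bits),
                        ("sample_bits", sample_bits),
                        ("mac_units", mac_units),
                        ("free_lstm_cells", free_lstm_cells)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    samples_per_transfer = bus_width_bits // sample_bits
    n = min(samples_per_transfer, mac_units, free_lstm_cells)
    if n < 2:
        # The claims require N to be greater than one.
        raise ValueError("resources too small for N > 1")
    return n


if __name__ == "__main__":
    print(choose_block_size(256, 16, 32, 64))  # 16 (bus-limited)
    print(choose_block_size(512, 8, 32, 20))   # 20 (cell-limited)
```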

15. An apparatus comprising:

   a machine learning engine; and

   a memory coupled to the machine learning engine;

   wherein the machine learning engine is configured to:

   receive a given data block with a plurality of sequences interleaved together;

   perform, in parallel, a plurality of matrix multiplication operations on the plurality of sequences from the given data block and a plurality of coefficients; and

   convey outputs from the plurality of matrix multiplication units to the one or more long short-term memory (LSTM) layers.

16. The apparatus as recited in claim 15, wherein each sequence comprises a plurality of samples.

17. The apparatus as recited in claim 15, wherein the plurality of coefficients are stored in an N×(N+M) matrix, wherein N and M are positive integers greater than one.

18. The apparatus as recited in claim 17, wherein the given data block is stored in an N-sample array.

19. The apparatus as recited in claim 18, wherein N is scalable based on a local memory bus width, a number of multiplier-accumulator units, and an availability of LSTM cells.

20. The apparatus as recited in claim 15, wherein the plurality of matrix multiplication operations comprise a same set of N coefficients being multiplied by different sequences of the plurality of sequences, wherein N is a positive integer greater than one.
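
Claim 15 ends with the matrix-multiplication outputs being conveyed to LSTM layers. The sketch below shows what a conventional LSTM cell does with such outputs, assuming the multiply stage delivers, per sequence, 4·H gate pre-activations (bias already added) in the order input, forget, candidate, output. That ordering and the function names are assumptions; the claims shown here do not specify the LSTM internals. The cell applies the standard update c_t = f * c_(t-1) + i * g and h_t = o * tanh(c_t).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for the i, f and o gates."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_cell(preact: Sequence[float],
              c_prev: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One LSTM time step for one sequence.

    `preact` holds 4*H gate pre-activations (W.[x; h] + b, already computed
    by the matrix-multiplication stage) in the assumed order
    [input | forget | candidate | output]. Returns (h_new, c_new).
    """
    h = len(c_prev)
    if len(preact) != 4 * h:
        raise ValueError("expected 4*H pre-activations")
    i_gate = [sigmoid(z) for z in preact[0:h]]
    f_gate = [sigmoid(z) for z in preact[h:2 * h]]
    g_cand = [math.tanh(z) for z in preact[2 * h:3 * h]]
    o_gate = [sigmoid(z) for z in preact[3 * h:4 * h]]
    # Cell state: keep part of the old state, add part of the candidate.
    c_new = [f * c + i * g for f, c, i, g in zip(f_gate, c_prev, i_gate, g_cand)]
    # Hidden state: output-gated, squashed cell state.
    h_new = [o * math.tanh(c) for o, c in zip(o_gate, c_new)]
    return h_new, c_new


def lstm_layer_step(preacts: Sequence[Sequence[float]],
                    c_states: Sequence[Sequence[float]]):
    """Apply the cell to every interleaved sequence in one data block."""
    return [lstm_cell(p, c) for p, c in zip(preacts, c_states)]
```

Because each sequence keeps its own cell state, the sequences stay independent even though their pre-activations arrive together in one block.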

Current Assignee: Advanced Micro Devices, Inc.; ATI Technologies ULC (Advanced Micro Devices, Inc.)
Original Assignee: Advanced Micro Devices, Inc.; ATI Technologies ULC (Advanced Micro Devices, Inc.)
Inventors: Lagudu, Sateesh; Zhang, Lei; Rush, Allen H.

Granted Patent: US 11,769,041 B2

CPC Class Codes

G06F 17/16   Matrix or vector computatio...

G06F 2207/4802   Special implementations

G06F 2207/4824   Neural networks

G06F 7/5443   Sum of products for applica...

G06N 20/00   Machine learning

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means
