Low rank matrix compression
First Claim
1. A general purpose graphics processor comprising:
an instruction cache to receive a stream of instructions;
an instruction unit to execute the stream of instructions;
a general-purpose graphics processing compute block comprising a plurality of graphics processing cores;
a shared memory communicatively coupled to the plurality of graphics processing cores; and
a processor to:
apply a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
apply a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterize one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
determine a scalar associated with each of the one or more independent rows of the matrix;
encode a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
apply delta compression to compress the encoded weight data;
store the encoded weight data in the shared memory; and
load the matrix into the neural network using hardware when the rank is beneath a threshold.
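Read as a pipeline, the limitations above describe: truncate a weight matrix to low rank via singular value decomposition, determine a per-row scalar, and encode each retained row against that scalar. A minimal NumPy sketch of that reading follows; the function names, the int8 row-quantization scheme, and the rank cutoff are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def low_rank_compress(W, rank_threshold):
    """Sketch of the claimed steps: truncate W to low rank via SVD,
    then encode each row as int8 against a per-row scalar."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = min(rank_threshold, int(np.sum(s > 1e-6)))  # cap at the effective rank
    W_low = (U[:, :k] * s[:k]) @ Vt[:k, :]          # rank-k approximation

    # One scalar per row; rows are stored as int8 multiples of that scalar.
    scales = np.abs(W_low).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                       # avoid dividing a zero row
    encoded = np.round(W_low / scales).astype(np.int8)
    return encoded, scales.squeeze(1)

def decode(encoded, scales):
    """Recover the low-rank weights from the encoded rows and their scalars."""
    return encoded.astype(np.float32) * scales[:, None]
```

For an exactly low-rank input, the round trip recovers the matrix to within half of each row's scalar, which is the quantization step this encoding implies.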
Abstract
In an example, an apparatus comprises logic, at least partially including hardware logic, to implement a lossy compression algorithm which utilizes a data transform and quantization process to compress data in a convolutional neural network (CNN) layer. Other embodiments are also disclosed and claimed.
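The abstract's "data transform and quantization" can be sketched with an orthonormal DCT basis and uniform quantization of the coefficients. This is a generic transform-coding illustration under assumed parameters (block size, quantization step), not the disclosed hardware logic.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis built directly from its definition.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def compress_block(block, step):
    """Transform then quantize, as in the abstract: project a square
    activation block onto the DCT basis and round the coefficients."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return np.round(coeffs / step).astype(np.int16)

def decompress_block(q, step):
    """Dequantize and apply the inverse (transposed) orthonormal transform."""
    C = dct_matrix(q.shape[0])
    return C.T @ (q.astype(np.float32) * step) @ C
```

Because the transform is orthonormal, the reconstruction error is bounded by the quantization error of the coefficients, which is what makes the scheme lossy but controllable.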
364 Citations
15 Claims
1. A general purpose graphics processor comprising: (set out in full under First Claim above)
Dependent Claims: 2, 3, 4, 5
6. A method, comprising:
receiving, in an instruction cache, a stream of instructions;
executing, in an instruction unit, the stream of instructions;
passing the stream of instructions to a general-purpose graphics processing compute block comprising a plurality of graphics processing cores, the plurality of graphics processing cores communicatively coupled to a shared memory, the instructions to perform operations comprising:
applying a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
applying a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterizing one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
determining a scalar associated with each of the one or more independent rows of the matrix;
encoding a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
implementing a delta compression algorithm to compress the encoded weight data;
storing the encoded weight data in the shared memory; and
loading the matrix into the neural network using hardware when the rank is beneath a threshold.
Dependent Claims: 7, 8, 9, 10
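Claim 6 names a delta compression algorithm for the encoded weight data. A minimal sketch of delta coding follows: store the first value and then successive differences, which cluster near zero for slowly varying weight streams and so compress well downstream. The dtype and function names are assumptions for illustration.

```python
import numpy as np

def delta_encode(values):
    """Delta compression as named in claim 6: keep the first value,
    then store each value as its difference from the previous one."""
    values = np.asarray(values, dtype=np.int16)
    deltas = np.empty_like(values)
    deltas[0] = values[0]
    deltas[1:] = values[1:] - values[:-1]
    return deltas

def delta_decode(deltas):
    """Invert delta coding with a running (cumulative) sum."""
    return np.cumsum(deltas, dtype=np.int16)
```

The round trip is exact; the gain comes from the deltas having a narrower range than the raw values, which a subsequent entropy coder can exploit.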
11. An electronic device comprising:
a computer readable memory; and
a general purpose graphics processor comprising:
an instruction cache to receive a stream of instructions;
an instruction unit to execute the stream of instructions;
a general-purpose graphics processing compute block comprising a plurality of graphics processing cores;
a shared memory communicatively coupled to the plurality of graphics processing cores; and
a processor communicatively coupled to the shared memory to:
apply a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
apply a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterize one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
encode a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
implement a delta compression algorithm to compress the encoded weight data;
store the encoded weight data in the shared memory; and
load the matrix into the neural network using hardware when the rank is beneath a threshold.
Dependent Claims: 12, 13, 14, 15
Specification