Low rank matrix compression
First Claim
1. A general purpose graphics processor comprising:
an instruction cache to receive a stream of instructions;
an instruction unit to execute the stream of instructions;
a general-purpose graphics processing compute block comprising a plurality of graphics processing cores;
a shared memory communicatively coupled to the plurality of graphics processing cores; and
a processor to:
apply a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
apply a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterize one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
determine a scalar associated with each of the one or more independent rows of the matrix;
encode a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
apply delta compression to compress the encoded weight data;
store the encoded weight data in the shared memory; and
load the matrix into the neural network using hardware when the rank is beneath a threshold.
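Read as a pipeline, the limitations above describe: truncate a weight matrix to low rank via singular value decomposition, determine a per-row scalar, and encode each retained row against that scalar. A minimal NumPy sketch of that reading follows; the function names, the int8 row-quantization scheme, and the rank cutoff are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def low_rank_compress(W, rank_threshold):
    """Sketch of the claimed steps: truncate W to low rank via SVD,
    then encode each row as int8 against a per-row scalar."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = min(rank_threshold, int(np.sum(s > 1e-6)))  # cap at the effective rank
    W_low = (U[:, :k] * s[:k]) @ Vt[:k, :]          # rank-k approximation

    # One scalar per row; rows are stored as int8 multiples of that scalar.
    scales = np.abs(W_low).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                       # avoid dividing a zero row
    encoded = np.round(W_low / scales).astype(np.int8)
    return encoded, scales.squeeze(1)

def decode(encoded, scales):
    """Recover the low-rank weights from the encoded rows and their scalars."""
    return encoded.astype(np.float32) * scales[:, None]
```

For an exactly low-rank input, the round trip recovers the matrix to within half of each row's scalar, which is the quantization step this encoding implies.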
Abstract
In an example, an apparatus comprises logic, at least partially including hardware logic, to implement a lossy compression algorithm which utilizes a data transform and quantization process to compress data in a convolutional neural network (CNN) layer. Other embodiments are also disclosed and claimed.
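The abstract's "data transform and quantization" can be sketched with an orthonormal DCT basis and uniform quantization of the coefficients. This is a generic transform-coding illustration under assumed parameters (block size, quantization step), not the disclosed hardware logic.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis built directly from its definition.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def compress_block(block, step):
    """Transform then quantize, as in the abstract: project a square
    activation block onto the DCT basis and round the coefficients."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return np.round(coeffs / step).astype(np.int16)

def decompress_block(q, step):
    """Dequantize and apply the inverse (transposed) orthonormal transform."""
    C = dct_matrix(q.shape[0])
    return C.T @ (q.astype(np.float32) * step) @ C
```

Because the transform is orthonormal, the reconstruction error is bounded by the quantization error of the coefficients, which is what makes the scheme lossy but controllable.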
364 Citations
15 Claims
1. A general purpose graphics processor comprising: (set out in full under First Claim above)
Dependent Claims: 2, 3, 4, 5
6. A method, comprising:
receiving, in an instruction cache, a stream of instructions;
executing, in an instruction unit, the stream of instructions;
passing the stream of instructions to a general-purpose graphics processing compute block comprising a plurality of graphics processing cores, the plurality of graphics processing cores communicatively coupled to a shared memory, the instructions to perform operations comprising:
applying a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
applying a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterizing one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
determining a scalar associated with each of the one or more independent rows of the matrix;
encoding a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
implementing a delta compression algorithm to compress the encoded weight data;
storing the encoded weight data in the shared memory; and
loading the matrix into the neural network using hardware when the rank is beneath a threshold.
Dependent Claims: 7, 8, 9, 10
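Claim 6 names a delta compression algorithm for the encoded weight data. A minimal sketch of delta coding follows: store the first value and then successive differences, which cluster near zero for slowly varying weight streams and so compress well downstream. The dtype and function names are assumptions for illustration.

```python
import numpy as np

def delta_encode(values):
    """Delta compression as named in claim 6: keep the first value,
    then store each value as its difference from the previous one."""
    values = np.asarray(values, dtype=np.int16)
    deltas = np.empty_like(values)
    deltas[0] = values[0]
    deltas[1:] = values[1:] - values[:-1]
    return deltas

def delta_decode(deltas):
    """Invert delta coding with a running (cumulative) sum."""
    return np.cumsum(deltas, dtype=np.int16)
```

The round trip is exact; the gain comes from the deltas having a narrower range than the raw values, which a subsequent entropy coder can exploit.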
11. An electronic device comprising:
a computer readable memory; and
a general purpose graphics processor comprising:
an instruction cache to receive a stream of instructions;
an instruction unit to execute the stream of instructions;
a general-purpose graphics processing compute block comprising a plurality of graphics processing cores;
a shared memory communicatively coupled to the plurality of graphics processing cores; and
a processor communicatively coupled to the shared memory to:
apply a matrix interpolation operation to one or more linearly dependent rows of a matrix comprising weights of a neural network;
apply a singular value decomposition algorithm to convert one or more weights of one or more linearly dependent rows of the matrix to a low rank;
characterize one or more rows of the matrix comprising weights of a neural network for which a rank of the one or more rows of the matrix is less than a threshold value as independent rows of the matrix;
encode a plurality of the one or more independent rows with the scalar associated with the row to generate encoded weight data;
implement a delta compression algorithm to compress the encoded weight data;
store the encoded weight data in the shared memory; and
load the matrix into the neural network using hardware when the rank is beneath a threshold.
Dependent Claims: 12, 13, 14, 15
Specification