PERFORMING MULTI-CONVOLUTION OPERATIONS IN A PARALLEL PROCESSING SYSTEM
Abstract
In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch. Notably, the source locations reflect the contribution of the image tile to an output tile of an output matrix, which is the result of the multi-convolution operation. Subsequently, the pipeline copies data from the source locations to the image tile. Similarly, the pipeline copies data from a filter stack to a filter tile. The pipeline then performs matrix multiplication operations between the image tile and the filter tile to generate data included in the corresponding output tile. To optimize both on-chip memory usage and execution time, the pipeline creates each image tile in on-chip memory as needed.
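The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the flat NCHW image layout, the KCRS filter layout, unit stride, absence of padding, the cross-correlation convention, and the names `source_index`, `multi_convolution`, and `tile` are all assumptions made for illustration. Ordinary Python lists stand in for the two memories, and each image tile is built only when the matrix multiplication needs it, so the full expanded image matrix is never stored.

```python
def source_index(row, col, C, H, W, R, S):
    """Map (row, col) of the virtual expanded image matrix to a flat NCHW offset.

    Assumed layout: row = (c, r, s) flattened, col = (n, p, q) flattened,
    stride 1 and no padding.
    """
    P, Q = H - R + 1, W - S + 1      # output height and width
    c, rs = divmod(row, R * S)       # channel, position inside the filter window
    r, s = divmod(rs, S)
    n, pq = divmod(col, P * Q)       # image in the batch, output pixel
    p, q = divmod(pq, Q)
    return ((n * C + c) * H + (p + r)) * W + (q + s)


def multi_convolution(images, filters, N, C, H, W, K, R, S, tile=4):
    """Tiled multi-convolution; returns a K x (N*P*Q) output matrix."""
    P, Q = H - R + 1, W - S + 1
    rows, cols = C * R * S, N * P * Q
    out = [[0.0] * cols for _ in range(K)]
    for m0 in range(0, K, tile):                 # output tile: filter range
        mm = range(m0, min(m0 + tile, K))
        for n0 in range(0, cols, tile):          # output tile: pixel range
            nn = range(n0, min(n0 + tile, cols))
            for k0 in range(0, rows, tile):      # reduction dimension
                kk = range(k0, min(k0 + tile, rows))
                # Build the image tile on demand from computed source locations.
                image_tile = [[images[source_index(k, n, C, H, W, R, S)]
                               for n in nn] for k in kk]
                # Copy the matching slice of the filter stack into a filter tile.
                filter_tile = [[filters[m * rows + k] for k in kk] for m in mm]
                # Small matrix multiplication, accumulated into the output tile.
                for i, m in enumerate(mm):
                    for j, n in enumerate(nn):
                        out[m][n] += sum(filter_tile[i][t] * image_tile[t][j]
                                         for t in range(len(kk)))
    return out
```

For instance, `multi_convolution([1, 2, 3, 4], [1, 1, 1, 1], 1, 1, 2, 2, 1, 2, 2)` treats the input as one 2x2 single-channel image and one 2x2 filter of ones, and returns a 1x1 output matrix holding the sum of the four pixels.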
20 Claims
1. A computer-implemented method for performing a multi-convolution operation, the method comprising:
calculating a first source location included in an image batch that is stored in a first memory based on a first destination location included in a first image tile that is stored in a second memory;
copying data from the first source location to the first destination location;
copying data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory; and
performing one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
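The first two steps of claim 1 derive a source location from a destination location and then copy the data. The claim text does not fix a data layout, so the following Python sketch rests on stated assumptions: a flat NCHW image batch, an image tile addressed by a tile origin plus a local offset `(i, j)` within a virtual expanded image matrix, and optional `stride` and `pad` parameters. The names `source_location` and `fill_tile` are hypothetical.

```python
def source_location(tile_row0, tile_col0, i, j, C, H, W, R, S, stride=1, pad=0):
    """Flat NCHW offset that feeds tile element (i, j), or None for padding.

    Assumption: the tile is a window, starting at (tile_row0, tile_col0), of a
    virtual matrix whose rows are (c, r, s) and whose columns are (n, p, q).
    """
    P = (H + 2 * pad - R) // stride + 1      # output height
    Q = (W + 2 * pad - S) // stride + 1      # output width
    row, col = tile_row0 + i, tile_col0 + j  # destination in the virtual matrix
    c, rs = divmod(row, R * S)
    r, s = divmod(rs, S)
    n, pq = divmod(col, P * Q)
    p, q = divmod(pq, Q)
    h = p * stride + r - pad                 # input row touched by this element
    w = q * stride + s - pad                 # input column touched by this element
    if not (0 <= h < H and 0 <= w < W):
        return None                          # falls in the zero-padding border
    return ((n * C + c) * H + h) * W + w


def fill_tile(images, tile_row0, tile_col0, tile_rows, tile_cols,
              C, H, W, R, S, stride=1, pad=0):
    """Copy data from each computed source location into a fresh image tile.

    Assumes the tile lies entirely inside the virtual matrix.
    """
    tile = [[0] * tile_cols for _ in range(tile_rows)]
    for i in range(tile_rows):
        for j in range(tile_cols):
            src = source_location(tile_row0, tile_col0, i, j,
                                  C, H, W, R, S, stride, pad)
            if src is not None:
                tile[i][j] = images[src]     # padding elements stay zero
    return tile
```

With a single 3x3 single-channel image and a 2x2 filter, tile element `(3, 3)` of the tile at origin `(0, 0)` corresponds to the last filter tap applied at the last output pixel, so it reads the bottom-right input pixel at flat offset 8.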
12. A non-transitory, computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform a multi-convolution operation, by performing the steps of:
calculating a first source location included in an image batch that is stored in a first memory based on a first destination location included in a first image tile that is stored in a second memory;
copying data from the first source location to the first destination location;
copying data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory; and
performing one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A system configured to perform a multi-convolution operation, the system comprising:
a first memory;
a second memory; and
a convolution engine coupled to both the first memory and the second memory, and configured to:
  calculate a first source location included in an image batch that is stored in the first memory based on a first destination location included in a first image tile that is stored in the second memory,
  copy data from the first source location to the first destination location,
  copy data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory, and
  perform one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
Specification