Convolution filtering in a graphics processor
Abstract
Techniques for performing convolution filtering using hardware normally available in a graphics processor are described. Convolution filtering of an arbitrary H×W grid of pixels is achieved by partitioning the grid into smaller sections, performing computation for each section, and combining the intermediate results for all sections to obtain a final result. In one design, a command to perform convolution filtering on a grid of pixels with a kernel of coefficients is received, e.g., from a graphics application. The grid is partitioned into multiple sections, where each section may be 2×2 or smaller. Multiple instructions are generated for the multiple sections, with each instruction performing convolution computation on at least one pixel in one section. Each instruction may include pixel position information and applicable kernel coefficients. Instructions to combine the intermediate results from the multiple instructions are also generated.
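The partition, compute, and combine flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `Instruction` layout, the function names, and the choice to apply the kernel without flipping are all assumptions made for the example, and the grid is assumed to lie entirely inside the image.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    # Hypothetical instruction layout (an assumption, not the patent's format):
    # position of the section inside the H x W grid, plus its coefficients.
    row: int
    col: int
    coeffs: List[List[float]]  # at most 2 x 2


def make_instructions(kernel: List[List[float]]) -> List[Instruction]:
    """Partition an H x W kernel footprint into sections of 2 x 2 or smaller."""
    h, w = len(kernel), len(kernel[0])
    instrs = []
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            coeffs = [row[c:c + 2] for row in kernel[r:r + 2]]
            instrs.append(Instruction(r, c, coeffs))
    return instrs


def execute(instr: Instruction, image: List[List[float]], y: int, x: int) -> float:
    """Multiply each pixel in one section by its coefficient and accumulate."""
    acc = 0.0
    for dr, coeff_row in enumerate(instr.coeffs):
        for dc, k in enumerate(coeff_row):
            acc += image[y + instr.row + dr][x + instr.col + dc] * k
    return acc  # intermediate result for this section


def convolve_at(image: List[List[float]], kernel: List[List[float]], y: int, x: int) -> float:
    """Combine the intermediate results of all sections into the final result."""
    return sum(execute(i, image, y, x) for i in make_instructions(kernel))
```

For a 3×3 kernel this produces four instructions (one 2×2, one 2×1, one 1×2, and one 1×1 section), and summing their intermediate results gives the same value as a direct 3×3 multiply-accumulate over the grid whose top-left pixel is at `(y, x)`.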
141 Citations
25 Claims
1. A method comprising:
receiving, at a first processor, a command to perform convolution filtering on a grid of pixels;
partitioning the grid into multiple sections;
generating multiple instructions for the multiple sections, each instruction for performing a convolution computation on at least one pixel in one section;
dispatching at least one of the multiple instructions to a second processor, the second processor multiplying the at least one pixel with at least one coefficient received in the at least one instruction, and accumulating at least one result of the multiply to generate an intermediate result; and
generating instructions to combine intermediate results from the multiple instructions for the multiple sections.
(Dependent claims: 2, 3, 4, 10)
5. An apparatus comprising:
first processing means for receiving an instruction, multiplying a pixel by a coefficient in the instruction and accumulating a result of the multiplication to generate an intermediate result for the received instruction; and
second processing means
for receiving a command to perform convolution filtering on a grid of pixels;
for partitioning the grid into multiple sections;
for generating multiple instructions for the multiple sections, each instruction performing convolution computation on at least one pixel in one section;
for dispatching each instruction to the first processing means; and
for generating instructions to combine intermediate results from the multiple instructions for the multiple sections.
(Dependent claims: 6, 7, 8)
9. A non-transitory computer-readable medium storing instructions that configure circuitry to:
receive, at a first processor, a command to perform convolution filtering on a grid of pixels;
partition the grid into multiple sections;
generate multiple instructions for the multiple sections, each instruction performing convolution computation on at least one pixel in one section;
dispatch the multiple instructions to a second processor, which is configured to multiply a pixel in a section with a coefficient in one of the multiple instructions, and accumulate a result of the multiplication as an intermediate result; and
generate instructions to combine intermediate results from the multiple instructions for the multiple sections.
11. A graphics processor comprising:
a first processing unit configured to receive a set of instructions for convolution filtering of a grid of pixels, to dispatch a plurality of instructions in the set, to receive intermediate results for the dispatched instructions, and to combine the intermediate results to generate a final result for the convolution filtering of the grid of pixels; and
a second processing unit configured to receive the instructions dispatched by the shader core, to perform computation on at least one pixel in the grid for each instruction, and to provide an intermediate result for each instruction,
wherein the second processing unit is configured to retrieve the at least one pixel from memory, to multiply the at least one pixel with at least one coefficient received in the instruction, and to accumulate at least one result of the multiply to generate the intermediate result for the instruction.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
24. A method comprising:
receiving a set of instructions for convolution filtering of a grid of pixels;
dispatching a plurality of instructions in the set;
performing computation on at least one pixel in the grid for each dispatched instruction to obtain an intermediate result for the dispatched instruction; and
combining intermediate results for the plurality of dispatched instructions to generate a final result,
wherein performing computation on the at least one pixel in the grid for each dispatched instruction comprises:
retrieving the at least one pixel from memory,
multiplying the at least one pixel with at least one coefficient received in the instruction, and
accumulating at least one result of the multiply to generate the intermediate result for the instruction.
(Dependent claims: 25)
Specification