Reconfigurable SIMD coprocessor architecture for sum of absolute differences and symmetric filtering (scalable MAC engine for image processing)
Abstract
The proposed architecture is integrated onto a Digital Signal Processor (DSP) as a coprocessor to assist in the computation of sum of absolute differences, symmetrical row/column Finite Impulse Response (FIR) filtering with a downsampling (or upsampling) option, row/column Discrete Cosine Transform (DCT)/Inverse Discrete Cosine Transform (IDCT), and generic algebraic functions. The architecture, called the image processing peripheral (IPP), consists of 8 multiply-accumulate hardware units connected in parallel, with routing and multiplexing between them. The IPP can rely on a Direct Memory Access (DMA) controller to read data from and write results back to DSP memory without intervention from the DSP core. The DSP can set up the DMA transfers and the IPP/DMA synchronization in advance, then proceed with its own processing tasks. Alternatively, the DSP can perform the data transfers itself, synchronizing with the IPP on these transfers. This architecture implements 2-D filtering, symmetrical filtering, short filters, sum of absolute differences, and mosaic decoding more efficiently than previously disclosed architectures.
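As a rough software illustration of two operations named in the abstract, the sketch below computes a sum of absolute differences eight pixels at a time and a symmetric FIR filter that adds mirrored samples before multiplying. The function names, the 8-lane loop, and the way the absolute value is taken are assumptions made for illustration; the abstract does not say how the hardware forms the absolute value.

```python
def sad_8_lanes(ref, cur):
    """Sum of absolute differences, walked in groups of 8 pixels.

    Each of the 8 lanes keeps its own running sum, loosely mirroring
    8 parallel multiply-accumulate units (an assumption of this sketch).
    """
    assert len(ref) == len(cur)
    lane_acc = [0] * 8
    for base in range(0, len(ref), 8):
        for lane in range(8):
            i = base + lane
            if i >= len(ref):
                break
            diff = ref[i] - cur[i]       # pre-adder used in subtract mode
            lane_acc[lane] += abs(diff)  # assumption: abs taken before accumulation
    return sum(lane_acc)


def symmetric_fir(samples, half_coeffs):
    """Even-length symmetric FIR with taps c[k] == c[N-1-k].

    Mirrored samples are added first, so each output needs only
    len(half_coeffs) multiplications instead of 2 * len(half_coeffs).
    """
    n_taps = 2 * len(half_coeffs)
    out = []
    for start in range(len(samples) - n_taps + 1):
        window = samples[start:start + n_taps]
        acc = 0
        for k, c in enumerate(half_coeffs):
            pre = window[k] + window[n_taps - 1 - k]  # pre-adder in add mode
            acc += pre * c                            # multiply and accumulate
        out.append(acc)
    return out
```

The pre-add step is what makes an adder in front of each multiplier useful: a symmetric filter with taps [1, 2, 2, 1] needs two multiplications per output rather than four.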
174 Citations
8 Claims
1. An image processing peripheral comprising:
a plurality of pairs of multiply accumulate circuits connected in parallel, each pair of multiply accumulate circuits comprising:
first adder pairs, each one of each adder pair having first and second inputs receiving respective first and second inputs having a first predetermined number of bits and an output producing a sum or a difference of said inputs;
first multiplier pairs, corresponding to said first adder pairs, each multiplier of each multiplier pair having a first input of said sum or difference of said first adders and a second input of a constant predetermined number and producing a product output;
second adder pairs, corresponding to said first multiplier pairs, each one adder of said adder pair having first and second inputs receiving respective first multiplier outputs from one or the other of said multipliers of said corresponding multiplier pair as said first input and wherein said one of said pair of second adders receives an output from a first multiplexer, said first multiplexer having one input from a product of the other multiplier of said first multiplier pairs and a second input from an accumulated sum of said one adder of said second adder pairs as a second input of said one adder of said second adder pair and;
wherein said other of said pair of second adders receives outputs from a second and a third multiplexer, said second multiplexer having one input from said other multiplier of said first multiplier pair and a second input from the sum of said one adder of said second adder pair, said third multiplexer having one input from the accumulated sum of said other adder of said second adder pair and a second input from the sum of a one adder of a second pair of second adder pairs, and;
wherein each second adder of said second adder pairs produces a sum output according to selection made by said first, second and third multiplexers.
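One possible reading of the multiplexer arrangement in claim 1 can be sketched as a small behavioral model of a single pair. The class name `MacPair`, the flag names, and the choice to treat the first adder's sum as available to the second adder within the same step are all assumptions of this sketch, not details stated in the claim.

```python
from dataclasses import dataclass


@dataclass
class MacPair:
    """Behavioral model of one pair of multiply-accumulate circuits."""
    acc_a: int = 0  # accumulated sum of the "one" second adder
    acc_b: int = 0  # accumulated sum of the "other" second adder

    def step(self, in_a, in_b, coeff_a, coeff_b, *, subtract=False,
             mux1_product=False, mux2_chain=False,
             mux3_neighbor=False, neighbor_sum=0):
        # First adders: sum or difference of the two inputs of each lane.
        pre_a = in_a[0] - in_a[1] if subtract else in_a[0] + in_a[1]
        pre_b = in_b[0] - in_b[1] if subtract else in_b[0] + in_b[1]

        # Multipliers: scale each pre-adder result by a coefficient.
        prod_a = pre_a * coeff_a
        prod_b = pre_b * coeff_b

        # First multiplexer: the other multiplier's product, or adder A's
        # own accumulated sum.
        sum_a = prod_a + (prod_b if mux1_product else self.acc_a)

        # Second multiplexer: the other multiplier's product, or adder A's sum.
        lhs_b = sum_a if mux2_chain else prod_b
        # Third multiplexer: adder B's accumulated sum, or a sum coming
        # from an adder in a second pair.
        rhs_b = neighbor_sum if mux3_neighbor else self.acc_b
        sum_b = lhs_b + rhs_b

        self.acc_a, self.acc_b = sum_a, sum_b
        return sum_a, sum_b
```

With all flags left at their defaults the two lanes behave as independent accumulators. Setting `mux1_product` and `mux2_chain` folds both products into adder A and accumulates that combined sum in adder B, which is the kind of adder-tree configuration the multiplexers appear intended to allow.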
2. An image processing peripheral comprising:
eight first adders, each first adder having first and second inputs receiving respective first and second input signals and an output producing a selected one of a sum of said inputs or a difference of said inputs;
eight multipliers, each multiplier having a first input connected to said output of a corresponding one of said eight first adders, a second input receiving a coefficient input signal and a product output producing a product of said inputs;
eight second adders, each second adder having first and second inputs and an output producing a selected one of a sum of said inputs or a difference of said inputs, said first input of said first, third, fifth and seventh second adders connected to said product of a corresponding multiplier;
eight sum temporary registers, each sum temporary register having an input connected to said output of a corresponding one of said second adders and an output, each sum temporary register temporarily storing said output of said corresponding second adder;
said second input of said eighth second adder connected to said output of said eighth sum temporary register;
a first multiplexer having a first input connected to said output of said first sum temporary register, a second input connected to said product output of said second multiplier and an output connected to said second input of said first second adder, said first multiplexer connecting a selected one of said first input or said second input to said output;
a second multiplexer having a first input connected to said output of said second sum temporary register, a second input connected to said output of said third sum temporary register and an output connected to said second input of said second second adder, said second multiplexer connecting a selected one of said first input or said second input to said output;
a third multiplexer having a first input connected to said output of said third sum temporary register, a second input connected to said product output of said fourth multiplier and an output connected to said second input of said third second adder, said third multiplexer connecting a selected one of said first input or said second input to said output;
a fourth multiplexer having a first input connected to said output of said fourth sum temporary register, a second input connected to said output of said sixth sum temporary register and an output connected to said second input of said fourth second adder, said fourth multiplexer connecting a selected one of said first input or said second input to said output;
a fifth multiplexer having a first input connected to said output of said fifth sum temporary register, a second input connected to said product output of said sixth multiplier and an output connected to said second input of said fifth second adder, said fifth multiplexer connecting a selected one of said first input or said second input to said output;
a sixth multiplexer having a first input connected to said output of said sixth sum temporary register, a second input connected to said output of said seventh sum temporary register and an output connected to said second input of said first second adder, said sixth multiplexer connecting a selected one of said first input or said second input to said output;
a seventh multiplexer having a first input connected to said output of said seventh sum temporary register, a second input connected to said product output of said eighth multiplier and an output connected to said second input of said first second adder, said seventh multiplexer connecting a selected one of said first input or said second input to said output;
an eighth multiplexer having a first input connected to said output of said first sum temporary register, a second input connected to said product output of said second multiplier and an output connected to said second input of said first second adder, said eighth multiplexer connecting a selected one of said first input or said second input to said output;
a ninth multiplexer having a first input connected to said output of said second sum temporary register, a second input connected to said product output of said fourth multiplier and an output connected to said second input of said fourth second adder, said ninth multiplexer connecting a selected one of said first input or said second input to said output;
a tenth multiplexer having a first input connected to said output of said fifth sum temporary register, a second input connected to said product output of said sixth multiplier and an output connected to said second input of said sixth second adder, said tenth multiplexer connecting a selected one of said first input or said second input to said output;
an eleventh multiplexer having a first input connected to said output of said sixth sum temporary register, a second input connected to said product output of said eighth multiplier, a third input connected to said fourth sum temporary register and an output connected to said second input of said sixth second adder, said eleventh multiplexer connecting a selected one of said first input, said second input or said third input to said output;
a third adder having a first input connected to said second sum temporary register, a second input connected to said sixth sum temporary register and an output producing a selected one of a sum of said inputs or a difference of said inputs;
a fourth adder having a first input connected to said output of said third adder, a second input and an output producing a selected one of a sum of said inputs or a difference of said inputs;
a ninth sum temporary register having an input connected to said output of said fourth adder and an output connected to said second input of said fourth adder, said ninth sum temporary register temporarily storing said output of said fourth adder; and
nine image processing peripheral outputs, each output connected to a corresponding one of said sum temporary registers.
3. The image processing peripheral of claim 2, further comprising:
eight second sum temporary registers, each second sum temporary register having an input connected to said output of a corresponding first adder and an output connected to said first input of a corresponding multiplier, each second sum temporary register temporarily storing said output of said corresponding first adder.
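The effect of placing a register between each first adder and its multiplier can be sketched with a clocked model of one lane. The model follows the eighth lane of claim 2, whose second adder always takes its second input from its own sum temporary register. The class name `Lane`, the `clock` method, and the one-cycle timing are assumptions of this sketch.

```python
class Lane:
    """Clocked model of one lane: pre-adder, register, multiplier, accumulator."""

    def __init__(self):
        self.pre_reg = 0  # register between first adder and multiplier
        self.sum_reg = 0  # sum temporary register after the second adder

    def clock(self, x0, x1, coeff, subtract=False):
        """Advance one clock cycle and return the new accumulated sum.

        The multiplier sees the pre-adder result latched on the PREVIOUS
        cycle, so `coeff` must line up with the previous cycle's inputs.
        """
        product = self.pre_reg * coeff
        self.sum_reg = self.sum_reg + product  # second adder with fixed feedback
        self.pre_reg = x0 - x1 if subtract else x0 + x1
        return self.sum_reg
```

Because of the register, a result lags its inputs by one cycle: the data pair presented on cycle n is multiplied by the coefficient presented on cycle n + 1, and one extra cycle with zero inputs is needed to flush the last product into the accumulator.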
4. The image processing peripheral of claim 2, further comprising:
eight pipeline registers, each pipeline register having an input connected to said output of a corresponding multiplier, and an output, said output of said first pipeline register connected to said first input of said first second adder, said output of said second pipeline register connected to said second input of said eighth multiplexer, said output of said third pipeline register connected to said first input of said third second adder, said output of said fourth pipeline register connected to said second input of said ninth multiplexer, said output of said fifth pipeline register connected to said first input of said fifth second adder, said output of said sixth pipeline register connected to said second input of said tenth multiplexer, said output of said seventh pipeline register connected to said first input of said seventh second adder and said output of said eighth pipeline register connected to said second input of said eleventh multiplexer.
5. The image processing peripheral of claim 2, further comprising:
nine variable depth accumulators, each accumulator having a first input connected to said output of a corresponding sum temporary register and an output for temporarily storing at least three outputs of said corresponding sum temporary register, said outputs of said first to seventh variable depth accumulators connected to said first input of a corresponding multiplexer, said output of said eighth variable depth accumulator connected to said second input of said eighth second adder and said output of said ninth variable depth accumulator connected to said second input of said fourth adder.
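Claim 5 states only that each variable depth accumulator stores at least three outputs of its sum temporary register. One way such a structure could be used, offered purely as an assumption, is as a short FIFO in the feedback path, so that a single adder maintains several interleaved running sums. The names `VariableDepthAccumulator` and `interleaved_accumulate` are hypothetical.

```python
from collections import deque


class VariableDepthAccumulator:
    """FIFO of configurable depth placed in an adder's feedback path."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        # Start with `depth` zeros; maxlen makes append() drop the oldest entry.
        self.fifo = deque([0] * depth, maxlen=depth)

    def feedback(self):
        """Value fed back to the adder: the oldest stored sum."""
        return self.fifo[0]

    def push(self, value):
        """Store a new adder output, discarding the oldest one."""
        self.fifo.append(value)


def interleaved_accumulate(products, depth):
    """Accumulate `depth` interleaved streams of products with one adder."""
    acc = VariableDepthAccumulator(depth)
    for p in products:
        acc.push(acc.feedback() + p)
    return list(acc.fifo)
```

With a depth of 3, products arriving in the order a0, b0, c0, a1, b1, c1 end up as three separate totals (a0 + a1, b0 + b1, c0 + c1), which is the access pattern that arises when several output samples are computed in an interleaved fashion.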
6. The image processing peripheral of claim 4, further comprising:
nine right shifters, each right shifter having an input connected to said output of a corresponding sum temporary register and an output connected to a corresponding image processing peripheral output, each right shifter right shifting said input.
7. The image processing peripheral of claim 2, further comprising:
nine saturation units, each saturation unit having an input connected to said output of a corresponding sum temporary register and an output connected to a corresponding image processing peripheral output, each saturation unit outputting a first saturation value if said input is greater than an upper threshold and a second saturation value if said input is less than a lower threshold.
8. The image processing peripheral of claim 2, further comprising:
nine right shifters, each right shifter having an input connected to said output of a corresponding sum temporary register and an output, each right shifter right shifting said input; and
nine saturation units, each saturation unit having an input connected to said output of a corresponding right shifter and an output connected to a corresponding image processing peripheral output, each saturation unit outputting a first saturation value if said input is greater than an upper threshold and a second saturation value if said input is less than a lower threshold.
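The output stage of claim 8, a right shifter followed by a saturation unit, can be sketched as follows. The default thresholds of 0 and 255 (an 8-bit pixel range), the choice to saturate to the threshold values themselves, and the function names are assumptions of this sketch; the claim leaves the thresholds and saturation values open.

```python
def right_shift(value, shift):
    """Arithmetic right shift; Python's >> rounds toward negative infinity."""
    return value >> shift


def saturate(value, lower, upper, low_value=None, high_value=None):
    """Clamp to a saturation value when outside [lower, upper].

    By default the saturation values equal the thresholds (an assumption).
    """
    if low_value is None:
        low_value = lower
    if high_value is None:
        high_value = upper
    if value > upper:
        return high_value
    if value < lower:
        return low_value
    return value


def output_stage(accumulated, shift, lower=0, upper=255):
    """Right shift the accumulated sum, then saturate the shifted value."""
    return saturate(right_shift(accumulated, shift), lower, upper)
```

The shift typically removes the fractional bits introduced by fixed-point coefficients, and the saturation keeps overshoot from wrapping around when the result is stored in a narrower pixel format.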
Specification