EFFICIENT COMPLEX MULTIPLICATION AND FAST FOURIER TRANSFORM (FFT) IMPLEMENTATION ON THE MANARRAY ARCHITECTURE
First Claim
1. An apparatus for the efficient processing of complex multiplication computations, the apparatus comprising:
- at least one controller sequence processor (SP);
a memory for storing process control instructions;
a first multiply complex numbers instruction stored in the memory and operative to control the PEs to carry out a multiplication operation involving a pair of complex numbers; and
hardware for implementing the first multiply complex numbers instruction.
5 Assignments
0 Petitions
Accused Products
Abstract
Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.
-
Citations
39 Claims
-
1. An apparatus for the efficient processing of complex multiplication computations, the apparatus comprising:
-
at least one controller sequence processor (SP);
a memory for storing process control instructions;
a first multiply complex numbers instruction stored in the memory and operative to control the PEs to carry out a multiplication operation involving a pair of complex numbers; and
hardware for implementing the first multiply complex numbers instruction. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A method for the computation of an FFT by a plurality of processing elements (PEs), the method comprising the steps of:
-
loading input data from a memory into each PE in a cyclic manner;
calculating a local FFT by each PE;
multiplying by the twiddle factors and calculating a FFT by the cluster of PEs; and
loading the FFTs into the memory.
-
-
16. A method for the computation of a distributed FFT by an N×
- N processing element (PE) array, the method comprising the steps of;
loading a complex number x and a corresponding twiddle factor w from a memory into each of the PEs;
calculating a first product by the multiplication of the complex numbers x and w;
transmitting the first product from each of the PEs to another PE in the N×
N array;
receiving the first product and treating it as a second product in each of the PEs;
selectively adding or subtracting the first product and the second product to form a first result;
calculating a third product in selected PEs;
transmitting the first result or third product in selected PEs to another PE in the N×
N array;
selectively adding or subtracting the received values to form a second result; and
storing the second results in the memory.
- N processing element (PE) array, the method comprising the steps of;
-
17. A method for efficient computation by a 2×
- 2 processing element (PE) array interconnected in a manifold array interconnection network, the array comprising four PEs (PE0, PE1, PE2 and PE3), the method comprising the steps of;
loading a complex number x and a corresponding twiddle factor w from a memory into each of the four PEs, complex number x including subparts x0, x1, x2 and x3, twiddle factor w including subparts w0, w1, w2 and w3;
multiplying the complex numbers x and w, such that PE0 multiplies x0 and w0 to produce a product0, PE1 multiplies x1 and w1 to produce a product1, PE2 multiplies x2 and w2 to produce a product2, and PE3 multiplies x3 and w3 to produce a product3;
transmitting the product0, the product1, the product2 and the product3, such that PE0 transmits the product0 to PE2, PE1 transmits the product1 to PE3, PE2 transmits the product2 to PE0, and PE3 transmits the product3 to PE1; and
performing arithmetic logic operations, such that PE0 adds the product0 and the product2 to produce a sum t0, PE1 adds the product1 and the product3 to produce a sum t2, PE2 subtracts the product2 from the product0 to produce a sum t1, and PE3 subtracts the product3 from the product1 to produce a result which is multiplied by −
i to produce a sum t3. - View Dependent Claims (18)
- 2 processing element (PE) array interconnected in a manifold array interconnection network, the array comprising four PEs (PE0, PE1, PE2 and PE3), the method comprising the steps of;
- 19. A special hardware instruction for handling the multiplication with accumulate for two complex numbers from a source register whereby utilizing said instruction and accumulated complex product of two source operands is rounded according to a rounding mode specified in the instruction and loaded into a target register with the complex numbers organized in the source such that a halfword (H1) contains the real component and a halfword (H0) contains the imaginary component.
-
21. An apparatus to efficiently fetch instructions including complex multiplication instructions and an accumulate form of multiplication instructions from a memory element and dispatch the fetched instruction to at least one of a plurality of multiply complex and multiply with accumulate execution units to carry out the instruction specified operation, the apparatus comprising:
-
a memory element;
means for fetching said instructions from the memory element;
a plurality of multiply complex and multiply with accumulate execution units; and
means to dispatch the fetched instruction to at least one of said plurality of execution units to carry out the instruction specified operation. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39)
-
-
35. An apparatus for the efficient processing of an FFT, the apparatus comprising:
-
at least one controller sequence processor (SP);
a plurality of processing elements (PEs) arranged in an N×
N array interconnected in a manifold (ManArray) interconnection network; and
a memory for storing instructions to be processed by the SP and by the array of PEs.
-
Specification