Apparatus and Method for Performing SIMD Multiply-Accumulate Operations
Abstract
An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD data processing circuitry. In response to those control signals, the SIMD data processing circuitry performs the plurality of iterations of a multiply-accumulate process, each iteration involving performance of N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements. For each iteration, the SIMD data processing circuitry determines N input data elements from the first vector and a single coefficient data element from the second vector to be multiplied with each of the N input data elements. The N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process are then used to produce N multiply-accumulate results. This approach provides a particularly energy-efficient mechanism for performing SIMD multiply-accumulate operations, such as those required for FIR filter processes.
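The iteration scheme in the abstract can be sketched as a scalar reference model. This is a minimal illustration under stated assumptions, not the claimed circuitry: the names `repeating_mac` and `direct_sums`, the zero-initialised accumulators, and the rule that iteration i uses the N consecutive input elements starting at offset i (a sliding window, as an FIR-style computation would use) are all hypothetical. The abstract itself only says that N input elements and a single coefficient are determined for each iteration.

```python
def repeating_mac(inputs, coeffs, m, n):
    """Scalar reference model of a repeating multiply-accumulate.

    Assumptions (not stated in the source): iteration i reads the window
    inputs[i : i + n] and the single coefficient coeffs[i], and the N
    accumulators start at zero.
    """
    if len(coeffs) < m:
        raise ValueError("need at least m coefficient elements")
    if len(inputs) < m + n - 1:
        raise ValueError("need at least m + n - 1 input elements")

    acc = [0] * n
    for i in range(m):              # the M iterations
        c = coeffs[i]               # one coefficient shared by every lane
        for lane in range(n):       # N MACs; performed in parallel in hardware
            acc[lane] += inputs[i + lane] * c
    return acc                      # N results from the final iteration


def direct_sums(inputs, coeffs, n):
    """Same sums written directly: y[k] = sum over i of coeffs[i] * inputs[k + i]."""
    return [sum(c * inputs[k + i] for i, c in enumerate(coeffs))
            for k in range(n)]


x = [1, 2, 3, 4, 5, 6]
h = [1, 0, -1]
result = repeating_mac(x, h, m=3, n=4)
```

Under the assumed sliding-window rule, each lane accumulates one output sample of a three-tap filter, so `result` matches `direct_sums(x, h, 4)`. Whether the coefficients are applied in this order or reversed depends on the filter convention, which the abstract does not fix.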
30 Claims
1. A data processing apparatus comprising:
SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements;
instruction decoder circuitry coupled to said SIMD data processing circuitry and responsive to program instructions to generate said control signals;
said instruction decoder circuitry being responsive to a repeating multiply-accumulate (repeating MAC) instruction having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations M required, to generate control signals to control said SIMD data processing circuitry:
to perform said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements;
for each iteration, to determine N input data elements from said first vector and a single coefficient data element from said second vector to be multiplied with each of the N input data elements during the N multiply-accumulate operations; and
to output N multiply-accumulate results derived from the N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process.
Dependent claims: 2 to 27.
28. A method of processing data using SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements and instruction decoder circuitry coupled to said SIMD data processing circuitry and responsive to program instructions to generate said control signals, said method comprising the steps of:
decoding a repeating multiply-accumulate (repeating MAC) instruction having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations M required, to generate control signals; and
controlling said SIMD data processing circuitry with said control signals to produce multiply-accumulate results by the steps of:
performing said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements;
for each iteration, determining N input data elements from said first vector and a single coefficient data element from said second vector to be multiplied with each of the N input data elements during the N multiply-accumulate operations; and
outputting N multiply-accumulate results derived from the N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process.
29. A virtual machine implementation of a data processing apparatus, said virtual machine implementation being responsive to a repeating multiply-accumulate (repeating MAC) instruction having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations M required, to produce multiply-accumulate results by the steps of:
performing said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements;
for each iteration, determining N input data elements from said first vector and a single coefficient data element from said second vector to be multiplied with each of the N input data elements during the N multiply-accumulate operations; and
outputting N multiply-accumulate results derived from the N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process.
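As a rough illustration of how software might respond to such an instruction, the sketch below models a toy virtual machine that executes one repeating MAC. Everything specific here is hypothetical and chosen only for the example: the `RepeatingMac` encoding, the `TinyVM` class, the named-register file, and the assumed rule that iteration i uses the N consecutive input elements starting at offset i. The claim does not prescribe any of these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatingMac:
    """Hypothetical instruction encoding: three register names plus the scalar M."""
    dst: str         # register that receives the N results
    src_inputs: str  # register holding the first vector (input data elements)
    src_coeffs: str  # register holding the second vector (coefficient elements)
    m: int           # scalar value: number of iterations


class TinyVM:
    """Toy virtual machine with named vector registers and N lanes."""

    def __init__(self, lanes):
        self.lanes = lanes
        self.regs = {}

    def execute(self, insn):
        if isinstance(insn, RepeatingMac):
            self._repeating_mac(insn)
        else:
            raise NotImplementedError(f"unknown instruction: {insn!r}")

    def _repeating_mac(self, insn):
        x = self.regs[insn.src_inputs]
        h = self.regs[insn.src_coeffs]
        if len(h) < insn.m or len(x) < insn.m + self.lanes - 1:
            raise ValueError("operand vectors too short for requested iterations")
        acc = [0] * self.lanes
        for i in range(insn.m):
            # Assumed selection rule: window of N inputs starting at offset i,
            # all multiplied by the single coefficient h[i].
            acc = [a + x[i + lane] * h[i] for lane, a in enumerate(acc)]
        self.regs[insn.dst] = acc  # N results from the final iteration


vm = TinyVM(lanes=4)
vm.regs["v0"] = [1, 2, 3, 4, 5, 6]
vm.regs["v1"] = [2, 1]
vm.execute(RepeatingMac(dst="v2", src_inputs="v0", src_coeffs="v1", m=2))
```

A single `execute` call stands in for the whole loop, which is the point of the instruction: the iteration count travels with the instruction as an operand rather than being expressed as M separate multiply-accumulate instructions.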
30. A data processing apparatus comprising:
SIMD data processing means for performing data processing operations in parallel on multiple data elements in response to control signals; and
instruction decoder means coupled to said SIMD data processing means for generating said control signals in response to program instructions;
wherein said instruction decoder means, in response to a repeating multiply-accumulate (repeating MAC) instruction having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations M required, generates control signals to control said SIMD data processing means to produce multiply-accumulate results by the steps of:
performing said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements;
for each iteration, determining N input data elements from said first vector and a single coefficient data element from said second vector to be multiplied with each of the N input data elements during the N multiply-accumulate operations; and
outputting N multiply-accumulate results derived from the N multiply-accumulate data elements produced in a final iteration of the multiply-accumulate process.
Specification