Data processor with enhanced instruction execution and method
Abstract
An apparatus and method for performing enhanced algorithmic processing, including reduced cycle-count fast Fourier transform (FFT) calculations. In one aspect, the invention comprises a user-configurable processor having an extension instruction adapted for reduced cycle-count algorithmic operations. In one exemplary embodiment, the processor is an extensible core, and the extension instruction comprises a 32-bit instruction word linked with existing circuitry in the processor core used for multiply-accumulate (mac) instructions. 16-bit, 24-bit, and dual 16-bit multiply options are available for the multiply/accumulate unit of the processor. The extension instruction is pipelined to the same number of stages as the mac instructions, thereby avoiding unnecessary stalls and increasing performance. A modified accumulator data path used in support of the foregoing instruction is also described. A computer program and apparatus for synthesizing logic implementing the aforementioned functionality are also described.
25 Claims
1. An extensible pipelined processor adapted for performing iterative calculations on a plurality of data, said processor having an extension instruction set associated therewith, and comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith;
at least one register window;
at least one extension instruction provided within said extension instruction set, said at least one extension instruction being adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first one of said plurality of data; and
(ii) preload said at least one accumulator with a second one of said plurality of data; and
logic operatively connected to said at least one multiply-accumulate stage and adapted to write back the result of said subtraction to said register window.
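The behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `MacUnit`, the method name `ext_sub_preload`, the dictionary standing in for the register window, and the default multiple of 2 are all hypothetical choices made for illustration (the claim says only "a multiple").

```python
from dataclasses import dataclass, field


@dataclass
class MacUnit:
    """Toy model of a multiply-accumulate stage with one accumulator.

    All names here are illustrative assumptions, not taken from the patent.
    """
    acc: int = 0
    # A plain dict stands in for the register window.
    regs: dict = field(default_factory=dict)

    def mac(self, a: int, b: int) -> None:
        """Ordinary multiply-accumulate: acc += a * b."""
        self.acc += a * b

    def ext_sub_preload(self, dest: str, first: int, second: int,
                        multiple: int = 2) -> int:
        """Hypothetical extension instruction modeled on claim 1.

        (i)  subtract the accumulator from a multiple of `first`;
        (ii) preload the accumulator with `second`;
        then write the subtraction result back to the register window.
        The multiple of 2 is an assumed default.
        """
        result = multiple * first - self.acc  # step (i)
        self.acc = second                     # step (ii)
        self.regs[dest] = result              # write-back
        return result


unit = MacUnit(acc=10)
value = unit.ext_sub_preload("r0", first=7, second=3)
# value is 2 * 7 - 10 = 4; the accumulator now holds the preloaded 3.
```

In this model the subtraction, the preload, and the write-back all happen in one call, which mirrors the claim's point that a single instruction performs all three steps.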
11. A pipelined digital processor adapted for performing iterative calculations on a plurality of data, said processor having an instruction set with at least one extension instruction, said at least one extension instruction adapted for FFT butterfly calculation, the processor comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith;
at least one register window; and
write-back logic operatively connected to said at least one multiply-accumulate stage;
wherein said at least one extension instruction is adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first one of said plurality of data;
(ii) preload said at least one accumulator with a second one of said plurality of data; and
(iii) cooperate with said logic to write back the result of said subtraction to said window register.
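One way to see why such an instruction could suit a butterfly calculation is the identity a - w*b = 2*a - (a + w*b): once the sum output has been accumulated, the difference output follows from a single subtraction. The sketch below is a real-valued simplification under stated assumptions. The function name `butterfly_stream`, the use of 2 as the multiple, and the ordering of steps are hypothetical; an actual FFT butterfly operates on complex data, and the source does not spell out this mapping.

```python
def butterfly_stream(a_vals, b_vals, w):
    """Real-valued radix-2 butterflies using an accumulator-preload pattern.

    For each pair (a, b) this returns (a + w*b, a - w*b).
    Illustrative assumption only; not the patented implementation.
    """
    if not a_vals:
        return []
    outputs = []
    acc = a_vals[0]                      # initial accumulator preload
    for i, (a, b) in enumerate(zip(a_vals, b_vals)):
        acc += w * b                     # MAC step: acc = a + w*b
        total = acc                      # sum output of the butterfly
        # Value to preload for the next butterfly (0 when none remains).
        next_a = a_vals[i + 1] if i + 1 < len(a_vals) else 0
        diff = 2 * a - acc               # (i) multiple of a minus accumulator
        acc = next_a                     # (ii) preload the accumulator
        outputs.append((total, diff))    # (iii) write back the result
    return outputs


pairs = butterfly_stream([5, 1], [2, 4], w=3)
# pairs == [(11, -1), (13, -11)], i.e. (5 + 6, 5 - 6) and (1 + 12, 1 - 12)
```

Because the preload for the next butterfly happens in the same step as the subtraction, the loop needs no separate accumulator-clear or load step between iterations in this model.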
13. An accumulator used in a digital processor, comprising:
an adder having first and second inputs, said second input being operatively coupled to a first multiplier, said adder adapted to add said inputs to produce at least one output;
a first multiplexer adapted to multiplex a plurality of inputs onto at least one output;
at least one of said plurality of inputs comprising at least one of said at least one outputs of said adder, at least one of said plurality of inputs of said first multiplexer comprising a pre-load signal associated with a butterfly operation; and
at least one register operatively coupled to said at least one output of said first multiplexer and said first input of said adder.
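The data path recited in claim 13 (adder, multiplexer, register) can be modeled in a few lines. This is a behavioral sketch only, assuming a single two-way multiplexer and one clocked register; the names `AccumulatorPath`, `clock`, `preload_value`, and `select_preload` are hypothetical and do not come from the patent.

```python
class AccumulatorPath:
    """Behavioral model of an adder, a 2-way multiplexer, and a register.

    Illustrative assumption only; signal names are invented for this sketch.
    """

    def __init__(self) -> None:
        self.reg = 0  # accumulator register, feeds the adder's first input

    def clock(self, mul_out: int, preload_value: int = 0,
              select_preload: bool = False) -> int:
        """Advance one cycle and return the adder output for that cycle."""
        # Adder: first input is the register, second is the multiplier output.
        adder_out = self.reg + mul_out
        # Multiplexer: choose between the adder output and the pre-load signal.
        mux_out = preload_value if select_preload else adder_out
        # Register captures the multiplexer output.
        self.reg = mux_out
        return adder_out


path = AccumulatorPath()
path.clock(6)   # register becomes 6
path.clock(4)   # register becomes 10
path.clock(1, preload_value=7, select_preload=True)  # register becomes 7
```

Selecting the pre-load input replaces the running sum in the register while the adder output for that cycle is still available, which is the property a combined subtract-and-preload instruction would rely on.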
16. An extensible processor adapted for performing iterative calculations on a plurality of data, said processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith;
at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said plurality of data; and
at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply accumulate stage and accumulator;
wherein said at least one first extension instruction is pipelined to the same number of stages as said at least one second extension instruction, thereby avoiding pipeline stalling during processing of said at least one first instruction.
17. An extensible processor adapted for performing iterative calculations on a plurality of data, said processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith;
at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said plurality of data; and
at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply accumulate stage and accumulator;
wherein the propagation of said at least one first extension instruction within said pipeline is controlled at least in part through added pipeline depth, said added pipeline depth providing for reduced execution time of said at least one first extension instruction.
18. A method of performing an iterative calculation using a configurable, extensible processor having an instruction set, pipeline, and at least one multiply-accumulate stage with accumulator, the method comprising:
inserting a first extension instruction into said pipeline, said first instruction adapted to:
(i) subtract a value present in said at least one accumulator from a multiple of a first input value;
(ii) preload said at least one accumulator with a second input value; and
(iii) write back the result of the aforementioned subtraction operation to a designated register location;
providing a plurality of inputs to said at least one multiply-accumulate stage, said plurality of inputs comprising at least said first and second inputs; and
executing said first extension instruction to produce at least one output from said at least one multiply-accumulate stage.
21. A method of performing an iterative calculation using a configurable, extensible processor having an instruction set, pipeline, and at least one multiply-accumulate stage, said processor synthesized at least in part using the method comprising:
(i) providing a first extension instruction in a hardware description language (HDL), said first extension instruction being adapted to utilize existing logic within said processor related to other instructions within said instruction set;
(ii) adding said first extension instruction to the design of extended data processor; and
(iii) synthesizing the extended processor design including said first extension instruction, said first extension instruction being pipelined to the same number of stages as at least one of said other instructions;
the method of performing comprising:
inserting said first extension instruction into said pipeline;
providing a plurality of inputs to said at least one multiply-accumulate stage; and
executing said first extension instruction to produce at least one output from said at least one multiply-accumulate stage.
23. An integrated circuit device optimized for performing iterative calculations on input data, comprising:
at least one silicon die having a plurality of circuit features formed thereon; and
an extended processor having a multi-stage instruction pipeline and an extension instruction set, comprising:
at least one multiply-accumulate stage having at least one accumulator associated therewith;
at least one first extension instruction being adapted to perform at least a portion of an iterative calculation on said plurality of data; and
at least one second extension instruction adapted to perform a multiply or multiply-accumulate operation using said at least one multiply accumulate stage and accumulator;
wherein said at least one first extension instruction is pipelined to the same number of stages as said at least one second extension instruction, thereby avoiding pipeline stalling during processing of said at least one first instruction.
Specification