Cycle segmented prefix circuits
First Claim
1. A circuit comprising:
- a plurality of multiple input multiplexers and multiple input function generators connected in series, each multiplexer having one input that is an identity function and an output that is selected by a segment bit, and each function generator having one input that is output of the preceding multiplexer and an output that is an input to the next multiplexer, and the output of the last function generator is connected as an input to the first multiplexer.
0 Assignments
0 Petitions
Accused Products
Abstract
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU. Similarly, to read an argument register, an instruction must somehow communicate with the most recent preceding instruction that wrote that register. A cspp circuit can implement such functions by computing for each instruction within a wrap-around instruction sequence the accumulative result of applying some associative operator to all the preceding instructions. A cspp circuit has a critical path gate delay logarithmic in the length of the instruction sequence. Depending on its associative operation and its layout, a cspp circuit can have a critical path wire delay sublinear in the length of the instruction sequence.
-
Citations
18 Claims
-
1. A circuit comprising:
-
a plurality of multiple input multiplexers and multiple input function generators connected in series, each multiplexer having one input that is an identity function and an output that is selected by a segment bit, and each function generator having one input that is output of the preceding multiplexer and an output that is an input to the next multiplexer, and the output of the last function generator is connected as an input to the first multiplexer.
-
-
2. A circuit comprising:
-
a plurality of modules connected in a tree structure, each module comprising;
at least first and second function generators, first and second multiplexers and an OR gate, a first input and a first output connecting the module to an output and an input, respectively, of a module closer to the root of the tree structure, a second input and a second output connecting the module to an output and an input, respectively, of a module farther from the root of the tree structure and a third input and a third output connecting the module to an output and an input, respectively, of another module farther from the root of the tree structure, a fourth and a fifth input providing segment bits to the module from modules farther from the root and a fourth output providing a segment bit to a module closer to the root, said first and second inputs of each module being connected as inputs to the first function generator of that module and said second and third inputs being connected as inputs to the second function generator of that module, the first multiplexer having a first input that is an output of the first function generator and a second input that is the second input to that module and the second multiplexer having a first input that is an output of the second function generator and a second input that is the third input to that module, the first and second multiplexers having outputs that are selected by segment bits on the fourth and fifth inputs, the first output of each module being derived from the second multiplexer and one of the second and third outputs being derived from the first multiplexer, and the OR gate generating a logical OR of the fourth and fifth inputs and providing it to the fourth output, and the first output of at least one module closest to the root of the tree structure being connected to the first input of a module closest to the root of the tree structure.
-
-
3. A prefix computation circuit comprising:
-
a plurality of input terminals to receive input data, a circuit configuration to perform prefix computation based on said input data, and a plurality of output terminals to provide output data based on said prefix computation, wherein the prefix computation is cyclic and segmented. - View Dependent Claims (4, 5, 6)
-
-
7. A prefix computation circuit comprising:
-
a plurality of N input terminals to receive input data, a circuit configuration to perform prefix computation based on said input data, and a plurality of output terminals to provide output data based on said prefix computation, wherein the prefix computation is cyclic and segmented, and wherein the longest path through the circuit configuration is O (log N) circuit elements.
-
-
8. A prefix computation circuit comprising:
-
a plurality of input terminals to receive input data, a circuit configuration to perform prefix computation based on said input data, and a plurality of output terminals to provide output data based on said prefix computation, wherein the prefix computation is cyclic and segmented, and wherein the circuit configuration computes a cyclic segmented prefix using a plurality of modules in which at least one module is connected to each of a plurality of other modules via a child connection comprising an output connected to an input of a child module, a first input connected to a first output of the same child module, and a second input connected to a second output of the same child module, and the child connections are ordered so that each child connection has a predecessor child connection whereby if the second input from a child connection'"'"'s predecessor is true, then the output of the child connection is an identity value, and if the second input from a child connection'"'"'s predecessor is false, then the output of the child connection is the same value as a combination, by an associative function, of the first input of the predecessor child connection and the output of the predecessor child connection.
-
-
9. A circuit comprising:
-
a plurality of child modules and at least one cyclic prefix module, having a plurality of child connections, each child connection comprising;
an output connected to an input of a child module, a first input connected to a first output of the same child module, and a second input connected to a second output of the same child module, the child connections being ordered so that each child connection has a predecessor child connection;
the cyclic prefix module further comprising a circuit which computes a cyclic segmented prefix, whereby if the second input from a child connection'"'"'s predecessor is true, then the output of the child connection is an identity value, and if the second input from a child connection'"'"'s predecessor is false, then the output of the child connection is the same value as a combination, by an associative function, of the first input of the predecessor child connection and the output of the predecessor child connection. - View Dependent Claims (10, 11, 12, 13, 14)
-
-
15. A circuit comprising:
-
a plurality of leaf modules, each leaf module comprising;
an input which encodes an input value, a first output which encodes an output value, and a second output which encodes a boolean value;
a cyclic prefix module, having a plurality of leaf connections, each leaf connection comprising;
an output connected to the input of a leaf module, a first input connected to the first output of the same leaf module, and a second input connected to the second output of the same leaf module, there being a total of N leaf connections, each leaf connection being identified by a different number from 1 to N and for each leaf connection numbered I, if I is less than N the leaf connection being called the predecessor of leaf connection number I+1, and if I equals N the leaf connection being called the predecessor of leaf connection number 1;
the cyclic prefix module further comprising a circuit which computes a cyclic segmented prefix, whereby if the second input from a leaf connection'"'"'s predecessor is true, then the output of the leaf connection is an identity value, and if the second input from a leaf connection'"'"'s predecessor is false, then the output of the leaf connection is the same value as a combination, by an associative function, of the first input of the leaf connection'"'"'s predecessor and the output of the leaf connection'"'"'s predecessor. - View Dependent Claims (16, 17, 18)
-
Specification