Macroscalar Processor Architecture
Abstract
A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
293 Citations
54 Claims
1. A method for aggregating a program loop, the method comprising:
- receiving by the processor instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, wherein the processor includes a plurality of slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel; and
- for each iteration of the program loop, executing an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel.
Dependent claims: 2, 3, 4
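As an illustration only, the overlap described in claim 1 can be modeled in software. The sketch below assumes a loop that computes `acc += a[i] * b[i]`, where the multiply is treated as the vector block and the accumulate as the sequence block. The function name, the step-based timing model, and the choice of loop are assumptions made for this example; they are not taken from the patent.

```python
def run_macroscalar_sketch(a, b, num_slices=4):
    """Toy timing model: one slice runs the sequence block while the
    remaining slices run the vector block for later iterations."""
    if num_slices < 2:
        raise ValueError("need one sequence slice and at least one vector slice")
    n = len(a)
    products = [None] * n   # vector-block results, one per iteration
    acc = 0                 # loop-carried value updated by the sequence block
    next_vec = 0            # next iteration whose vector block has not run
    next_seq = 0            # next iteration whose sequence block has not run
    trace = []              # (sequence iteration, vector iterations) per step
    while next_seq < n:
        # Remaining slices: vector block for the next few iterations.
        vec_iters = list(range(next_vec, min(n, next_vec + num_slices - 1)))
        # One slice: sequence block for the oldest iteration whose
        # vector result was produced in an earlier step.
        seq_iter = next_seq if products[next_seq] is not None else None
        for i in vec_iters:
            products[i] = a[i] * b[i]
        if seq_iter is not None:
            acc += products[seq_iter]
            next_seq += 1
        next_vec += len(vec_iters)
        trace.append((seq_iter, vec_iters))
    return acc, trace
```

Each entry of `trace` records which iteration's sequence block ran in a step and which iterations' vector blocks ran alongside it on the other slices.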
5. A method for aggregating a program loop, the method comprising:
- determining whether a number of dynamic registers of a processor to be used in a program loop exceeds a predetermined threshold, the processor including a plurality of static registers, dynamic registers, and extended registers; and
- allocating one or more of the extended registers to be used in the program loop in place of at least a portion of the determined number of dynamic registers intended to be used in the program loop, such that a number of dynamic registers used in aggregating the program loop does not exceed the predetermined threshold.
Dependent claims: 6, 7, 8, 9
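The threshold check in claim 5 can be sketched as a small allocation routine. This is a minimal illustration under assumed names (`allocate_registers`, `extended_free`); the patent text here does not specify how the extended registers are counted or what happens when too few are available, so the error case below is an assumption.

```python
def allocate_registers(dynamic_needed, threshold, extended_free):
    """Return (dynamic_used, extended_used) so that dynamic_used <= threshold."""
    if dynamic_needed <= threshold:
        # The loop fits in the dynamic registers; no extended registers needed.
        return dynamic_needed, 0
    overflow = dynamic_needed - threshold
    if overflow > extended_free:
        # Assumed behavior: the sketch simply refuses when it cannot fit.
        raise ValueError("not enough extended registers for this loop")
    # Move the overflow into extended registers.
    return threshold, overflow
```

For example, a loop needing 11 dynamic registers with a threshold of 8 would use 8 dynamic and 3 extended registers in this model.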
10. A method for aggregating a program loop, the method comprising:
- identifying a static register of a processor to be accessed by an instruction of an iteration of a program loop, wherein a value of the static register is used in a subsequent iteration; and
- in response to the instruction executed by the processor, copying the value of the static register to a dynamic register of the processor, the dynamic register being used in the subsequent iteration of the program loop, wherein copying the value of the static register is performed without additional instructions from source code corresponding to the instruction of the iteration received by the processor.
Dependent claims: 11, 12, 13, 14
15. A method for aggregating a program loop, the method comprising:
- identifying instructions of a sequence block of a program loop, the sequence block including a first conditional branch code and a second conditional branch code dependent upon a predicate condition, wherein the first conditional branch code is longer than the second conditional branch code; and
- modifying the instructions such that when instructions of the second branch code that satisfy the predicate condition have been executed during a current iteration, execution of a next iteration starts without examining a remainder of the first branch code that has not been executed in the current iteration.
Dependent claims: 16, 17
18. A method for aggregating a program loop, the method comprising:
- receiving a first instruction to determine a first predicate based on a conditional test and a second instruction to conditionally set a second predicate based on a condition of the first predicate; and
- converting the first and second instructions into a single instruction, which when executed by a processor, sets the second predicate based on the conditional test only if a result of the conditional test satisfies a predetermined criteria, wherein the single instruction is executed by the processor without using a register to store the condition of the first predicate.
Dependent claims: 19, 20, 21
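One way to read the fusion in claim 18 is as the removal of an intermediate predicate. The sketch below is an assumption-laden illustration: the conditional test is taken to be `a < b`, the "predetermined criteria" is taken to be "the test is true", and both function names are hypothetical.

```python
def two_step(a, b, p2):
    """Unfused form: the test result is first stored in predicate p1."""
    p1 = a < b      # first instruction: determine the first predicate
    if p1:          # second instruction: conditionally set the second predicate
        p2 = True
    return p2


def fused(a, b, p2):
    """Fused form: p2 is set directly from the test, with no stored p1."""
    return True if a < b else p2
```

Both forms leave `p2` unchanged when the test fails, which is the property the fused form has to preserve.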
22. A method for aggregating a program loop, the method comprising:
- identifying instructions of a program loop including a first set of one or more instructions to be executed when a predicate condition is satisfied and a second set of one or more instructions to be executed when the predicate condition is not satisfied;
- executing the first set of the instructions as a first vector block for one or more consecutive iterations during which the predicate condition is satisfied; and
- executing the second set of the instructions as a second vector block for one or more consecutive iterations during which the predicate condition is not satisfied.
Dependent claims: 23, 24
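The grouping of consecutive iterations in claim 22 resembles splitting the iteration space into runs that share the same predicate value. The following sketch uses `itertools.groupby` to form those runs and applies one block to each run as a whole. The function name and the representation of a "vector block" as a function over a list are assumptions for illustration.

```python
from itertools import groupby


def run_by_predicate_runs(values, predicate, then_block, else_block):
    """Apply then_block or else_block to each maximal run of consecutive
    iterations that share the same predicate outcome."""
    out = []
    runs = []  # (predicate value, run length), kept for inspection
    for flag, run in groupby(values, key=lambda v: bool(predicate(v))):
        run = list(run)
        block = then_block if flag else else_block
        out.extend(block(run))      # one "vector block" call per run
        runs.append((flag, len(run)))
    return out, runs
```

With `values = [1, 2, -3, -4, 5]` and the predicate `x > 0`, the loop is executed as three blocks of lengths 2, 2, and 1 rather than five separately branching iterations.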
25. A method for aggregating a program loop, the method comprising:
- identifying one or more instructions of a program loop dependent upon a predicate condition based on a first variable of a previous iteration and a second variable of a current iteration;
- for all iterations of the program loop, recursively shifting a value of a static register associated with the first variable to an extended register associated with the static register of the first variable for a current iteration and substantially simultaneously shifting a value of a dynamic register associated with the second variable for the current iteration to an extended register associated with the dynamic register of the second variable for a next iteration; and
- executing the one or more instructions of the program loop as a vector block using the shifted extended registers for the first and second variables.
Dependent claims: 26, 27
28. A method for aggregating a program loop, the method comprising:
- identifying one or more instructions of a program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation; and
- modifying at least one of the one or more instructions to cause a processor for executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.
Dependent claims: 29, 30
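A familiar loop with the shape described in claim 28 is a copy that writes each element and then exits once a terminating value has been written. The sketch below, under assumed names, performs the writes as a single slice assignment covering every iteration up to and including the one where the predicate holds. It illustrates the idea and does not model the processor mechanism.

```python
def copy_through_first_match(src, dst, stop):
    """Vector-style equivalent of:
        for i in range(len(src)):
            dst[i] = src[i]
            if stop(src[i]):
                break
    Returns the number of elements written."""
    if len(dst) < len(src):
        raise ValueError("dst must be at least as long as src")
    # Index of the first iteration whose predicate is satisfied,
    # or the last iteration if it never is.
    last = next((i for i, x in enumerate(src) if stop(x)), len(src) - 1)
    count = last + 1
    dst[:count] = src[:count]   # all writes issued as one block
    return count
```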
31. A method for aggregating a program loop, the method comprising:
- identifying one or more instructions of a program loop having a first loop having a first set of instructions including a second loop, the second loop having a second set of instructions to be executed as a sequence block; and
- modifying at least one of the instructions such that for each iteration of executing the second set of instructions of the second loop, executing the first set of instructions of the first loop as a vector block for all iterations of the program loop.
Dependent claims: 32, 33, 34, 35
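The nested-loop arrangement in claim 31 can be illustrated with polynomial evaluation by Horner's rule at many points. The inner loop over coefficients carries a dependency and runs step by step, while each step is applied across all outer-loop iterations at once. Choosing Horner's rule and the name `horner_all` are assumptions made for this example.

```python
def horner_all(coeffs, xs):
    """Evaluate the polynomial with coefficients `coeffs` (highest degree
    first) at every point in `xs`."""
    acc = [0] * len(xs)
    for c in coeffs:
        # Inner (second) loop: one sequential step per coefficient.
        # Outer (first) loop: the step is applied to every x as one block.
        acc = [a * x + c for a, x in zip(acc, xs)]
    return acc
```

The scalar form would run the full inner loop once per point; here the roles are swapped so that the sequential part is executed once and the per-point work is vectorized.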
36. A method for aggregating a program loop, comprising:
- identifying one or more instructions of a program loop, wherein during each iteration of the program loop, a member of a first array referenced by a first index is assigned to a member of a second array referenced by a second index dependent upon a predicate condition;
- for each iteration of the program loop, at runtime by a processor, incrementally assigning a value to a dynamic register associated with the respective iteration only if the predicate condition is satisfied during the respective iteration; and
- executing the one or more instructions as a vector block that assign a member of the first array to a member of the second array referenced by the dynamic register associated with each iteration.
Dependent claims: 37, 38, 39
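Claim 36 describes a pattern that looks like conditional compaction: `if p(a[i]): b[j] = a[i]; j += 1`. A minimal sketch, assuming the per-iteration dynamic register holds the destination index, first assigns indices incrementally for the iterations that pass the predicate and then performs all assignments as one block. The names `compact` and `dest_index` are hypothetical.

```python
def compact(src, predicate):
    """Return (dst, dest_index), where dest_index[i] is the destination
    slot for iteration i, or None if the predicate failed there."""
    dest_index = []
    j = 0
    for x in src:
        if predicate(x):
            dest_index.append(j)    # index assigned only when predicate holds
            j += 1
        else:
            dest_index.append(None)
    dst = [None] * j
    # Conceptually a single vector block: every assignment is independent
    # once each iteration knows its own destination index.
    for i, idx in enumerate(dest_index):
        if idx is not None:
            dst[idx] = src[i]
    return dst, dest_index
```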
40. A method for aggregating a program loop, the method comprising:
- identifying at runtime one or more instructions of a program loop, the one or more instructions being executed depending on one or more predicate conditions; and
- dispatching one of the instructions to one or more functional units of a processor if the one or more predicate conditions are satisfied while a remainder of the instructions is not dispatched.
Dependent claims: 41, 42
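The predicate-gated dispatch in claim 40 can be sketched as a filter in front of the functional units. In the example below, an instruction is modeled as a name plus the list of predicate names it depends on; that representation and the function name `dispatch` are assumptions, not details from the patent.

```python
def dispatch(instructions, predicates):
    """Return the names of instructions whose predicates are all satisfied.

    instructions: list of (name, [predicate names]) pairs.
    predicates:   dict mapping predicate name to its current boolean value.
    """
    dispatched = []
    for name, needed in instructions:
        if all(predicates[p] for p in needed):
            dispatched.append(name)   # sent on to a functional unit
        # Otherwise the instruction is dropped before dispatch.
    return dispatched
```

The point of the sketch is that a suppressed instruction never occupies a functional unit, as opposed to being executed and having its result discarded.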
43. A machine implemented method, comprising:
- receiving a data stream of a program executed within a processor, the data stream including a plurality of virtual memory pages each having virtual memory addresses and identified by a virtual memory page number; and
- for each virtual memory page, transmitting physical addresses corresponding to the virtual memory addresses and the virtual memory page number associated with respective virtual memory page to a controller to enable the controller to prefetch at least a portion of a remainder of the corresponding physical memory page based on a physical address and the virtual memory page number of the corresponding physical memory page.
Dependent claims: 44, 45, 46
47. A machine implemented method, comprising:
- in response to a data stream of a program executed within a processor, the data stream including a plurality of physical memory pages each having physical memory addresses and each of the physical memory pages corresponding to a virtual memory page identified by a virtual memory page identifier, for each physical page, determining whether a predetermined physical address of a respective physical page is associated with a predetermined virtual memory page identifier; and
- prefetching at least a portion of a remainder of the respective physical page if the predetermined physical address of the respective physical page is associated with the predetermined virtual memory page identifier.
Dependent claims: 48, 49, 50, 51
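The prefetch condition in claims 43 and 47 can be illustrated with simple address arithmetic. The sketch assumes 4 KiB pages and 64-byte cache lines, and it models the association between a physical page and a virtual page identifier as a plain dictionary; none of these specifics come from the claims, and the function names are hypothetical.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # assumed cache-line size in bytes


def remainder_of_page(phys_addr):
    """Addresses of the cache lines after phys_addr within the same page."""
    page_end = (phys_addr // PAGE_SIZE + 1) * PAGE_SIZE
    next_line = (phys_addr // LINE_SIZE + 1) * LINE_SIZE
    return list(range(next_line, page_end, LINE_SIZE))


def maybe_prefetch(phys_addr, page_map, stream_vpn):
    """Prefetch the rest of the page only if the physical page is
    associated with the virtual page identifier of the stream.

    page_map: dict from physical page number to virtual page identifier.
    """
    if page_map.get(phys_addr // PAGE_SIZE) != stream_vpn:
        return []
    return remainder_of_page(phys_addr)
```

Stopping at the page boundary matters because contiguous virtual pages need not be contiguous in physical memory.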
52. An apparatus for aggregating a program loop, the apparatus comprising:
- means for receiving by the processor instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, wherein the processor includes a plurality of slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel; and
- for each iteration of the program loop, means for executing an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel.
53. An apparatus for aggregating a program loop, the apparatus comprising:
- means for identifying a static register of a processor to be accessed by an instruction of an iteration of a program loop, wherein a value of the static register is used in a subsequent iteration; and
- in response to the instruction executed by the processor, means for copying the value of the static register to a dynamic register of the processor, the dynamic register being used in the subsequent iteration of the program loop, wherein copying the value of the static register is performed without additional instructions from source code corresponding to the instruction of the iteration received by the processor.
54. An apparatus for aggregating a program loop, the apparatus comprising:
- means for identifying instructions of a program loop including a first set of one or more instructions to be executed when a predicate condition is satisfied and a second set of one or more instructions to be executed when the predicate condition is not satisfied;
- means for executing the first set of the instructions as a first vector block for one or more consecutive iterations during which the predicate condition is satisfied; and
- means for executing the second set of the instructions as a second vector block for one or more consecutive iterations during which the predicate condition is not satisfied.
Specification