Multiple processor, distributed memory computer with out-of-order processing
First Claim
1. A method of executing a program made up of a set of instructions and data, the instructions having a program order, on a computer system having a plurality of processor/memory units communicating on a common interconnect, the method comprising the steps of:
(a) profiling the program to determine first portions of the program statistically more likely to be executed in a predetermined period of operation of the program than second portions of the program; and
(b) dividing the portions of the program among the processor/memory units so that the first portions are loaded into more than one processor/memory unit; and
(c) executing all the instructions on each of the processor/memory units by causing a first processor/memory unit having a portion of the set not loaded into other processor/memory units to communicate that portion over the common interconnect to the other processor/memory units without a request by the other processor/memory units for the portions.
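Steps (a) through (c) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the trace-based profile, the fixed `hot_threshold`, the choice to replicate hot portions to every unit (the claim only requires more than one), and the round-robin placement of the remaining portions are all assumptions made for illustration.

```python
from collections import Counter


def profile(trace):
    """Step (a), sketched: count how often each portion appears in a sample trace."""
    return Counter(trace)


def divide(portions, counts, num_units, hot_threshold):
    """Step (b), sketched: replicate frequently executed portions, spread the rest.

    Assumption: a portion is "first" (hot) when its count reaches hot_threshold.
    """
    placement = {unit: set() for unit in range(num_units)}
    cold = []
    for portion in portions:
        if counts[portion] >= hot_threshold:
            for unit in placement:          # hot: load into every unit
                placement[unit].add(portion)
        else:
            cold.append(portion)
    for i, portion in enumerate(cold):      # cold: exactly one holder each
        placement[i % num_units].add(portion)
    return placement


def unrequested_senders(placement):
    """Step (c), sketched: map each singly held portion to the unit that must
    send it to the others without waiting for a request."""
    holders = {}
    for unit, held in placement.items():
        for portion in held:
            holders.setdefault(portion, []).append(unit)
    return {p: units[0] for p, units in holders.items() if len(units) == 1}


portions = ["A", "B", "C", "D"]
trace = ["A", "A", "A", "B", "A", "C", "A", "D"]   # hypothetical profiling run
placement = divide(portions, profile(trace), num_units=2, hot_threshold=3)
senders = unrequested_senders(placement)
```

In this example portion A is executed often enough to be loaded into both units, while B, C, and D each live in a single unit that is responsible for sending them to the other.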
Abstract
A distributed memory computer architecture associates separate memory blocks with their own processors, each of which executes the same program. A processor fetching data or instructions from its local memory also broadcasts the fetched data or instructions to the other processors, saving the time they would otherwise spend requesting them. Runs of instructions and data local to one processor provide improved performance, which the system as a whole captures because the other processors, which are not executing local data or instructions, can execute instructions out of order and return to find the data ready in a buffer for rapid use.
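The broadcast-on-fetch behavior described in the abstract might be modeled roughly as follows. This is a simplified sketch under stated assumptions: the `Unit` class, its fields, and the `run` helper are hypothetical names, the units step through addresses in lockstep with the owning unit going first, and the out-of-order execution mentioned in the abstract is not modeled.

```python
class Unit:
    """One processor/memory unit (hypothetical model)."""

    def __init__(self, uid, local):
        self.uid = uid
        self.local = dict(local)   # address -> value held in this unit's own memory
        self.buffer = {}           # values pushed here by other units
        self.requests_sent = 0     # explicit requests this unit had to issue

    def read(self, addr, units):
        if addr in self.local:
            value = self.local[addr]
            # Local fetch: also broadcast the value, unrequested, to every other unit.
            for other in units:
                if other is not self:
                    other.buffer[addr] = value
            return value
        if addr in self.buffer:
            # The value already arrived by broadcast, so no request is needed.
            return self.buffer[addr]
        # Fallback: ask the owning unit explicitly (the cost the broadcast avoids).
        self.requests_sent += 1
        for other in units:
            if addr in other.local:
                return other.local[addr]
        raise KeyError(addr)


def run(units, addrs):
    """Every unit executes the same address sequence; owners read first (assumption)."""
    results = {u.uid: [] for u in units}
    for addr in addrs:
        for u in sorted(units, key=lambda u: addr not in u.local):
            results[u.uid].append(u.read(addr, units))
    return results


units = [Unit(0, {0: "a", 1: "b"}), Unit(1, {2: "c", 3: "d"})]
results = run(units, [0, 1, 2, 3])
```

Both units end up with the same sequence of values, and neither has to issue an explicit request, because each value reaches the non-owning unit's buffer before that unit needs it.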
91 Citations
Specification