Multithreading in vector processors
Abstract
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
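The round-robin activation described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the names `VectorPCRegister` and `round_robin_schedule` are hypothetical, the instruction buffer is modeled as a plain list, and halting or branching is not modeled at all. Each lane of the vectorized register holds the program counter of one thread, and exactly one thread issues per cycle.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VectorPCRegister:
    """Hypothetical model: one register whose lanes each hold one thread's PC."""
    lanes: List[int]


def round_robin_schedule(pc_register: VectorPCRegister,
                         instruction_buffer: List[str],
                         cycles: int) -> List[Tuple[int, str]]:
    """Activate the program counters in round-robin order, one thread per cycle.

    Returns the (thread index, instruction) pair issued in each cycle.
    """
    trace = []
    thread_count = len(pc_register.lanes)  # thread count is bounded by the PC count
    for cycle in range(cycles):
        thread = cycle % thread_count      # round-robin selection
        pc = pc_register.lanes[thread]
        if pc < len(instruction_buffer):   # assumption: stop issuing past the buffer end
            trace.append((thread, instruction_buffer[pc]))
            pc_register.lanes[thread] = pc + 1
    return trace


# Two threads sharing one instruction buffer, starting at different instructions.
buffer = ["a0", "a1", "b0", "b1"]
pcs = VectorPCRegister(lanes=[0, 2])
trace = round_robin_schedule(pcs, buffer, cycles=4)
```

In this sketch the number of threads equals the number of lanes in the register, which mirrors the statement that each program counter represents a distinct thread.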
15 Claims
1. A system comprising:
- a processor having a vector processing mode and a multithreading mode, the processor being a vector processor and being configured to operate on one thread per cycle in the multithreading mode, and the processor comprising;
- one or more program counter registers together comprising a plurality of program counters, each program counter register of the one or more program counter registers being vectorized into a corresponding subset of the plurality of program counters, and each program counter in the plurality of program counters of one or more program counter registers representing a distinct corresponding thread of a plurality of threads;
- wherein the number of threads in the plurality of threads is limited by the number of program counters in the plurality of program counters of the one or more program counter registers;
- the processor configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle;
- an instruction buffer comprising a plurality of instructions;
- wherein a first program counter in the plurality of program counters references a first instruction in the instruction buffer for execution by the processor in a first thread of the plurality of threads;
- wherein a second program counter in the plurality of program counters references a second instruction in the instruction buffer for execution by the processor in a second thread of the plurality of threads, the first instruction being different from the second instruction;
- wherein the processor is configured to skip, in the round robin cycle, a program counter lacking readiness based on a ready bit setting and determine a validity of each data element to be used in an operation for an active thread of the plurality of threads, said validity indicative of data being fetched from memory to be loaded into a data element, wherein responsive to a determination of an invalid operation, proceed to a next operation without incrementing the second program counter, and responsive to no invalid operation, process one or more output vectors and output a result.
Dependent claims: 2, 3, 4, 5.
6. A computer-implemented method, comprising:
- storing, via a vector processor having a vector processing mode and a multithreading mode and configured to operate on one thread per cycle in the multithreading mode, in one or more program counter registers together comprising a plurality of program counters;
- vectorizing one or more program counter registers that together comprise a plurality of program counters, into a corresponding subset of the plurality of program counters, wherein each program counter in the plurality of program counter registers represents a distinct corresponding thread of a plurality of threads;
- wherein the number of threads in the plurality of threads is limited by the number of program counters in the plurality of program counters of the one or more program counter registers;
- executing, by a processor, the plurality of threads by activating the plurality of program counters in a round robin cycle;
- referencing, via a first program counter in the plurality of program counters, a first instruction in an instruction buffer for execution by the processor in a first thread of the plurality of threads, wherein the instruction buffer comprises a plurality of instructions;
- referencing, via a second program counter in the plurality of program counters, a second instruction in an instruction buffer for execution by the processor in a second thread of the plurality of threads, the first instruction being different from the second instruction; and
- skipping, via the processor, in the round robin cycle, a program counter lacking readiness based on a ready bit setting and determine a validity of each data element to be used in an operation for an active thread of the plurality of threads, said validity indicative of data being fetched from memory to be loaded into a data element, the skipping comprising;
- responsive to a determination of an invalid operation, proceeding to a next operation without incrementing the second program counter, and
- responsive to no invalid operation, processing one or more output vectors and output a result.
Dependent claims: 7, 8, 9, 10.
11. A computer program product for multithreading in a vector processor, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a vector processor having a vector processing mode and a multithreading mode and configured to operate on one thread per cycle in the multithreading mode, to cause the vector processor to perform a method comprising:
- storing, in one or more program counter registers together comprising a plurality of program counters;
- vectorizing one or more program counter registers that together comprise a plurality of program counters, into a corresponding subset of the plurality of program counters, wherein each program counter in the plurality of program counter registers represents a distinct corresponding thread of a plurality of threads;
- wherein the number of threads in the plurality of threads is limited by the number of program counters in the plurality of program counters of the one or more program counter registers;
- executing the plurality of threads by activating the plurality of program counters in a round robin cycle;
- referencing, via a first program counter in the plurality of program counters, a first instruction in an instruction buffer for execution by the processor in a first thread of the plurality of threads, wherein the instruction buffer comprises a plurality of instructions;
- referencing, via a second program counter in the plurality of program counters, a second instruction in an instruction buffer for execution by the processor in a second thread of the plurality of threads, the first instruction being different from the second instruction; and
- skipping, in the round robin cycle, a program counter lacking readiness based on a ready bit setting and determine a validity of each data element to be used in an operation for an active thread of the plurality of threads, said validity indicative of data being fetched from memory to be loaded into a data element, the skipping comprising;
- responsive to a determination of an invalid operation, proceeding to a next operation without incrementing the second program counter, and
- responsive to no invalid operation, processing one or more output vectors and output a result.
Dependent claims: 12, 13, 14, 15.
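The skipping and validity steps recited in the independent claims can be sketched as follows. This is one possible reading, not the patented implementation. The names `Element`, `Thread`, and `step` are hypothetical. An instruction is modeled as a pair of source vector register names, and the operation is assumed to be an element-wise add. An element's `valid` flag is assumed to be false while its data is still being fetched from memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Element:
    value: int
    valid: bool  # assumption: False while the data is still being fetched from memory


@dataclass
class Thread:
    pc: int      # this thread's lane of the vectorized program counter register
    ready: bool  # ready bit consulted by the round-robin scheduler


def step(threads: List[Thread],
         start: int,
         instruction_buffer: List[Tuple[str, str]],
         vector_registers: Dict[str, List[Element]]
         ) -> Tuple[Optional[int], Optional[List[int]]]:
    """Run one round-robin step starting at thread index `start`.

    Returns (index of the activated thread, output vector), or (index, None)
    when the operation was invalid, or (None, None) when no thread was ready.
    """
    count = len(threads)
    for offset in range(count):
        index = (start + offset) % count
        thread = threads[index]
        if not thread.ready:
            continue  # skip a program counter lacking readiness
        src_a, src_b = instruction_buffer[thread.pc]
        a, b = vector_registers[src_a], vector_registers[src_b]
        # Determine the validity of each data element used by the operation.
        if not all(e.valid for e in a + b):
            return index, None  # invalid operation: the PC is not incremented
        result = [x.value + y.value for x, y in zip(a, b)]  # assumed element-wise add
        thread.pc += 1
        return index, result
    return None, None


threads = [Thread(pc=0, ready=False), Thread(pc=0, ready=True)]
program = [("v0", "v1")]
registers = {
    "v0": [Element(1, True), Element(2, True)],
    "v1": [Element(10, True), Element(20, False)],  # one element still in flight
}
first = step(threads, 0, program, registers)   # thread 0 skipped, thread 1 stalls
pc_after_stall = threads[1].pc
registers["v1"][1].valid = True                # the fetch completes
second = step(threads, 0, program, registers)  # thread 1 now produces a result
```

In this example thread 0 is skipped because its ready bit is clear, and thread 1 keeps its program counter unchanged until every operand element is valid, so the same instruction is retried on a later activation.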
Specification