Vertically and horizontally threaded processor with multidimensional storage for storing thread data
Abstract
A processor includes a “four-dimensional” register structure in which register file structures are replicated by N for vertical threading in combination with a three-dimensional storage circuit. The multi-dimensional storage is formed by constructing a storage, such as a register file or memory, as a plurality of two-dimensional storage planes.
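One way to picture the storage described in the abstract is as an array indexed by replicated structure, storage plane, and register. The following Python sketch is illustrative only. The class name `MultiDimensionalStorage`, its methods, the sizes used, and the mapping of one replica to one processing unit (following the wording of claim 1) are assumptions made for the example and are not taken from the patent.

```python
class MultiDimensionalStorage:
    """Toy model of N replicated three-dimensional register stores."""

    def __init__(self, num_replicas, num_planes, regs_per_plane):
        # Outer list: one replicated storage structure per processing unit
        # (assumed mapping, following claim 1: horizontal thread data).
        # Middle list: the two-dimensional planes stacked into a
        # three-dimensional storage (assumed: one plane per vertical thread).
        # Inner list: the registers of one plane; each register is an int
        # whose bits stand in for the plane's second dimension.
        self._cells = [
            [[0] * regs_per_plane for _ in range(num_planes)]
            for _ in range(num_replicas)
        ]

    def write(self, replica, plane, reg, value):
        self._cells[replica][plane][reg] = value

    def read(self, replica, plane, reg):
        return self._cells[replica][plane][reg]


# Hypothetical sizes chosen only for the demonstration.
storage = MultiDimensionalStorage(num_replicas=2, num_planes=4, regs_per_plane=8)
storage.write(replica=0, plane=1, reg=3, value=0xAB)
print(storage.read(0, 1, 3))  # the value written above
print(storage.read(1, 1, 3))  # same plane and register in the other replica
```

In this model a write to one replica and plane leaves every other replica and plane untouched, which is the isolation between threads that the replicated, plane-based layout is meant to provide.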
25 Claims
-
1. A processor comprising:
-
a plurality of processing units, a processing unit respectively allocated to an execution thread of a plurality of execution threads extending across the plurality of processing units in a horizontal multithreading arrangement, the individual processing units having an associated pipeline that is shared among a plurality of threads in a vertical multithreading arrangement; and
a multi-dimensional storage coupled to the plurality of processing units and including a plurality of storage structures that are replicated for the respective plurality of processing units for storing horizontal thread data, the storage structures being a three-dimensional storage arranged to store vertical thread data, the three-dimensional storage being formed as a plurality of two-dimensional storage planes.
2. A processor according to claim 1 wherein:
one or more of the plurality of processing units includes a multiple-vertical-thread execution pipeline including a plurality of pipeline states by usage of multiple-bit flip-flops, the multiple thread execution pipeline being coupled to the multi-dimensional storage that is shared among the plurality of execution threads of the individual processing unit.
-
-
3. A processor according to claim 1 wherein:
the multi-dimensional storage includes a plurality of non-overlapping two-dimensional planar windows containing storage cells that are connected to address lines addressing cells in a layer in two dimensions, an individual plane representing a window of the plurality of windows, the windows being non-overlapping.
-
4. A processor according to claim 3 further comprising:
a window pointer, the multi-dimensional storage including the plurality of non-overlapping windows, a window representing a context, context switching being performed by changing the window pointer representing a context number.
-
5. A processor according to claim 1 further comprising:
-
a plurality of address lines for addressing the multi-dimensional storage, a first and second set of address lines addressing the two-dimensional storage planes and shared among the plurality of two-dimensional storage planes in the three-dimensional storage; and
a pointer selecting a two-dimensional storage plane from among the plurality of planes in the three-dimensional storage.
-
-
6. A processor according to claim 1 further comprising:
-
a plurality of bit cells forming two-dimensional register windows of the multi-dimensional storage distributed in a planar surface of an integrated circuit;
a plurality of the two-dimensional register windows at a plurality of depths in the integrated circuit; and
a plurality of address lines including lines i for selecting bits of a register j, and lines j+k for selecting registers j of a window k, the number of address lines being i times (j+k).
-
-
7. A processor according to claim 6 wherein:
the address lines are single-ended address lines.
-
8. A processor according to claim 6 wherein:
the address lines are double-ended address lines.
-
9. A processor according to claim 1 wherein:
the multi-dimensional storage stores data for a plurality of threads, the threads corresponding to respective ones of the processing units.
-
10. A processor according to claim 1 further comprising:
a plurality of load/store units coupled to the plurality of processing units and respectively allocated for loading and storing data for the plurality of execution threads.
-
11. A processor according to claim 1 further comprising:
an external cache control unit coupled to the plurality of load/store units and shared among the plurality of execution threads.
-
12. A processor comprising:
-
a plurality of processing units in a single integrated circuit, an individual processing unit of the plurality of processing units including:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads in a vertical multithreading arrangement; and
a multi-dimensional storage coupled to the multiple-thread execution pipeline and including a plurality of storage structures that are replicated for the respective plurality of pipelines for storing horizontal thread data, the storage structures being a three-dimensional storage arranged to store vertical thread data, the three-dimensional storage being formed as a plurality of two-dimensional storage planes.
13. A processor according to claim 12 further comprising:
a multiple-vertical-thread execution pipeline including a plurality of pipeline states by usage of multiple-bit flip-flops, the multiple thread execution pipeline being coupled to the multi-dimensional storage that is shared among the plurality of execution threads of the individual processing unit.
-
-
14. A processor according to claim 12 further comprising:
a multiple-vertical-thread execution pipeline including a plurality of pipeline states by usage of multiple-bit flip-flops, the multiple thread execution pipeline being coupled to the multi-dimensional storage that is shared among the plurality of execution threads of the individual processing unit.
-
15. A processor according to claim 12 further comprising:
a multi-dimensional storage shared among the plurality of execution threads of the processing units, the multi-dimensional storage including a plurality of non-overlapping two-dimensional planar windows containing storage cells that are connected to address lines addressing cells in a layer in two dimensions, an individual plane representing a window of the plurality of windows, the windows being non-overlapping.
-
16. A processor according to claim 12 further comprising:
a window pointer coupled to the multi-dimensional storage for addressing storage cells of the multi-dimensional storage, the multi-dimensional storage including the plurality of non-overlapping windows, a window representing a context, context switching being performed by changing the window pointer representing a context number.
-
17. A processor according to claim 12 further comprising:
-
a plurality of address lines coupled to the multi-dimensional storage for addressing the multi-dimensional storage, a first and second set of address lines addressing the two-dimensional storage planes and shared among the plurality of two-dimensional storage planes in the three-dimensional storage; and
a pointer coupled to the multi-dimensional storage for selecting a two-dimensional storage plane from among the plurality of planes in the three-dimensional storage.
-
-
18. A processor according to claim 12 wherein the individual processing units of the plurality of processing units include a multi-dimensional storage comprising:
-
a plurality of bit cells forming two-dimensional register windows of the multi-dimensional storage distributed in a planar surface of an integrated circuit;
a plurality of the two-dimensional register windows at a plurality of depths in the integrated circuit; and
a plurality of address lines including lines i for selecting bits of a register j, and lines j+k for selecting registers j of a window k, the number of address lines being i times (j+k).
-
-
19. A processor according to claim 18 wherein:
the address lines are single-ended address lines.
-
20. A processor according to claim 18 wherein:
the address lines are double-ended address lines.
-
21. A processor according to claim 12 wherein:
the multi-dimensional storage is shared among the plurality of execution threads of the plurality of processing units, the multi-dimensional storage storing data for a plurality of threads in a horizontal threading arrangement, the horizontal threads corresponding to respective ones of the plurality of processing units.
-
22. A processor according to claim 12 wherein the individual processing units of the plurality of processing units further include:
a plurality of load/store units coupled to the multiple-thread execution pipeline and respectively allocated for loading and storing data for the plurality of execution threads.
-
23. A processor according to claim 12 wherein the individual processing units of the plurality of processing units further include:
an external cache control unit coupled to the plurality of load/store units and shared among the plurality of execution threads.
-
24. A method of operating a processor comprising:
-
executing multiple execution threads in a plurality of processing units in a single integrated circuit; and
within an individual processing unit of the plurality of processing units:
executing a plurality of execution threads among the plurality of processing units in a horizontal threading arrangement;
executing a plurality of execution threads in a multiple-thread execution pipeline in a plurality of pipelines of a processing unit in one or more of the processing units; and
storing data in a multi-dimensional storage coupled to the multiple-thread execution pipeline, the multi-dimensional storage including a plurality of storage structures that are replicated for the plurality of processing units and arranged to store horizontal thread data, the plurality of storage structures having a three-dimensional storage and arranged to store vertical thread data, the three-dimensional storage being formed as a plurality of two-dimensional storage planes.
-
-
25. A processor comprising:
-
means for executing multiple execution threads in a plurality of processing units in a single integrated circuit; and
within an individual processing unit of the plurality of processing units:
means for executing a plurality of execution threads among the plurality of processing units in a horizontal threading arrangement;
means for executing a plurality of execution threads in a multiple-thread execution pipeline in a plurality of pipelines of a processing unit in one or more of the processing units; and
means for storing data in a multi-dimensional storage coupled to the multiple-thread execution pipeline, the multi-dimensional storage including a plurality of storage structures that are replicated for the plurality of processing units and arranged to store horizontal thread data, the plurality of storage structures having a three-dimensional storage and arranged to store vertical thread data, the three-dimensional storage being formed as a plurality of two-dimensional storage planes.
-
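Claims 4 to 6 describe selecting one two-dimensional plane with a window pointer, switching context by changing that pointer, and an expression for the number of address lines. A minimal Python sketch of that behaviour follows. The names `WindowedRegisterFile`, `switch_context`, and `address_line_count`, and the sizes used, are hypothetical and chosen for illustration; the function simply restates the expression "i times (j+k)" from claim 6 and makes no claim about how the lines are physically routed.

```python
class WindowedRegisterFile:
    """Toy model: non-overlapping register windows selected by a pointer."""

    def __init__(self, num_windows, regs_per_window):
        # One two-dimensional plane (window) per context; windows do not overlap.
        self.planes = [[0] * regs_per_window for _ in range(num_windows)]
        self.window_pointer = 0  # context number of the active window

    def switch_context(self, context_number):
        # A context switch only changes the pointer; no register data is copied.
        if not 0 <= context_number < len(self.planes):
            raise ValueError("context number out of range")
        self.window_pointer = context_number

    def read(self, reg):
        return self.planes[self.window_pointer][reg]

    def write(self, reg, value):
        self.planes[self.window_pointer][reg] = value


def address_line_count(i, j, k):
    """Restates the claim 6 expression: i times (j + k)."""
    return i * (j + k)


# Hypothetical sizes chosen only for the demonstration.
rf = WindowedRegisterFile(num_windows=4, regs_per_window=8)
rf.write(3, 11)        # context 0 writes register 3
rf.switch_context(1)   # switch to context 1 by changing the pointer
rf.write(3, 22)        # context 1 has its own register 3
rf.switch_context(0)   # switch back; context 0 data is still in place
print(rf.read(3))
print(address_line_count(2, 3, 4))
```

Because each context owns a separate plane, switching threads in this model is a single pointer update rather than a save and restore of register contents.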
Specification