Multiple-thread processor with single-thread interface shared among threads
Abstract
A processor includes logic for tagging a thread identifier (TID) for usage with processor blocks that are not stalled. Pertinent non-stalling blocks include caches, translation look-aside buffers (TLB), a load buffer asynchronous interface, an external memory management unit (MMU) interface, and others. A processor includes a cache that is segregated into a plurality of N cache parts. Cache segregation avoids interference, “pollution”, or “cross-talk” between threads. One technique for cache segregation utilizes logic for storing and communicating thread identification (TID) bits. The cache utilizes cache indexing logic. For example, the TID bits can be inserted at the most significant bits of the cache index.
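The index construction described above can be sketched as follows. This is an illustrative model, not text from the patent; the bit widths (`TID_BITS`, `INDEX_BITS`, `LINE_BITS`) and function name are assumed for the example. It shows the abstract's technique of segregating a cache into N parts by placing the thread-identifier bits at the most significant bits of the cache index.

```python
# Assumed example parameters (not specified in the patent):
TID_BITS = 2      # supports up to 4 hardware threads, N = 2**TID_BITS parts
INDEX_BITS = 8    # 256-set cache
LINE_BITS = 6     # 64-byte cache lines

def cache_index(address: int, tid: int) -> int:
    """Segregated cache index: TID bits occupy the most significant
    bits of the index, so each thread indexes a disjoint cache part."""
    base_index = (address >> LINE_BITS) & ((1 << (INDEX_BITS - TID_BITS)) - 1)
    return (tid << (INDEX_BITS - TID_BITS)) | base_index
```

Because the TID occupies the top index bits, two threads presenting the same address land in disjoint regions of the cache, which is how segregation avoids inter-thread "pollution" or "cross-talk".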
24 Claims
1. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads, respective ones of the plurality of pipelines to execute the allocated execution threads in a first thread dimension, wherein at least one of the plurality of pipelines is to execute more than one of the plurality of execution threads in a second thread dimension, and wherein the multiple-thread execution pipeline includes storage elements for holding the plurality of threads;
a plurality of shared components coupled to the multiple-thread execution pipeline, the shared components being coupled in a sequence so that the plurality of pipelines converge into the sequence of shared components, the shared components being logic components that control but do not hold threads;
a cache control unit coupled to the multiple-thread execution pipeline;
an L1 cache coupled to the cache control unit; and
anti-aliasing logic coupled to the L1 cache so that the L1 cache is shared among threads via anti-aliasing.

2. A processor according to claim 1 wherein:
the multiple-thread execution pipeline includes a plurality of pulse-based high-speed flip-flops, the pulse-based high-speed flip-flops having a latch structure coupled to a plurality of select-bus lines, the select-bus lines selecting an active thread from among the plurality of execution threads.
3. A processor according to claim 1 wherein:
the plurality of shared components are selected from among components including a memory management unit (MMU), a branch prediction unit, and a next-fetch random access memory (RAM).
4. A processor according to claim 1 wherein:
the L1 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address, the anti-aliasing logic selectively invalidating or updating duplicate L1 cache entries.
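The anti-aliasing behavior recited in claim 4 can be sketched as a small model. This is a hypothetical illustration, not the patent's implementation; the class and method names are assumed. It models a virtually-indexed, physically-tagged cache in which two virtual addresses (hence two virtual indexes) can map to one physical line, and the anti-aliasing logic invalidates the duplicate entry on a fill.

```python
class VIPTCache:
    """Toy virtually-indexed, physically-tagged (VIPT) cache model
    with anti-aliasing: at most one copy of a physical line may exist."""

    def __init__(self, num_sets: int = 8):
        # Each set holds a physical tag or None (invalid).
        self.sets = [None] * num_sets

    def fill(self, virtual_index: int, physical_tag: int) -> None:
        # Anti-aliasing: invalidate any other set already holding this
        # physical line, so duplicate virtual aliases cannot diverge.
        for i, tag in enumerate(self.sets):
            if tag == physical_tag and i != virtual_index:
                self.sets[i] = None
        self.sets[virtual_index] = physical_tag
```

Without the invalidation loop, a store through one virtual alias could leave a stale copy reachable through the other alias, which is the hazard the claim's anti-aliasing logic avoids.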
5. A processor according to claim 1 wherein:
the L1 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic includes logic supporting lightweight processes and native threads that disables thread ID tagging and disables cache segregation.
6. A processor according to claim 1 wherein:
the anti-aliasing logic selectively invalidates or updates duplicate L1 cache entries to avoid hazards that result from multiple virtual addresses mapping to one physical address.
7. A processor according to claim 1 wherein:
the L1 cache includes cache indexing logic, with the cache control unit segregating the L1 cache by separating the L1 cache into N independent parts that are allocated to threads to avoid pollution, “cross-talk”, and interference between threads.
8. A processor according to claim 1 further comprising:
a plurality of multiple-thread execution pipelines and the shared components integrated onto a single integrated-circuit chip.
9. A processor according to claim 1 further comprising:
a single-pathway component coupled to the multiple-thread execution pathways so that the plurality of execution pathways converge into the single-pathway of the single-pathway component, the single-pathway component being a non-stalling component.
10. A processor according to claim 1 further comprising:
a non-stalling component coupled to the multiple-thread execution pathways so that the plurality of execution pathways converge into a single-pathway including the non-stalling component.
11. A processor according to claim 1 further comprising:
a plurality of multiple-thread execution pipelines and a single-thread interface integrated onto a single integrated-circuit chip.
12. A processor according to claim 1 further comprising:
a single-thread interface including a load buffer and a store buffer that maintain compatibility with multiple threads by checking read-after-write status of the load buffer and the store buffer.
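The read-after-write (RAW) check recited in claim 12 can be sketched as follows. This is a minimal hypothetical model, not the patent's design; the class and method names are assumed. It shows how a store buffer shared behind a single-thread interface stays compatible with multiple threads: a load checks for a prior store to the same address from its own thread, so one thread's buffered stores are never forwarded to another thread.

```python
class StoreBuffer:
    """Toy store buffer with a per-thread read-after-write (RAW) check."""

    def __init__(self):
        self.entries = []  # (tid, address, data), oldest first

    def store(self, tid: int, address: int, data: int) -> None:
        self.entries.append((tid, address, data))

    def load_forward(self, tid: int, address: int):
        """Return data on a RAW hit for this thread's youngest matching
        store, else None (the load must go to the cache instead)."""
        for entry_tid, entry_addr, data in reversed(self.entries):
            if entry_tid == tid and entry_addr == address:
                return data
        return None
```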
13. A method of operating a processor comprising:
executing a plurality of instruction threads in a corresponding plurality of execution pipelines in a first thread dimension;
alternately executing and storing a plurality of instruction threads in ones of the plurality of execution pipelines in a second thread dimension, including:
executing one thread of the second thread dimension plurality of instruction threads;
storing one or more other threads of the second thread dimension plurality of threads; and
alternating the second thread dimension plurality of instruction threads between the executing and storing acts;
converging the plurality of threads in the first thread dimension and the second thread dimension to a plurality of shared components;
sharing the plurality of shared components among the plurality of threads in the first thread dimension and the second thread dimension;
caching execution data; and
anti-aliasing the cached data by invalidating or updating duplicate cache entries.

14. A method according to claim 13 further comprising:
controlling the converged threads in a plurality of shared components without storing.
15. A method according to claim 13 further comprising:
maintaining thread compatibility by:
physically duplicating structures; and
verifying communication status after thread transfer.
16. A method according to claim 13 further comprising:
segregating the cache into N parts to maintain thread compatibility.
17. A method according to claim 13 further comprising:
tagging identity (ID) of threads; and
segregating the cache into N parts to maintain thread compatibility.
18. A method according to claim 13 further comprising:
tagging identity (ID) of threads;
segregating the cache into N parts to maintain thread compatibility;
detecting lightweight processes and native threads and, in response to the detection:
disabling thread ID tagging; and
disabling cache segregation.
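The disable path of claim 18 can be sketched by extending the indexing idea from the abstract. This is an assumed illustration, not the patent's circuit; the function name and bit widths are hypothetical. When lightweight processes or native threads sharing one address space are detected, TID tagging and segregation are disabled and all threads index the full cache identically.

```python
def effective_index(address: int, tid: int, segregate: bool,
                    index_bits: int = 8, tid_bits: int = 2,
                    line_bits: int = 6) -> int:
    """Cache index with segregation optionally disabled (claim 18 path)."""
    if not segregate:
        # Segregation disabled: ignore the TID and use the full index,
        # letting lightweight processes share cache lines freely.
        return (address >> line_bits) & ((1 << index_bits) - 1)
    # Segregation enabled: TID bits at the most significant index bits.
    base = (address >> line_bits) & ((1 << (index_bits - tid_bits)) - 1)
    return (tid << (index_bits - tid_bits)) | base
```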
19. A processor comprising:
means for executing a plurality of instruction threads in a corresponding plurality of execution pipelines in a first thread dimension;
means for alternately executing and storing a plurality of instruction threads in ones of the plurality of execution pipelines in a second thread dimension, including:
means for executing one thread of the second thread dimension plurality of instruction threads;
means for storing one or more other threads of the second thread dimension plurality of threads; and
means for alternating the second thread dimension plurality of instruction threads between the executing and storing acts;
means for converging the plurality of threads in the first thread dimension and the second thread dimension to a plurality of shared components;
means for sharing the plurality of shared components among the plurality of threads in the first thread dimension and the second thread dimension;
means for caching execution data; and
means for anti-aliasing the cached data by invalidating or updating duplicate cache entries.

20. A processor according to claim 19 further comprising:
means for controlling the converged threads in a plurality of shared components without storing.
21. A processor according to claim 19 further comprising:
means for maintaining thread compatibility.
22. A processor according to claim 19 further comprising:
means for segregating the cache into N parts to maintain thread compatibility.
23. A processor according to claim 19 further comprising:
means for tagging identity (ID) of threads; and
means for segregating the cache into N parts to maintain thread compatibility.
24. A processor according to claim 19 further comprising:
means for tagging identity (ID) of threads;
means for segregating the cache into N parts to maintain thread compatibility;
means for detecting lightweight processes and native threads; and
means for, in response to the detection, disabling thread ID tagging; and
disabling cache segregation.