Thread switch logic in a multiple-thread processor
Abstract
A processor includes a thread switching control logic that performs a fast thread-switching operation in response to an L1 cache miss stall. The fast thread-switching operation implements one or more of several thread-switching methods. A first thread-switching operation is “oblivious” thread-switching for every N cycle in which the individual flip-flops locally determine a thread-switch without notification of stalling. The oblivious technique avoids usage of an extra global interconnection between threads for thread selection. A second thread-switching operation is “semi-oblivious” thread-switching for use with an existing “pipeline stall” signal (if any). The pipeline stall signal operates in two capacities, first as a notification of a pipeline stall, and second as a thread select signal between threads so that, again, usage of an extra global interconnection between threads for thread selection is avoided. A third thread-switching operation is an “intelligent global scheduler” thread-switching in which a thread switch decision is based on a plurality of signals including: (1) an L1 data cache miss stall signal, (2) an instruction buffer empty signal, (3) an L2 cache miss signal, (4) a thread priority signal, (5) a thread timer signal, (6) an interrupt signal, or other sources of triggering. In some embodiments, the thread select signal is broadcast as fast as possible, similar to a clock tree distribution. In some systems, a processor derives a thread select signal that is applied to the flip-flops by overloading a scan enable (SE) signal of a scannable flip-flop.
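The three switching modes summarized in the abstract can be modeled behaviorally. The sketch below is illustrative only — the patent describes hardware logic, not software, and every name here (`ThreadSwitcher`, the mode strings, the signal keys) is invented for this example:

```python
# Behavioral sketch (not the patented hardware) of the three
# thread-switching modes described in the abstract. All names are
# illustrative assumptions, not terms from the patent.

class ThreadSwitcher:
    def __init__(self, num_threads, mode, n_cycles=4):
        self.num_threads = num_threads
        self.mode = mode        # "oblivious" | "semi_oblivious" | "global"
        self.n = n_cycles       # N for oblivious switching
        self.cycle = 0
        self.active = 0         # currently selected thread ID

    def tick(self, signals=None):
        """Advance one clock cycle; return the active thread ID."""
        signals = signals or {}
        self.cycle += 1
        if self.mode == "oblivious":
            # Switch every N cycles; no stall notification is consulted,
            # so no extra global interconnection is needed.
            if self.cycle % self.n == 0:
                self.active = (self.active + 1) % self.num_threads
        elif self.mode == "semi_oblivious":
            # The existing pipeline-stall signal doubles as thread select.
            if signals.get("pipeline_stall"):
                self.active = (self.active + 1) % self.num_threads
        elif self.mode == "global":
            # Intelligent global scheduler: any programmed trigger
            # (cache miss, empty buffer, timer, interrupt, ...) switches.
            triggers = ("l1_miss_stall", "ibuf_empty", "l2_miss",
                        "thread_priority", "thread_timer", "interrupt")
            if any(signals.get(t) for t in triggers):
                self.active = (self.active + 1) % self.num_threads
        return self.active
```

For example, a two-thread oblivious switcher with N=2 alternates threads every second `tick()`, regardless of any stall condition.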
65 Claims
1. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic that switches execution threads according to a thread switching mode selected from among a plurality of thread switching modes.
2. A processor according to claim 1 wherein:
the multiple-thread execution pipeline includes a plurality of flip-flops; and
the thread switch logic selects a thread switching mode from among a plurality of modes including an oblivious mode, the oblivious mode switching threads for every N cycle in which individual flip-flops in the multiple-thread execution pipeline locally determine a thread switch without notification of stalling, N being a selected number of cycles.
3. A processor according to claim 2 wherein:
the thread switch logic includes a counter for counting cycles between thread switches.
4. A processor according to claim 1 further comprising:
a load/store unit coupled to the multiple-thread execution pipeline, wherein:
the thread switch logic selects a thread switching mode from among a plurality of modes including a semi-oblivious mode, the semi-oblivious mode switching threads on a load-use stall signaled from the load/store unit indicative of a load/store unit global stall condition.
5. A processor according to claim 1 wherein:
the thread switch logic selects a thread switching mode from among a plurality of modes including a semi-oblivious mode, the semi-oblivious mode switching threads on a pipeline stall signaled from the multiple-thread execution pipeline indicative of stalling of the pipeline.
6. A processor according to claim 1 wherein:
the thread switch logic selects a thread switching mode from among a plurality of thread switching modes including an intelligent global scheduler thread switching mode in which a thread switch decision is selectively programmed based on one or more signals.
7. A processor according to claim 6 wherein:
the thread switch decision is selectively programmed based on signals including an L1 cache miss stall signal and an L1 cache load miss signal.
8. A processor according to claim 6 wherein:
the thread switch decision is selectively programmed based on signals including an L1 cache miss stall signal and an L1 cache load miss signal.
9. A processor according to claim 6 wherein:
the thread switch decision is selectively programmed based on signals including an instruction buffer empty signal supplied by the processor.
10. A processor according to claim 6 wherein:
the thread switch decision is selectively programmed based on signals including a thread priority signal supplied by the processor.
11. A processor according to claim 6 wherein:
the thread switch decision is selectively programmed based on signals including a thread priority signal supplied by the processor, the thread select signal being broadcast as fast as possible as determined by a clock tree distribution.
12. A processor according to claim 6 wherein:
the processor derives a thread select signal that is applied to the flip-flops by overloading a scan enable (SE) signal of a scannable flip-flop.
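Claim 12 describes deriving the thread select by overloading the scan enable (SE) input of a scannable flip-flop. A rough behavioral sketch of that idea (illustrative only — `ScannableFlop` and its fields are invented here, and real scan flops route SI from a scan chain rather than holding a second thread's state) is:

```python
# Illustrative model (not the patented circuit) of reusing a scannable
# flip-flop's scan-enable (SE) input as a thread-select signal: the flop
# already has two input paths (functional D and scan path), so overloading
# SE can swap in the other thread's state without adding a new global
# select wire. All names here are assumptions for this sketch.

class ScannableFlop:
    def __init__(self):
        self.q = 0        # visible pipeline state (active thread)
        self.shadow = 0   # scan-path storage, reused for the inactive thread

    def clock(self, d, se):
        """One clock edge: SE=0 captures D normally; SE=1 is overloaded
        as thread select and swaps in the inactive thread's state."""
        if se:
            self.q, self.shadow = self.shadow, self.q
        else:
            self.q = d
        return self.q
```

Asserting SE thus performs the thread switch locally at each flip-flop, which is what lets the select signal be broadcast like a clock tree rather than decoded centrally.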
13. A processor according to claim 1 wherein:
the multiple-thread execution pipeline includes a plurality of pulse-based high-speed flip-flops, the pulse-based high-speed flip-flops having a latch structure coupled to a plurality of select-bus lines, the select-bus lines selecting an active thread from among the plurality of execution threads.
14. A processor according to claim 1 further comprising:
a single-pathway component coupled to the multiple-thread execution pipeline so that a plurality of execution thread pathways converge into the single-pathway of the single-pathway component, the single-pathway component being a non-stalling component, the thread switch logic controlling thread selection and generating a thread identifier (TID) indicative of the selected thread; and
a thread control logic coupled to the thread switch logic and supporting lightweight processes and native threads, the thread control logic disabling thread ID tagging and disabling cache segregation for lightweight processes and native threads that share a single virtual tag space.
15. A processor according to claim 1 further comprising:
a plurality of multiple-thread execution pipelines and the cache integrated onto a single integrated-circuit chip.
16. A processor according to claim 1 wherein:
the thread switch logic further includes a thread reservation or locking system that reserves a thread pathway of the multiple-thread execution pipeline for usage by a selected thread.
17. A processor according to claim 16 wherein:
the thread reservation or locking system institutes a priority of a particular thread among a plurality of threads.
18. A processor according to claim 16 wherein:
the thread reservation or locking system reserves a particular pathway of the multiple-thread execution pipeline for usage by a particular thread.
19. A processor according to claim 16 wherein:
the thread reservation or locking system limits the time a particular pathway of the multiple-thread execution pipeline is allocated for usage by a particular thread.
20. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic that switches execution threads according to an oblivious mode, the oblivious mode switching threads for every N cycle in which individual flip-flops in the multiple-thread execution pipeline locally determine a thread switch without notification of stalling, N being a selected number of cycles.
21. A processor according to claim 20 wherein:
the thread switch logic includes a counter for counting cycles between thread switches.
22. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads;
a load/store unit coupled to the multiple-thread execution pipeline; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic that switches execution threads according to a semi-oblivious mode, the semi-oblivious mode switching threads on a load-use stall signaled from the load/store unit indicative of a load/store unit global stall condition.
23. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic that switches execution threads according to a semi-oblivious mode, the semi-oblivious mode switching threads on a pipeline stall signaled from the multiple-thread execution pipeline indicative of stalling of the pipeline.
24. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic that switches execution threads according to an intelligent global scheduler thread switching mode in which a thread switch decision is selectively programmed based on one or more signals.
25. A processor according to claim 24 wherein:
the thread switch decision is selectively programmed based on signals including an L1 cache miss stall signal and an L1 cache load miss signal.
26. A processor according to claim 24 wherein:
the thread switch decision is selectively programmed based on signals including an instruction buffer empty signal supplied by the processor.
27. A processor according to claim 24 wherein:
the thread switch decision is selectively programmed based on signals including a thread priority signal supplied by the processor.
28. A processor according to claim 24 wherein:
the thread switch decision is selectively programmed based on signals including a thread priority signal supplied by the processor, the thread select signal being broadcast as fast as possible as determined by a clock tree distribution.
29. A processor according to claim 24 wherein:
the processor derives a thread select signal that is applied to the flip-flops by overloading a scan enable (SE) signal of a scannable flip-flop.
30. A processor wherein:
the multiple-thread execution pipeline includes a plurality of pipelines respectively allocated to a plurality of execution threads, the multiple-thread execution pipeline including a plurality of pulse-based high-speed flip-flops, the pulse-based high-speed flip-flops having a latch structure coupled to a plurality of select-bus lines, the select-bus lines selecting an active thread from among the plurality of execution threads; and
the thread switch logic invokes a very fast exception handling functionality while executing non-threaded programs by invoking a multi-threaded-type functionality in response to an exception condition.
31. A processor according to claim 30 wherein:
the thread switch logic is selectively programmed based on signals including an L1 cache miss stall signal and an L1 cache load miss signal.
32. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads;
a plurality of shared components coupled to the multiple-thread execution pipeline, the shared components being coupled in a sequence so that the plurality of pipelines converge into the sequence of shared components, the shared components including a cache control unit and an L1 cache coupled to the cache control unit, the cache control unit segregating the cache by separating the cache into N independent parts that are allocated to threads to avoid pollution, "cross-talk", and interference between threads;
anti-aliasing logic coupled to the L1 cache so that the L1 cache is shared among threads via anti-aliasing, the anti-aliasing logic including logic supporting lightweight processes and native threads that disables thread ID tagging and disables cache segregation; and
a thread control logic coupled to the thread switch logic and supporting lightweight processes and native threads, the thread control logic disabling thread ID tagging and disabling cache segregation for lightweight processes and native threads that share a single virtual tag space.
33. A processor according to claim 32 wherein:
the L1 cache is a virtually-indexed, physically-tagged cache that is shared among threads.
34. A processor according to claim 32 wherein:
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address.
35. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads;
a plurality of shared components coupled to the multiple-thread execution pipeline, the shared components being coupled in a sequence so that the plurality of pipelines converge into the sequence of shared components, the shared components including a cache control unit and an L2 cache coupled to the cache control unit, the cache control unit segregating the cache by separating the cache into N independent parts that are allocated to threads to avoid pollution, "cross-talk", and interference between threads;
anti-aliasing logic coupled to the L2 cache so that the L2 cache is shared among threads via anti-aliasing, the anti-aliasing logic including logic supporting lightweight processes and native threads that disables thread ID tagging and disables cache segregation; and
a thread control logic coupled to the thread switch logic and supporting lightweight processes and native threads, the thread control logic disabling thread ID tagging and disabling cache segregation for lightweight processes and native threads that share a single virtual tag space.
36. A processor according to claim 35 wherein:
the L2 cache is a virtually-indexed, physically-tagged cache that is shared among threads.
37. A processor according to claim 35 wherein:
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address.
38. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads;
a cache control unit coupled to the multiple-thread execution pipeline;
an L1 cache coupled to the cache control unit; and
anti-aliasing logic coupled to the L1 cache so that the L1 cache is shared among threads via anti-aliasing.
39. A processor according to claim 38 wherein:
the L1 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address, the anti-aliasing logic selectively invalidating or updating duplicate L1 cache entries.
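The anti-aliasing claims (38-45) concern virtually-indexed, physically-tagged (VIPT) caches, where two virtual addresses mapping to one physical address can land in different cache sets. A minimal Python sketch of the hazard and its avoidance — illustrative only, with invented names (`VIPTCache`), a direct-mapped organization, and simplified geometry parameters not taken from the patent:

```python
# Illustrative model (not the patented circuit) of anti-aliasing in a
# virtually-indexed, physically-tagged cache: before a fill creates a
# second copy of a physical line under a different virtual index, the
# duplicate entry is invalidated. All names/sizes here are assumptions.

LINE_BITS = 6    # 64-byte cache lines
PAGE_BITS = 12   # 4 KiB pages; address bits below 12 are untranslated
INDEX_BITS = 8   # 256 sets, indexed by virtual address bits [6:14)
ALIAS_BITS = LINE_BITS + INDEX_BITS - PAGE_BITS  # virtual bits that can alias

class VIPTCache:
    """Direct-mapped VIPT cache model with duplicate invalidation."""

    def __init__(self):
        self.tags = {}   # set index -> physical line address (the tag)

    def _index(self, vaddr):
        return (vaddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)

    def fill(self, vaddr, paddr):
        ptag = paddr >> LINE_BITS   # simplified: whole line address as tag
        idx = self._index(vaddr)
        # Anti-aliasing: invalidate any copy of this physical line sitting
        # at an alias index (same untranslated bits, different alias bits).
        base = idx & ((1 << (PAGE_BITS - LINE_BITS)) - 1)
        for alias in range(1 << ALIAS_BITS):
            cand = base | (alias << (PAGE_BITS - LINE_BITS))
            if cand != idx and self.tags.get(cand) == ptag:
                del self.tags[cand]   # selectively invalidate the duplicate
        self.tags[idx] = ptag

    def hit(self, vaddr, paddr):
        return self.tags.get(self._index(vaddr)) == paddr >> LINE_BITS
```

With this geometry, virtual addresses 0x1000 and 0x2000 can both map to physical line 0x1000 but index different sets; filling through the second alias invalidates the entry made through the first, so at most one copy of the line is ever resident.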
40. A processor according to claim 38 wherein:
the L1 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic includes logic supporting lightweight processes and native threads that disables thread ID tagging and disables cache segregation.
41. A processor according to claim 38 wherein:
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address.
42. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads;
a cache control unit coupled to the multiple-thread execution pipeline;
an L2 cache coupled to the cache control unit; and
anti-aliasing logic coupled to the L2 cache so that the L2 cache is shared among threads via anti-aliasing.
43. A processor according to claim 42 wherein:
the L2 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address, the anti-aliasing logic selectively invalidating or updating duplicate L2 cache entries.
44. A processor according to claim 42 wherein:
the L2 cache is a virtually-indexed, physically-tagged cache that is shared among threads; and
the anti-aliasing logic includes logic supporting lightweight processes and native threads that disables thread ID tagging and disables cache segregation.
45. A processor according to claim 42 wherein:
the anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address.
46. A processor comprising:
a multiple-thread execution pipeline including a plurality of pipelines respectively allocated to a plurality of execution threads, the multiple-thread execution pipeline including a plurality of pulse-based high-speed flip-flops, the pulse-based high-speed flip-flops having a latch structure coupled to a plurality of select-bus lines, the select-bus lines selecting an active thread from among the plurality of execution threads; and
a thread switch logic coupled to the multiple-thread execution pipeline, the thread switch logic invoking a very fast exception handling functionality while executing non-threaded programs by invoking a multithreaded-type functionality in response to an exception condition.
47. A processor according to claim 46 wherein:
the thread switch logic is connected to an exception or trap signal line, an exception or trap signal evoking a switch in thread state and machine state of the processor causing the processor to shift threads of the multiple-thread execution pipeline without invoking operating system or software handling and without the inherent timing penalty of the operating system's software saving and restoring of registers.
48. A method of operating a processor comprising:
executing a plurality of execution threads in a plurality of execution pathways in a multiple-thread execution pipeline;
allocating a plurality of pipelines respectively to a plurality of execution threads; and
switching execution threads according to a thread switching mode selected from among a plurality of thread switching modes.
49. A method according to claim 48 further comprising:
including a plurality of flip-flops in the multiple-thread execution pipeline; and
switching threads in an oblivious mode for every N cycle in which individual flip-flops in the multiple-thread execution pipeline locally determine a thread switch without notification of stalling, N being a selected number of cycles.
50. A method according to claim 49 further comprising:
counting cycles between thread switches.
51. A method according to claim 48 further comprising:
switching threads in a semi-oblivious mode on a load-use stall signaled from the load/store unit indicative of a load/store unit global stall condition.
52. A method according to claim 48 further comprising:
switching threads in a semi-oblivious mode on a pipeline stall signaled from the multiple-thread execution pipeline indicative of stalling of the pipeline.
53. A method according to claim 48 further comprising:
switching threads using an intelligent global scheduler thread switching mode; and
selectively programming a thread switch decision based on one or more signals.
54. A method according to claim 53 further comprising:
selectively programming the thread switch decision based on signals including an L1 cache miss stall signal and an L1 cache load miss signal.
55. A method according to claim 53 further comprising:
selectively programming the thread switch decision based on signals including an instruction buffer empty signal supplied by the processor.
56. A method according to claim 53 further comprising:
selectively programming the thread switch decision based on signals including a thread priority signal supplied by the processor.
57. A method according to claim 53 further comprising:
selectively programming the thread switch decision based on signals including a thread priority signal supplied by the processor; and
broadcasting the thread select signal as fast as possible as determined by a clock tree distribution.
58. A method according to claim 53 further comprising:
deriving a thread select signal that is applied to the flip-flops by overloading a scan enable (SE) signal of a scannable flip-flop.
59. A method according to claim 48 wherein:
the multiple-thread execution pipeline includes a plurality of pulse-based high-speed flip-flops, the pulse-based high-speed flip-flops having a latch structure coupled to a plurality of select-bus lines, the select-bus lines selecting an active thread from among the plurality of execution threads.
60. A method according to claim 48 further comprising:
converging a plurality of execution thread pathways into a single-pathway;
controlling thread selection;
generating a thread identifier (TID) indicative of the selected thread; and
supporting lightweight processes and native threads including:
disabling thread ID tagging; and
disabling cache segregation for lightweight processes and native threads that share a single virtual tag space.
61. A method according to claim 48 further comprising:
reserving a thread pathway of the multiple-thread execution pipeline for usage by a selected thread.
62. A method according to claim 61 further comprising:
instituting a priority of a particular thread among a plurality of threads.
63. A method according to claim 61 further comprising:
reserving a particular pathway of the multiple-thread execution pipeline for usage by a particular thread.
64. A method according to claim 61 further comprising:
limiting the time a particular pathway of the multiple-thread execution pipeline is allocated for usage by a particular thread.
65. A processor comprising:
means for executing a plurality of execution threads in a plurality of execution pathways in a multiple-thread execution pipeline;
means for allocating a plurality of pipelines respectively to a plurality of execution threads; and
means for switching execution threads according to a thread switching mode selected from among a plurality of thread switching modes.
Specification