Method and apparatus for strong affinity multiprocessor scheduling
First Claim
1. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, wherein at least one of the threads can make more than one sleep request, said method comprising the steps of associating a local queue of threads with each of the processors;
- selecting movable threads from the local queues; and
on each of the processors, performing the step of yielding control of the processor to a thread that is selected from at least the selected movable threads, wherein said step of selecting movable threads comprises identifying a busiest processor among the plurality of processors, the movable threads being selected only from eligible threads in the local queue of the busiest processor, and wherein said identifying step comprises identifying as a busiest processor a processor which has received the smallest number of sleep requests of any of the processors during a sampling period.
3 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus for scheduling threads on a multiprocessor utilize an unlocked local queue for each processor in the multiprocessor and a lockable global dispatch queue accessible by all processors. Threads are selected for movement from the unlocked local queue to the global dispatch queue only when the unlocked local queue contains too many threads that are waiting for a processor. Threads are selected to run on an available processor only after repeated checks to make certain no threads in the processor'"'"'s unlocked local queue should be run first. As a result, threads assigned to a processor tend to stay with that processor unless the system load is severely unbalanced, thereby improving system performance by increasing cache hits and decreasing lock assertions.
153 Citations
11 Claims
-
1. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, wherein at least one of the threads can make more than one sleep request, said method comprising the steps of associating a local queue of threads with each of the processors;
- selecting movable threads from the local queues; and
on each of the processors, performing the step of yielding control of the processor to a thread that is selected from at least the selected movable threads, wherein said step of selecting movable threads comprises identifying a busiest processor among the plurality of processors, the movable threads being selected only from eligible threads in the local queue of the busiest processor, and wherein said identifying step comprises identifying as a busiest processor a processor which has received the smallest number of sleep requests of any of the processors during a sampling period. - View Dependent Claims (2)
- selecting movable threads from the local queues; and
-
3. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, said method comprising the steps of associating an unlocked local queue of threads with each of the processors;
- selecting movable threads from the unlocked local queues; and
on each of the processors, performing the step of yielding control of the processor to a thread that is selected from at least the selected movable threads, wherein said step of selecting movable threads comprises identifying a busiest popular processor among the plurality of processors, a processor being popular when its unlocked local queue contains at least a predetermined number of eligible threads, the movable threads being selected only from eligible threads in the unlocked local queue of the busiest popular processor, wherein said identifying step comprises identifying as a busiest popular processor a popular processor which has received the smallest number of sleep requests of any of the popular processors during a sampling period.
- selecting movable threads from the unlocked local queues; and
-
4. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, said method comprising at least three control yielding steps for limiting the relative frequency of global dispatch queue accesses, including currently yielding control to an eligible thread found in a global dispatch queue, wherein during a previous yielding step control was yielded to an eligible thread found at that time in the global dispatch queue, and wherein control has been yielded at least once to a thread in a local queue by an intermediate control yielding step between said previous yielding step which previously yielded to a thread found in the global dispatch queue and said step of currently yielding control which currently yields to a thread found in the global dispatch queue.
-
5. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, said method comprising associating a local queue of threads with each of the processors;
- maintaining a global dispatch queue of threads which are not presently associated with any processor, and searching at least a portion of the global dispatch queue and the local queue of the processor to locate an eligible thread, wherein said searching step comprises;
checking the local queue of the processor for an eligible thread;
determining whether checking the global dispatch queue will exceed a predetermined relative frequency for global dispatch queue accesses relative to local queue accesses; and
checking the global dispatch queue only if such checking will not exceed the predetermined relative frequency for global dispatch queue accesses relative to local queue accesses and if no eligible thread is found in the local queue.
- maintaining a global dispatch queue of threads which are not presently associated with any processor, and searching at least a portion of the global dispatch queue and the local queue of the processor to locate an eligible thread, wherein said searching step comprises;
-
6. An apparatus for scheduling the execution of threads, comprising:
-
a plurality of processors for executing instructions, each of said processors being assigned to exactly one processor set, at least one of said processor sets having a plurality of assigned processors;
a shared memory for holding data;
a bus connecting said processors with said shared memory such that each of said processors can read and write locations within said shared memory;
an unlocked local queue of threads associated with each of said processors or processor sets;
a global dispatch queue of threads which are not presently associated with any of said processors;
means for selecting movable threads from said unlocked local queues, including a means for identifying a popular processor or processor set among the plurality of processors or processor sets, a processor or processor set being popular when its unlocked local queue contains at least a predetermined number of eligible threads;
on each of said processors, means for yielding control of said processor to a thread that is selected by the means for selecting movable threads from popular processors or processor sets; and
a plurality of load indictors, each of said load indicators associated with one of said processor sets for identifying a busiest processor set among said plurality of processor sets, wherein at least one of said load indicators indicates the number of sleep requests received by said associated processor set during a sampling period.
-
-
7. An apparatus for scheduling the execution of threads, comprising:
-
a plurality of processors for executing instructions, each of said processors being assigned to exactly one processor set, at least one of said processor sets having a plurality of assigned processors;
a shared memory for holding data;
a bus connecting said processors with said shared memory such that each of said processors can read and write locations within said shred memory;
an unlocked local queue of threads associated with each of said processors or processor sets;
a global dispatch queue of threads which are not presently associated with any of said processors;
means for selecting movable threads from said unlocked local queues, including a means for identifying a popular processor or processor set among the plurality of processors or processor sets, a processor or processor set being popular when its unlocked local queue contains at least a predetermined number of eligible threads;
on each of said processors, means for yielding control of said processor to a thread that is selected by the means for selecting movable threads from popular processors or processor sets;
a plurality of lockable local queues of threads, each of said lockable local queues of threads being associated with one of said processor sets; and
a means for searching in a predetermined order at least a portion of one of said lockable local queues, said global dispatch queue, and one of said unlocked local queues, to locate an eligible thread, wherein said searching means determines whether checking the global dispatch queue will exceed a predetermined relative frequency for global dispatch queue accesses, and checks said global dispatch queue only if such checking will not exceed the predetermined relative frequency for global dispatch queue accesses and if no eligible thread is found in said lockable local queue.
-
-
8. An apparatus for scheduling the execution of threads, comprising:
-
a plurality of processors for executing instructions, each of said processors being assigned to exactly one processor set, at least one of said processor sets having a plurality of assigned processors;
a shared memory for holding data;
a bus connecting said processors with said shared memory such that each of said processors can read and write locations within said shared memory;
an unlocked local queue of threads associated with each of said processors or processor sets;
a global dispatch queue of threads which are not presently associated with any of said processors;
means for selecting movable threads from said unlocked local queues, including a means for limiting the relative frequency of access to said global dispatch queue; and
on each of said processors, means for yielding control of said processor to a thread that is selected by the means for selecting movable threads while limiting the frequency of global dispatch queue accesses relative to local queue accesses. - View Dependent Claims (9)
-
-
10. A method for scheduling the execution of a plurality of threads on a plurality of processors in a computer system, said method comprising the steps of:
-
identifying a busiest processor, the busiest processor having spent a higher proportion of its processing capacity running application threads versus running an idle thread relative to a plurality of other processors in the computer system;
searching in a predetermined order at least a portion of an unlocked local queue of the busiest processor to locate an eligible thread; and
yielding control of a processor other than the busiest processor to an eligible thread found during said searching step, the eligible thread which receives control of the processor being selected only from eligible threads in the unlocked local queue of the busiest processor.
-
-
11. A computer-readable storage medium having a configuration that represents data and instructions which cause a multiprocessor to perform method steps for scheduling threads, the method comprising at least three control yielding steps for limiting the relative frequency of global dispatch queue accesses, including currently yielding control to an eligible thread found in a global dispatch queue, wherein during a previous yielding step control was yielded to an eligible thread found at that time in the global dispatch queue, and wherein control has been yielded at least once to a thread in a local queue by an intermediate control yielding step between said previous yielding step which previously yielded to a thread found in the global dispatch queue and said step of currently yielding control which currently yields to a thread found in the global dispatch queue.
Specification