Adaptive Contention-Aware Thread Placement for Parallel Runtime Systems
First Claim
1. A method, comprising:
performing, by a computer that includes multiple processor sockets, each of which includes one or more processor cores:
receiving a given application that is configured for parallel execution on the computer;
determining, dependent on profile data that characterizes the behavior of the computer when multiple applications are executed in parallel on a single one of the processor sockets, that the given application is to be executed on a given one of the multiple processor sockets while a particular other application is also executing on the given one of the multiple processor sockets;
beginning execution of the given application, wherein execution of the given application comprises executing program instructions that perform work on behalf of the given application and that cause a respective value of each of one or more performance counters in one or more processor cores of the given one of the multiple processor sockets on which respective software threads of the given application are executing to be updated; and
determining, prior to completing execution of the given application or the particular other application, and dependent on the updated values of the one or more performance counters, that execution of the given application or execution of the particular other application is to continue on a different one of the multiple processor sockets.
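The two placement decisions in this claim — an initial choice driven by offline profile data, and a mid-execution migration driven by live performance counters — can be illustrated with a minimal sketch. All names (`choose_initial_socket`, `should_migrate`), the profile scores, and the threshold are illustrative assumptions, not from the patent:

```python
# Hypothetical sketch of the claimed method: pick a socket using offline
# profile data, then decide mid-execution whether to migrate based on
# observed performance-counter values.

def choose_initial_socket(app, sockets, profile):
    """Pick the socket whose resident application the profile predicts
    will interfere least with `app` (lower score = less interference)."""
    return min(sockets,
               key=lambda s: profile.get((app, s["resident"]), 0.0))

def should_migrate(counter_values, threshold):
    """Decide, before the application completes, whether contention
    observed in the counters warrants continuing on another socket."""
    return sum(counter_values) / len(counter_values) > threshold

# Synthetic profile: interference scores for co-running pairs.
profile = {("A", "B"): 0.2, ("A", "C"): 0.9}
sockets = [{"id": 0, "resident": "B"}, {"id": 1, "resident": "C"}]
print(choose_initial_socket("A", sockets, profile)["id"])   # 0
print(should_migrate([0.8, 0.9, 0.7], threshold=0.5))       # True
```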
Abstract
An adaptive contention-aware thread scheduler may place the software threads of pairs of applications on the same socket of a multi-socket machine for execution in parallel. Initial placements may be based on profile data that characterizes the machine and its behavior when multiple applications execute on the same socket. The profile data may be collected during execution of other applications. It may identify performance counters within the cores of the processor sockets whose values are suitable for predicting whether the performance of a pair of applications will suffer when they execute together on the same socket (e.g., values indicative of their demands for particular shared resources). During execution, the scheduler may examine the performance counters (or performance metrics derived from them) and make different placement decisions (e.g., placing an application with high demand for resources of one type together with an application with low demand for those resources).
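The pairing idea at the end of the abstract — co-locating a high-demand application with a low-demand one — can be sketched as a simple greedy pass. The metric names and demand values below are invented for illustration:

```python
# Illustrative pairing of applications by their demand for a shared
# resource (e.g., memory bandwidth); the demand values are synthetic.

def pair_by_demand(demands):
    """Greedily pair the highest-demand application with the
    lowest-demand one, so no socket hosts two heavy consumers."""
    ordered = sorted(demands, key=demands.get)  # ascending demand
    pairs = []
    while len(ordered) >= 2:
        pairs.append((ordered.pop(), ordered.pop(0)))  # high with low
    return pairs

demands = {"fft": 0.9, "sort": 0.8, "idle": 0.1, "io": 0.2}
print(pair_by_demand(demands))  # [('fft', 'idle'), ('sort', 'io')]
```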
20 Claims
1. A method, comprising:
performing, by a computer that includes multiple processor sockets, each of which includes one or more processor cores:
receiving a given application that is configured for parallel execution on the computer;
determining, dependent on profile data that characterizes the behavior of the computer when multiple applications are executed in parallel on a single one of the processor sockets, that the given application is to be executed on a given one of the multiple processor sockets while a particular other application is also executing on the given one of the multiple processor sockets;
beginning execution of the given application, wherein execution of the given application comprises executing program instructions that perform work on behalf of the given application and that cause a respective value of each of one or more performance counters in one or more processor cores of the given one of the multiple processor sockets on which respective software threads of the given application are executing to be updated; and
determining, prior to completing execution of the given application or the particular other application, and dependent on the updated values of the one or more performance counters, that execution of the given application or execution of the particular other application is to continue on a different one of the multiple processor sockets.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
14. A system, comprising:
a plurality of collections of processor cores, each of which includes multiple processor cores, wherein each of the multiple processor cores includes a plurality of performance counters, and wherein the multiple processor cores in each collection of processor cores share at least one hardware resource;
a memory comprising:
program instructions that when executed on one or more of the multiple processor cores in a given one of the collections of processor cores cause the one or more processor cores to implement a given one of the multiple applications; and
additional program instructions that when executed on one or more other ones of the multiple processor cores in the given one of the collections of processor cores cause the one or more other processor cores to implement another one of the multiple applications;
wherein, when executed by a worker thread of the given application, a portion of the program instructions perform work on behalf of the given application and cause a respective value of each of one or more of the plurality of performance counters of the one or more processor cores in the given one of the collections of processor cores to be updated;
wherein, when executed by a worker thread of the other application, a portion of the additional program instructions perform work on behalf of the other application and cause a respective value of each of one or more of the plurality of performance counters of the one or more other processor cores in the given one of the collections of processor cores to be updated; and
wherein a scheduler thread of the given application is configured to:
collect the updated values of the one or more of the plurality of performance counters of the one or more processor cores and the one or more other processor cores in the given one of the collections of processor cores; and
determine, prior to completing execution of the given application, and dependent on the collected updated values of the performance counters, that execution of the given application or execution of the other application is to continue on one or more processor cores in a different one of the plurality of collections of processor cores.
- View Dependent Claims (15, 16, 17)
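The scheduler thread of claim 14 aggregates counter values from the cores running both co-located applications before deciding on a migration. A minimal sketch, assuming an invented `Core` record and counter name (`llc_misses`) and a synthetic budget:

```python
# Sketch of claim 14's scheduler thread: aggregate per-core counter
# values for two co-located applications and flag a migration when their
# combined pressure on a shared resource exceeds a budget.
from dataclasses import dataclass

@dataclass
class Core:
    counters: dict  # e.g., {"llc_misses": <count>}

def collect(cores, counter):
    """Gather one counter's updated value from each core."""
    return [c.counters[counter] for c in cores]

def migration_needed(app_cores, other_cores, counter, budget):
    """Both applications stressing the same shared resource beyond the
    budget suggests one should continue on another collection of cores."""
    return (sum(collect(app_cores, counter)) +
            sum(collect(other_cores, counter))) > budget

app = [Core({"llc_misses": 40}), Core({"llc_misses": 35})]
other = [Core({"llc_misses": 50})]
print(migration_needed(app, other, "llc_misses", budget=100))  # True
```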
18. A non-transitory, computer-readable storage medium storing program instructions that when executed on a multi-socket computer cause the multi-socket computer to implement an adaptive contention-aware thread scheduler;
wherein the adaptive contention-aware thread scheduler is configured to:
collect, during execution of two applications on two or more processor cores of a given processor socket of the multi-socket computer, values of one or more performance counters in each of the two or more processor cores of the given processor socket, wherein the values of the one or more performance counters in each of the two or more processor cores indicate the extent to which the two applications compete for a resource of a given type on the given processor socket that is shared by the two applications;
determine, prior to completing execution of the two applications, and dependent on the collected values, that demand for the shared resource by both of the two applications is high; and
select, in response to determining that demand for the shared resource by both of the two applications is high, a different one of the processor sockets of the multi-socket computer on which to continue execution of one of the two applications, wherein to select a different one of the processor sockets, the adaptive contention-aware thread scheduler is configured to identify a processor socket on which an application with a low demand for resources of the given type is executing.
- View Dependent Claims (19, 20)
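The selection step in claim 18 — finding a socket whose resident application has low demand for the contended resource type — can be sketched as a simple scan. The socket records, resource name (`mem_bw`), and threshold below are illustrative assumptions:

```python
# Sketch of claim 18's selection step: when both co-located applications
# show high demand for a shared resource, pick another socket whose
# resident application demands little of that resource type.

def select_target_socket(sockets, resource, low_threshold):
    """Return the id of a socket hosting a low-demand application for
    `resource`, or None if every candidate is already under pressure."""
    for sock in sockets:
        if sock["demand"].get(resource, 0.0) < low_threshold:
            return sock["id"]
    return None

sockets = [{"id": 1, "demand": {"mem_bw": 0.85}},
           {"id": 2, "demand": {"mem_bw": 0.15}}]
print(select_target_socket(sockets, "mem_bw", low_threshold=0.3))  # 2
```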