Opportunistic task threading in a shared-memory, multi-processor computer system
First Claim
1. A method for parallel processing implemented by a computer having a plurality of processors including a main processor for executing a main process and at least one parallel needle processor for executing threads initiated by the main process, the computer also having a memory shared by the plurality of processors, wherein the execution time of the main process is reduced by decreasing the overhead associated with separation from the main process of a plurality of separable threads which are executed in parallel by the plurality of processors when available, the method comprising:
(a) determining, by the main processor, if said at least one parallel needle processor is available to execute a first thread;
(b) reserving, by the main processor, the exclusive right to use the parallel needle processor responsive to the parallel needle processor being determined to be available, and executing the first thread on the main processor responsive to the parallel needle processor not being available;
(c) constructing a packaging data structure including the first thread, by the main processor, and transferring the packaging data structure for execution on the reserved needle processor responsive to the reservation being successful;
(d) creating, by the main processor, a future object in the main process while the first thread is being executed on the reserved needle processor so as to allow the main process to continue execution prior to obtaining the result;
(e) utilizing, by the main processor, the future object in the main process as if the future object were the result of the execution of the first thread;
(f) returning a result of the execution of the first thread to the memory so as to resolve the future object.
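Steps (a) through (f) describe a try-reserve-or-run-inline dispatch: the main process atomically tests and claims a free needle processor, packages the thread for it, and continues against a future object that the worker later resolves. A minimal sketch in Python, modeling each needle processor as a reservable worker thread (names such as `NeedleWorker` and `spawn` are illustrative, not from the patent):

```python
import threading
from concurrent.futures import Future

class NeedleWorker:
    """One parallel 'needle' processor, modeled as a reservable worker."""
    def __init__(self):
        self._reserved = threading.Lock()    # held while the worker is in use

    def try_reserve(self):
        # Steps (a)+(b): atomically test availability and claim exclusive use.
        return self._reserved.acquire(blocking=False)

    def run(self, thunk, future):
        # Steps (c)+(f): execute the packaged thread, resolve the future,
        # and release the reservation so the worker becomes available again.
        def body():
            try:
                future.set_result(thunk())
            finally:
                self._reserved.release()
        threading.Thread(target=body).start()

def spawn(workers, thunk):
    """Dispatch `thunk` to a free worker, or run it inline on the main processor."""
    for w in workers:
        if w.try_reserve():
            fut = Future()                   # step (d): future object for the result
            w.run(thunk, fut)
            return fut                       # step (e): caller uses fut as the result
    # Second branch of step (b): no worker free, execute in the main process.
    fut = Future()
    fut.set_result(thunk())
    return fut

workers = [NeedleWorker() for _ in range(2)]
f = spawn(workers, lambda: 6 * 7)
print(f.result())  # blocks until the future is resolved; prints 42
```

The point of the future object in step (d) is that `spawn` returns immediately after a successful reservation, so the main process only blocks if and when it actually touches the result.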
Abstract
Method and apparatus are provided in a shared memory, multi-processor computer system for reducing the time required to run an application program on the processors of the system by reducing the overhead associated with the separation of the program and the initiation of the parallel execution of the tasks. The system has a plurality of processors wherein the application program is separated into different tasks and the tasks are executed in parallel on the processors of the system. The system further includes a process enabling the execution of either opportunistic or queued threads. In the preferred embodiment, the method includes the steps of (a) determining if one of the processors is free to execute a first task, and (b) performing the first task if step (a) determines that none of the processors are free. The method also includes the steps of (c) reserving the one processor for the first task if step (a) determines the one processor is free, and (d) constructing and transferring a task data structure for the first task to the reserved processor. Finally, the method includes the steps of (e) creating a future object for the first task, (f) performing the first task on the one processor, and (g) placing the results of step (f) in the future object. An alternative embodiment includes the ability to stack or queue threads onto a Global Queue to await execution by a free processor.
18 Claims
1. A method for parallel processing implemented by a computer having a plurality of processors including a main processor for executing a main process and at least one parallel needle processor for executing threads initiated by the main process, the computer also having a memory shared by the plurality of processors, wherein the execution time of the main process is reduced by decreasing the overhead associated with separation from the main process of a plurality of separable threads which are executed in parallel by the plurality of processors when available, the method comprising:
(a) determining, by the main processor, if said at least one parallel needle processor is available to execute a first thread;
(b) reserving, by the main processor, the exclusive right to use the parallel needle processor responsive to the parallel needle processor being determined to be available, and executing the first thread on the main processor responsive to the parallel needle processor not being available;
(c) constructing a packaging data structure including the first thread, by the main processor, and transferring the packaging data structure for execution on the reserved needle processor responsive to the reservation being successful;
(d) creating, by the main processor, a future object in the main process while the first thread is being executed on the reserved needle processor so as to allow the main process to continue execution prior to obtaining the result;
(e) utilizing, by the main processor, the future object in the main process as if the future object were the result of the execution of the first thread;
(f) returning a result of the execution of the first thread to the memory so as to resolve the future object.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for parallel processing implemented by a computer having a plurality of processors including a main processor for executing a main process and at least one parallel needle processor for executing threads initiated by the main process, the computer also having a memory shared by the plurality of processors including a portion designated as a thread queue, wherein the execution time of the main process is reduced by decreasing the overhead associated with separation from the main process of a plurality of separable threads which are executed in parallel by the plurality of processors, the method comprising:
(a) constructing a packaging data structure including a first thread, by the main processor, for execution on a parallel needle processor;
(b) creating, by the main processor, a future object in the main process for the result while the first thread is being executed so as to allow the main process to continue execution prior to obtaining the result;
(c) obtaining, by the main processor, a lock for the thread queue so as to prevent modification of the thread queue by a competing process;
(d) placing, by the main processor, the first thread in the thread queue;
(e) releasing, by the main processor, the lock for the thread queue so as to allow modification of the thread queue;
(f) utilizing, by the main processor, the future object in the main process as if the future object were the result of the thread execution;
(g) initializing, by the main processor, a needle process on the parallel needle processor;
(h) obtaining, by the main processor, the lock for the thread queue so as to prevent modification of the thread queue by a competing process;
(i) removing, by the main processor, the first thread from the thread queue;
(j) releasing, by the main processor, the lock for the thread queue so as to allow modification of the thread queue; and
(k) executing, by the needle process, the removed first thread so as to determine and return the result for resolving the future object.

Dependent claims: 11, 12, 13
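The queued variant above replaces the per-processor reservation with a shared thread queue: the producer and the needle process each take the queue lock around their insert and remove, as in steps (c) through (e) and (h) through (j). A minimal Python sketch, assuming a lock-guarded deque as the Global Queue (the names `enqueue`, `needle_process`, and `thread_queue` are illustrative, not from the patent):

```python
import threading
from collections import deque
from concurrent.futures import Future

queue_lock = threading.Lock()   # guards the Global Queue against competing processes
thread_queue = deque()          # holds (packaged thread, future) pairs

def enqueue(thunk):
    """Main-process side: package the thread, create its future, enqueue it."""
    fut = Future()              # step (b): future created before execution
    with queue_lock:            # steps (c)-(e): lock, insert, unlock
        thread_queue.append((thunk, fut))
    return fut                  # step (f): caller treats fut as the result

def needle_process():
    """Needle-process side: steps (h)-(k), dequeue under the lock and execute."""
    while True:
        with queue_lock:        # steps (h)-(j): lock, remove, unlock
            if not thread_queue:
                return          # queue drained; in practice a worker would block instead
            thunk, fut = thread_queue.popleft()
        fut.set_result(thunk()) # step (k): resolve the future with the result

futures = [enqueue(lambda i=i: i * i) for i in range(4)]
worker = threading.Thread(target=needle_process)   # step (g): initialize the needle process
worker.start()
worker.join()
print([f.result() for f in futures])  # prints [0, 1, 4, 9]
```

Note that only the queue operations are inside the critical section; the thread itself executes outside the lock, so several needle processes can drain the queue concurrently.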
14. Apparatus for parallel processing implemented by a computer having a plurality of processors including a main processor for executing a main process and at least one parallel needle processor for executing threads initiated by the main process, the computer also having a memory shared by the plurality of processors, wherein the execution time of the main process is reduced by decreasing the overhead associated with separation from the main process of a plurality of separable threads which are executed in parallel by the plurality of processors when available, the apparatus comprising:
means for determining if a parallel needle processor is available to execute a first thread;
means for reserving an exclusive right to use the at least one parallel needle processor if the at least one parallel needle processor is determined to be available, and means for executing the first thread on the main processor if the at least one parallel needle processor is determined to be unavailable;
means for creating a packaging data structure including the first thread for execution on the reserved parallel needle processor responsive to the reservation being successful;
means for transferring the packaging data structure to the reserved parallel needle processor;
means for creating a future object in the main process while the first thread is being executed on the reserved parallel needle processor so as to allow the main process to continue execution prior to obtaining a result for the future object;
means for utilizing the future object in the main process as if the future object were the result of the execution of the first thread; and
means for returning a result of the execution of the first thread to the memory so as to resolve the future object.

Dependent claims: 15, 16, 17, 18
Specification