Optimized job scheduling and execution in a distributed computing grid
First Claim
1. A system for scheduling a computer-executable job for execution by a network of nodes joined by links, the system comprising:
- a job planner to determine whether there is at least one valid combination of nodes and links with capability and capacity over time to complete the computer-executable job by a deadline, the job planner including;
a validity portion to determine abilities of respective nodes to execute the computer-executable job at a given time; and
a cost portion to determine costs to execute the computer-executable job at the respective nodes at the given time, the cost portion to select a total cost combination of nodes and links from among the at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline; and
a job scheduler to cooperate with the job planner to determine, via a processor, at least a first node from the network of nodes that is able to execute the computer-executable job and that has a lowest cost to execute the computer-executable job at the first node at the given time, the job scheduler to base its determination on compiled instructions comprising the computer-executable job.
3 Assignments
0 Petitions
Accused Products
Abstract
An arrangement provides optimal job scheduling in a distributed computing grid having a network of nodes. As jobs enter the system, their requirements are matched against the capabilities at each node to determine (step 202) candidate nodes. From this set of candidate nodes, a subset of valid nodes is selected (step 204) that has sufficient bandwidth for the duration of the job on each link that will need to be used by the job if run at that candidate node. For each valid node, a total cost is computed (step 206) to run the job. The cost may include such factors as bandwidth cost, server cost, storage cost, delay costs, and the like. Finally, a lowest cost node is selected (step 207), and the job is scheduled for execution (step 208) and then run (step 209) on that lowest cost node. An arrangement combining job scheduling with bandwidth on demand (BoD) involves a system for scheduling at least one job for execution on a network of nodes joined by links having respective link capacities, each job associated with a transport capacity requirement. The system has a job scheduler (element 150) configured to schedule the at least one job to be executed on at least one selected node, and a link manager (element 140) configured to reserve at least some of the link capacity of at least one of the links connected to the at least one selected node, to match the job transport capacity requirement.
48 Citations
21 Claims
-
1. A system for scheduling a computer-executable job for execution by a network of nodes joined by links, the system comprising:
-
a job planner to determine whether there is at least one valid combination of nodes and links with capability and capacity over time to complete the computer-executable job by a deadline, the job planner including; a validity portion to determine abilities of respective nodes to execute the computer-executable job at a given time; and a cost portion to determine costs to execute the computer-executable job at the respective nodes at the given time, the cost portion to select a total cost combination of nodes and links from among the at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline; and a job scheduler to cooperate with the job planner to determine, via a processor, at least a first node from the network of nodes that is able to execute the computer-executable job and that has a lowest cost to execute the computer-executable job at the first node at the given time, the job scheduler to base its determination on compiled instructions comprising the computer-executable job. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system for scheduling a computer-executable job for execution by a network of nodes joined by links, the system comprising:
-
a job planner including; a validity portion to determine abilities of respective nodes to execute the computer-executable job at a given time; and a cost portion to determine costs to execute the computer-executable job at the respective nodes at the given time, the cost portion to select a total cost combination of nodes and links from among at least one valid combination of nodes and links with capability and capacity over time to complete the computer-executable job by a deadline; and a job scheduler to cooperate with the validity portion and the cost portion to determine, via a processor, at least a first node from the network of nodes that is able to execute the computer-executable job and that has a lowest cost to execute the computer-executable job at the first node at the given time, wherein; the computer-executable job is associated with a job transport capacity requirement; all the costs are set to zero so that the job scheduler cooperates with the validity portion to determine at least one node that is able to execute the computer-executable job; and the system further comprises a link manager to reserve at least some of the link capacity of at least one of the links connected to the one or more of the nodes where the computer-executable job is scheduled, such that a resulting reserved link capacity matches or exceeds the job transport capacity requirement.
-
-
9. A system for scheduling a computer-executable job for execution by a network of nodes joined by links, the system comprising:
-
a job planner including; a validity portion to determine abilities of respective nodes to execute the computer-executable job at a given time; and a cost portion to determine costs to execute the computer-executable job at respective nodes at the given time; and a job scheduler to cooperate with the validity portion and the cost portion to determine, via a processor, at least a first node that is able to execute the computer-executable job and that has a lowest cost to execute the computer-executable job at the first node at the given time, wherein; the nodes include components having at least a capability and a capacity; the links include channels having at least a capability and a capacity; and the system further comprises; a job schedule to store a list of computer-executable jobs to be executed, and planned nodes and links to be at least partially utilized by each computer-executable job over time; a job metadata storage including, for each computer-executable job, component capability and capacity requirements, link capability and capacity requirements, and job duration and deadline; a job submission portion to receive the metadata; a node capability database to maintain information about the capability and capacity of each component within the nodes; a node utilization table to maintain information about a planned utilization of capacity of each component of the nodes in accordance with the job schedule; a node cost table to maintain information on a cost of the capacity of each component at the nodes over time; a link capability database to maintain information about a capability and capacity of each channel within the links; a link utilization table to maintain information about a planned utilization of capacity of each channel of the one or more links in accordance with the job schedule; a link cost table to maintain information on a cost of the capacity of each channel of the links over time; a node manager to manage the node capability database, the node utilization table, and the node cost table, and to modify the node utilization table; a link manager to manage the link capability database, the link utilization table, and the link cost table, and to modify the link utilization table; the job planner, when provided a new computer-executable job and its associated metadata, is to communicate with the link manager and the node manager to determine whether there is at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline, wherein the cost portion is to select a total cost combination of nodes and links from among the at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline; and the job scheduler, if there is at least one valid combination of node and links with the capability and capacity over time to complete the computer-executable job by the deadline, is to schedule the computer-executable job in accordance with the selected total cost combination, and to communicate with the node manager and link manager to update the link utilization table and the node utilization table in accordance with the selected total cost combination, to update the job schedule, and to cause the computer-executable job to execute in accordance with the selected total cost combination of nodes and links.
-
-
10. A system for scheduling a computer-executable job for execution by a network of nodes joined by links having respective link capacities, the computer-executable job associated with a transport capacity requirement, the system comprising:
-
a job planner to determine whether there is at least one valid combination of nodes and links with capability and capacity over time to complete the computer-executable job by a deadline, wherein the job planner is to select a total cost combination of nodes and links from among the at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline; a job scheduler to schedule the computer-executable job, via a processor, to be executed on at least one selected node, the job scheduler to base its determination on compiled instructions comprising the computer-executable job; and a link manager to reserve at least some of the link capacity of at least one of the links connected to the at least one selected node from the network of nodes, to match the job transport capacity requirement. - View Dependent Claims (11, 12)
-
-
13. A method for scheduling a computer-executable job for execution on at least one selected node from a network of nodes connected by links, the method comprising:
-
determining combinations of nodes and links that have capability and capacity over time to complete the computer-executable job by a deadline; analyzing measures of total cost associated with execution of the computer-executable job on respective combinations of nodes and links that were determined to have the capability and capacity over time to complete the computer-executable job by the deadline; based on a measure of least total cost associated with execution of the computer-executable job on the respective combinations of nodes and links, selecting the at least one selected node from the network of nodes for executing the computer-executable job, the selection based on compiled instructions comprising the computer-executable job; scheduling the computer-executable job for execution; and causing the computer-executable job to be executed at the at least one selected node. - View Dependent Claims (14, 15, 16)
-
-
17. A method for scheduling a computer-executable job for execution on at least one selected node from a network of nodes connected by links, the method comprising:
-
determining which nodes are capable of executing the at least one computer-executable job; analyzing measures of total cost associated with execution of the computer-executable job on respective nodes that were determined to be capable of executing the computer-executable job; and based on a measure of least total cost associated with execution of the computer-executable job on the respective nodes, selecting the at least one selected node from the network of nodes for executing the computer-executable job, wherein the determining which nodes are capable of executing the computer-executable job includes; determining a planned utilization of the links over time, limited by a capability and a capacity of the respective links; determining a planned utilization of the nodes over time, limited by a capability and a capacity of the respective nodes; determining zero or more candidate nodes having a capability and a capacity that meets or exceeds specified requirements of the computer-executable job; for each candidate node, determining link requirements over time on the links if the computer-executable job were executed at that candidate node; determining whether the link requirements for each candidate node can be met, given the planned utilization of the links over time; and determining zero or more valid nodes to be those candidate nodes that allow link requirements to be met, while executing the computer-executable job. - View Dependent Claims (18, 19)
-
-
20. An apparatus comprising:
-
a memory storing machine readable instructions; and a processor to execute the instructions to perform operations comprising; determining abilities of respective nodes to execute a computer-executable job at a given time; determining whether there is at least one valid combination of nodes and links with capability and capacity over time to complete the computer-executable job by a deadline; determining costs to execute the computer-executable job at the respective nodes at the given time; selecting a total cost combination of nodes and links from among the at least one valid combination of nodes and links with the capability and capacity over time to complete the computer-executable job by the deadline; and scheduling the computer-executable job for execution on at least a first node from the network of nodes that is able to execute the computer-executable job and that has a lowest cost to execute the computer-executable job at the first node at the given time, the scheduling of the computer-executable job for execution on the at least the first node being based on compiled instructions comprising the computer-executable job. - View Dependent Claims (21)
-
Specification