Automatically building a locally managed virtual node grouping to handle a grid job requiring a degree of resource parallelism within a grid environment
First Claim
1. A computer-implemented method for building virtual node groupings within a grid environment, comprising:
- detecting a grid job at a particular grid manager from among a plurality of grid managers within a grid environment, wherein said grid job requires a particular degree of parallelism for execution, wherein a plurality of resource nodes within said grid environment are identified in physically disparate groups each managed by one from among said plurality of grid managers through a plurality of web services implemented within a web services layer extended by an open grid services infrastructure atop a grid service layer comprising at least one grid service implemented within an open grid services architecture, wherein each of said plurality of grid managers comprises a grid manager communication subsystem for communicating between said plurality of grid managers, wherein said particular grid manager locally manages a first selection of resource nodes from among said plurality of resource nodes within said grid environment within a particular physical location, wherein at least one additional local grid manager manages a second selection of resource nodes from among said plurality of resource nodes within said particular physical location, wherein at least one remote grid manager manages a third selection of resource nodes from among said plurality of resource nodes within a remote physical location;
responsive to said particular grid manager detecting that insufficient resources are available for a required execution environment for said grid job from said first selection of resource nodes, accessing, from said plurality of grid managers through said grid manager communication subsystem, a current availability, a current wait time, a current run time, and a current cost for each of said plurality of resource nodes within said grid environment;
responsive to detecting said second selection of resource nodes are available to build said required execution environment for said grid job from said current availability returned from said at least one additional local grid manager, calculating a total local run time from said current wait time and current run time for said second selection of resources and calculating a total local cost from said current cost for said second selection of resources for building said required execution environment with said second selection of resource nodes;
comparing said total local run time and said total local cost with a remote time calculated from said current wait time and said current run time for said third selection of resources and a remote cost calculated from said current cost for said third selection of resources;
responsive to determining at least one of said total local run time less than said remote time and said total local cost less than said remote cost, selecting said second selection of resource nodes from among said plurality of resource nodes to build into a virtual node grouping for said required execution environment for executing said grid job;
building said virtual node grouping by said particular grid manager through said grid manager communication subsystem by adding an Internet Protocol address alias for said virtual node grouping to a separate network card of each of said second selection of resource nodes to acquire temporary management control over said second selection of resource nodes from said at least one additional local grid manager for a duration of execution of said grid job within said virtual node grouping;
responsive to determining said total run time slower than said remote time and said total local cost greater than said remote cost, selecting said third selection of resource nodes to build into said virtual node grouping for said required execution environment;
building said virtual node grouping by said particular grid manager through said grid manager communication subsystem by adding said Internet Protocol address alias for said virtual node grouping to each separate network card of each of said third selection of resource nodes to acquire temporary management control over said third selection of resource nodes from said at least one remote grid manager; and
responsive to the grid job execution completed, deconstructing said virtual node grouping.
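The comparison step recited above can be illustrated with a minimal sketch. It is not the patented implementation: the `NodeStatus` record, the helper names, and the way totals are aggregated (slowest node bounds the total time, per-node costs are summed) are assumptions made for illustration, since the claim only says the totals are calculated "from" the reported wait time, run time, and cost.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeStatus:
    """Status a grid manager reports for one resource node (hypothetical shape)."""
    available: bool
    wait_time: float  # time until the node can start the job
    run_time: float   # estimated time to run the job on the node
    cost: float       # price of using the node for the job


def total_time(nodes: Sequence[NodeStatus]) -> float:
    # Assumption: nodes work in parallel, so the slowest node bounds the total.
    return max(n.wait_time + n.run_time for n in nodes)


def total_cost(nodes: Sequence[NodeStatus]) -> float:
    # Assumption: per-node costs simply add up.
    return sum(n.cost for n in nodes)


def choose_selection(local: Sequence[NodeStatus],
                     remote: Sequence[NodeStatus]) -> Optional[str]:
    """Return "local", "remote", or None where the claim names no outcome."""
    if not local or not remote or not all(n.available for n in local):
        return None  # the comparison presupposes available local nodes
    local_time, local_cost = total_time(local), total_cost(local)
    remote_time, remote_cost = total_time(remote), total_cost(remote)
    # Local wins if it is faster OR cheaper ("at least one of").
    if local_time < remote_time or local_cost < remote_cost:
        return "local"
    # Remote wins only if local is both slower AND more expensive.
    if local_time > remote_time and local_cost > remote_cost:
        return "remote"
    return None  # e.g. exact ties, which the claim language does not address
```

Note the asymmetry: the local selection is chosen when it is better on either measure, while the remote selection requires the local one to lose on both.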
Abstract
A method, system, and program for automatically building a locally managed virtual node grouping to handle a grid job requiring a degree of resource parallelism for execution within a grid environment are provided. The grid environment includes multiple resource nodes which are identified by physical location as physically disparate groups each managed by a grid manager. The grid managers include a grid virtual node grouping subsystem that enables a particular grid manager receiving a grid job that requires a particular degree of resource parallelism for execution to build a virtual node grouping of resources from across the grid environment and locally manage the resources included in the virtual node grouping. In particular, the particular grid manager accesses, from the other grid managers, a current availability and workload of each of the physically disparate resource nodes. The particular grid manager selects a selection of resource nodes to build into a virtual node grouping for executing the grid job. The virtual node grouping is built by the other grid managers enabling the particular grid manager to acquire temporary management control over the selection of resource nodes for a duration of the execution of the grid job within the virtual node grouping.
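The temporary transfer of management control described in the abstract can be pictured as a scoped acquire-and-restore. The following is a minimal sketch under stated assumptions: the `control` mapping from node name to managing grid manager and the function name `virtual_node_grouping` are hypothetical, and real grid managers would negotiate this over their communication subsystem rather than mutate a shared dictionary.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def virtual_node_grouping(control: Dict[str, str],
                          requester: str,
                          nodes: List[str]) -> Iterator[List[str]]:
    """Give `requester` control of `nodes` for the duration of the block."""
    # Remember who managed each node; raises KeyError before any change
    # is made if a node is unknown.
    previous = {node: control[node] for node in nodes}
    for node in nodes:
        control[node] = requester  # temporary management control
    try:
        yield list(nodes)
    finally:
        # Deconstruct the grouping: hand every node back to its manager,
        # even if the job raised an exception.
        control.update(previous)
```

A usage sketch: with `control = {"n1": "A", "n2": "B"}`, running a job inside `with virtual_node_grouping(control, "A", ["n2"]):` leaves `n2` under manager `A` only while the block executes.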
9 Claims
5. A system for building virtual node groupings within a grid environment, comprising:
a grid environment comprising a plurality of resource nodes identified in physically disparate groups each managed by one from among said plurality of grid managers through a plurality of web services implemented within a web services layer extended by an open grid services infrastructure atop a grid service layer comprising at least one grid service implemented within an open grid services architecture, wherein each of said plurality of grid managers comprises a grid manager communication subsystem for communicating between said plurality of grid managers;
a particular grid manager from among said plurality of grid managers that locally manages a first selection of resource nodes from among said plurality of resource nodes within a particular physical location of said grid environment, wherein at least one additional local grid manager from among said plurality of grid managers manages a second selection of resource nodes from among said plurality of resource nodes within said particular physical location, wherein at least one remote grid manager from among said plurality of grid managers manages a third selection of resource nodes from among said plurality of resource nodes within a remote physical location;
said particular grid manager further comprising;
means for detecting a grid job that requires a particular degree of parallelism for execution within said grid environment;
means, responsive to said particular grid manager detecting that insufficient resources are available for a required execution environment for said grid job from said first selection of resource nodes, for accessing, from said plurality of grid managers through said grid manager communication subsystem, a current availability, a current wait time, a current run time, and a current cost for each of said plurality of resource nodes within said grid environment;
means, responsive to detecting said second selection of resource nodes are available to build said required execution environment for said grid job from said current availability returned from said at least one additional local grid manager, for calculating a total local run time from said current wait time and current run time for said second selection of resources and calculating a total local cost from said current cost for said second selection of resources for building said required execution environment with said second selection of resource nodes;
means for comparing said total local run time and said total local cost with a remote time calculated from said current wait time and said current run time for said third selection of resources and a remote cost calculated from said current cost for said third selection of resources;
means, responsive to determining at least one of said total local run time less than said remote time and said total local cost less than said remote cost, for selecting said second selection of resource nodes from among said plurality of resource nodes to build into a virtual node grouping for said required execution environment for executing said grid job;
means for building said virtual node grouping by said particular grid manager through said grid manager communication subsystem by adding an Internet Protocol address alias for said virtual node grouping to a separate network card of each of said second selection of resource nodes to acquire temporary management control over said second selection of resource nodes from said at least one additional local grid manager for a duration of execution of said grid job within said virtual node grouping;
means, responsive to determining said total run time slower than said remote time and said total local cost greater than said remote cost, for selecting said third selection of resource nodes to build into said virtual node grouping for said required execution environment;
means for building said virtual node grouping by said particular grid manager through said grid manager communication subsystem by adding said Internet Protocol address alias for said virtual node grouping to each separate network card of each of said third selection of resource nodes to acquire temporary management control over said third selection of resource nodes from said at least one remote grid manager; and
means, responsive to the grid job execution completed, for deconstructing said virtual node grouping.
9. A computer executable program product comprising computer executable instructions tangibly embodied on a non-transitory volatile or non-volatile computer readable medium that when executed by said computer perform the method steps for building virtual node groupings within a grid environment, comprising:
enabling detection of a grid job at a particular grid manager from among a plurality of grid managers within a grid environment, wherein said grid job requires a particular degree of parallelism for execution, wherein a plurality of resource nodes within said grid environment are identified in physically disparate groups each managed by one from among said plurality of grid managers through a plurality of web services implemented within a web services layer extended by an open grid services infrastructure atop a grid service layer comprising at least one grid service implemented within an open grid services architecture, wherein each of said plurality of grid managers comprises a grid manager communication subsystem for communicating between said plurality of grid managers, wherein said particular grid manager locally manages a first selection of resource nodes from among said plurality of resource nodes within said grid environment within a particular physical location, wherein at least one additional local grid manager manages a second selection of resource nodes from among said plurality of resource nodes within said particular physical location, wherein at least one remote grid manager manages a third selection of resource nodes from among said plurality of resource nodes within a remote physical location;
responsive to said particular grid manager detecting that insufficient resources are available for a required execution environment for said grid job from said first selection of resource nodes, controlling access, from said plurality of grid managers through said grid manager communication subsystem, a current availability, a current wait time, a current run time, and a current cost for each of said plurality of resource nodes within said grid environment;
responsive to detecting said second selection of resource nodes are available to build said required execution environment for said grid job from said current availability returned from said at least one additional local grid manager, calculating a total local run time from said current wait time and current run time for said second selection of resources and calculating a total local cost from said current cost for said second selection of resources for building said required execution environment with said second selection of resource nodes;
comparing said total local run time and said total local cost with a remote time calculated from said current wait time and said current run time for said third selection of resources and a remote cost calculated from said current cost for said third selection of resources;
responsive to determining at least one of said total local run time less than said remote time and said total local cost less than said remote cost, controlling selection of said second selection of resource nodes from among said plurality of resource nodes to build into a virtual node grouping for said required execution environment for executing said grid job;
controlling the building of said virtual node grouping by enabling said particular grid manager through said grid manager communication subsystem by adding an Internet Protocol address alias for said virtual node grouping to a separate network card of each of said second selection of resource nodes to acquire temporary management control over said second selection of resource nodes from said at least one additional local grid manager for a duration of execution of said grid job within said virtual node grouping;
responsive to determining said total run time slower than said remote time and said total local cost greater than said remote cost, selecting said third selection of resource nodes to build into said virtual node grouping for said required execution environment;
building said virtual node grouping by said particular grid manager through said grid manager communication subsystem by adding said Internet Protocol address alias for said virtual node grouping to each separate network card of each of said third selection of resource nodes to acquire temporary management control over said third selection of resource nodes from said at least one remote grid manager; and
responsive to the grid job execution completed, deconstructing said virtual node grouping.
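The claims build the grouping by adding an Internet Protocol address alias to a separate network card on each selected node. As a rough illustration only, the sketch below prepares (but does not run) Linux `ip addr add` commands for that step. Everything specific here is an assumption not taken from the claims: the function name `alias_commands`, the device name `eth1`, the `vng` label, and the choice to give each node its own address from a subnet reserved for the grouping.

```python
import ipaddress
from itertools import islice
from typing import Dict, List


def alias_commands(nodes: List[str],
                   subnet: str,
                   device: str = "eth1") -> Dict[str, List[str]]:
    """Map each node name to the command that would add its alias address."""
    network = ipaddress.ip_network(subnet)
    # Assumption: one usable host address per node, taken in order.
    addresses = list(islice(network.hosts(), len(nodes)))
    if len(addresses) < len(nodes):
        raise ValueError("subnet too small for the requested grouping")
    commands = {}
    for node, address in zip(nodes, addresses):
        commands[node] = [
            "ip", "addr", "add", f"{address}/{network.prefixlen}",
            "dev", device,              # the node's separate network card
            "label", f"{device}:vng",   # alias label for the grouping
        ]
    return commands
```

Deconstructing the grouping would use the matching `ip addr del` form with the same address and device. Returning argument lists rather than executing them keeps the sketch free of side effects and leaves the choice of transport (for example, remote execution by the lending grid manager) open.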
Specification