System and method for distributed information handling system cluster active-active master node
Abstract
Computing nodes, such as plural information handling systems configured as a High Performance Computing Cluster (HPCC), are managed with plural master nodes configured for active-active interaction. A resource manager of each of the plural master nodes is operable to simultaneously assign computing node resources to job requests. To avoid conflicts between master nodes, a job scheduler makes reservations in a table held in storage common to the active-active master nodes, and the reserved computing resources are then assigned for management by the resource manager of the reserving master node. A failure manager monitors the master nodes to detect a failure, such as a lack of communication from a master node for a predetermined time, and recovers a failed master node by assigning the jobs associated with it to an operating master node.
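The failure handling described in the abstract can be illustrated with a minimal Python sketch. It assumes each master node reports a heartbeat timestamp and that the "predetermined time" is a single `timeout` value. The names `MasterNode`, `FailureManager`, `heartbeat`, and `check` are hypothetical. Moving the failed node's jobs to the least-loaded surviving master is an illustrative choice that the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """Hypothetical model of one active-active master node."""
    name: str
    jobs: set[str] = field(default_factory=set)
    last_heartbeat: float = 0.0
    alive: bool = True


class FailureManager:
    """Declares a master failed once it has been silent for longer than
    `timeout` seconds, then hands its jobs to an operating master."""

    def __init__(self, masters: list[MasterNode], timeout: float) -> None:
        self.masters = {m.name: m for m in masters}
        self.timeout = timeout

    def heartbeat(self, name: str, now: float) -> None:
        # Record the latest communication received from a master node.
        self.masters[name].last_heartbeat = now

    def check(self, now: float) -> list[str]:
        # First mark every silent master as failed, then recover each one,
        # so a failed master is never chosen as a recovery target.
        failed = []
        for m in self.masters.values():
            if m.alive and now - m.last_heartbeat > self.timeout:
                m.alive = False
                failed.append(m.name)
        for name in failed:
            self._recover(self.masters[name])
        return failed

    def _recover(self, failed: MasterNode) -> None:
        survivors = [m for m in self.masters.values() if m.alive]
        if not survivors:
            raise RuntimeError("no operating master node available")
        # Assumption: reassign to the survivor currently managing the fewest jobs.
        target = min(survivors, key=lambda m: len(m.jobs))
        target.jobs |= failed.jobs
        failed.jobs.clear()


a = MasterNode("master-a", {"job1", "job2"}, last_heartbeat=0.0)
b = MasterNode("master-b", {"job3"}, last_heartbeat=0.0)
fm = FailureManager([a, b], timeout=5.0)
fm.heartbeat("master-b", now=4.0)
print(fm.check(now=6.0))  # ['master-a']
print(sorted(b.jobs))     # ['job1', 'job2', 'job3']
```

Passing `now` explicitly keeps the sketch deterministic. A real failure manager would read a clock and would also have to decide what happens when a master whose jobs were reassigned comes back online, which this sketch does not attempt.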
38 Citations
20 Claims
1. An information handling system comprising:
plural computing nodes having computing resources operable to perform jobs assigned by a master node;
plural master nodes interfaced with each other and with the computing nodes, each master node having a resource manager operable to accept job requests, the resource managers further operable to simultaneously assign the job requests to the computing resources and manage performance of the job requests by the computing resources; and
a job scheduler associated with each of the plural master nodes and operable to prevent simultaneous assignment of job requests by different of the resource managers to the same computing resources.
10. A method for managing plural computing nodes of a High Performance Computing Cluster with plural master nodes, the method comprising:
receiving plural job requests at each of the plural master nodes;
reserving computing node resources for each job request with the master node that received the job request;
confirming that the reserved computing node resources do not conflict with each other;
assigning the computing node resources to the job requests as reserved.
18. An information handling system comprising:
a resource manager operable to assign computing jobs to computing resources of plural computing nodes and to manage the performance of the computing jobs by the computing resources; and
a job scheduler interfaced with the resource manager and operable to coordinate allocation of computing resources between the resource manager and one or more associated information handling systems that are also operable to assign computing jobs to the computing resources.
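The reserve, confirm, and assign sequence of claim 10, together with the conflict prevention recited in claims 1 and 18, can be sketched in Python. This is an illustration under stated assumptions and not the patented implementation. The shared reservation table is modeled as an in-process dictionary guarded by a `threading.Lock`, which stands in for whatever atomic update the common storage would provide. Job names are assumed to be unique across master nodes. `ReservationTable`, `ResourceManager`, `submit`, and `finish` are hypothetical names.

```python
from __future__ import annotations

import threading


class ReservationConflict(Exception):
    """Raised when a request overlaps resources another job already holds."""


class ReservationTable:
    """Hypothetical stand-in for the reservation table kept in storage common
    to all master nodes. The lock makes check-and-reserve one atomic step."""

    def __init__(self) -> None:
        self._owner: dict[str, tuple[str, str]] = {}  # resource -> (master, job)
        self._lock = threading.Lock()

    def reserve(self, master: str, job: str, resources: set[str]) -> None:
        with self._lock:
            # Confirm the request does not conflict with existing reservations.
            taken = sorted(r for r in resources if r in self._owner)
            if taken:
                raise ReservationConflict(f"{job}: {taken} already reserved")
            for r in resources:
                self._owner[r] = (master, job)

    def release(self, job: str) -> None:
        with self._lock:
            for r in [r for r, (_, j) in self._owner.items() if j == job]:
                del self._owner[r]

    def owner_of(self, resource: str) -> tuple[str, str] | None:
        with self._lock:
            return self._owner.get(resource)


class ResourceManager:
    """Hypothetical per-master resource manager: reserve first, then assign."""

    def __init__(self, name: str, table: ReservationTable) -> None:
        self.name = name
        self.table = table
        self.assigned: dict[str, set[str]] = {}

    def submit(self, job: str, resources: set[str]) -> bool:
        try:
            self.table.reserve(self.name, job, resources)
        except ReservationConflict:
            return False  # assumption: the caller queues or retries the request
        self.assigned[job] = set(resources)  # assign the resources as reserved
        return True

    def finish(self, job: str) -> None:
        self.assigned.pop(job, None)
        self.table.release(job)


table = ReservationTable()
a = ResourceManager("master-a", table)
b = ResourceManager("master-b", table)
print(a.submit("job1", {"node1", "node2"}))  # True
print(b.submit("job2", {"node2", "node3"}))  # False: node2 is held by job1
print(b.submit("job3", {"node3"}))           # True
```

Performing the conflict check and the reservation as a single locked step is what keeps two master nodes from claiming the same computing node at the same moment. Checking first and reserving later, as two separate operations, would leave a race window between them.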
Specification