Managing nodes in a high-performance computing system using a node registrar
Abstract
A method of managing nodes in a high-performance computing (HPC) system, which includes a management subsystem and a job scheduler subsystem, includes providing a node registrar subsystem. Logical node management functions are performed with the node registrar subsystem. Other management functions are performed with the management subsystem using the node registrar subsystem. Job scheduling functions are performed with the job scheduler subsystem using the node registrar subsystem.
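The logical node management functions named in the abstract and claims (adding nodes, removing nodes, updating node properties, and handling state transitions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `NodeRegistrar`, the method names, the three node states, and the transition table are hypothetical and are not taken from the patent, which does not enumerate states or define an API.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    # Hypothetical states; the source does not list the actual ones.
    OFFLINE = "offline"
    ONLINE = "online"
    DRAINING = "draining"


# Hypothetical table of permitted state transitions.
ALLOWED = {
    NodeState.OFFLINE: {NodeState.ONLINE},
    NodeState.ONLINE: {NodeState.DRAINING, NodeState.OFFLINE},
    NodeState.DRAINING: {NodeState.OFFLINE, NodeState.ONLINE},
}


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.OFFLINE
    properties: dict = field(default_factory=dict)


class NodeRegistrar:
    """Illustrative registrar covering the four logical node functions."""

    def __init__(self):
        self._nodes = {}

    def add_node(self, name, **properties):
        if name in self._nodes:
            raise ValueError(f"node {name!r} already registered")
        self._nodes[name] = Node(name, properties=dict(properties))

    def remove_node(self, name):
        # Raises KeyError if the node is unknown.
        del self._nodes[name]

    def update_properties(self, name, **properties):
        self._nodes[name].properties.update(properties)

    def transition(self, name, new_state):
        node = self._nodes[name]
        if new_state not in ALLOWED[node.state]:
            raise ValueError(f"{node.state.value} -> {new_state.value} not allowed")
        node.state = new_state

    def nodes_in_state(self, state):
        return sorted(n.name for n in self._nodes.values() if n.state is state)


# A management subsystem might register nodes and bring one online,
# while a job scheduler might only query which nodes are online.
registrar = NodeRegistrar()
registrar.add_node("node001", cores=64)
registrar.add_node("node002", cores=32)
registrar.transition("node001", NodeState.ONLINE)
registrar.update_properties("node002", rack="r12")
print(registrar.nodes_in_state(NodeState.ONLINE))
```

In this sketch both callers go through the same registrar object, which loosely mirrors the claimed arrangement in which the management subsystem and the job scheduler subsystem each communicate with the node registrar subsystem rather than keeping their own node records.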
20 Claims
1. A high-performance computing (HPC) system, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
a node registrar subsystem, implemented on the at least one processor, that performs logical node management functions and data sharing between HPC subsystems to provide interaction between heterogeneous subsystems, the logical node management functions including at least one of handling state transitions of nodes, adding the nodes, removing the nodes, and updating node properties, wherein the node registrar subsystem comprises a stateless node registrar service and a database for storing node information for nodes in the HPC system, the stateless node registrar service providing high-availability through scale-out of multiple services rather than relying on failover;
a management subsystem, implemented on the at least one processor, that performs other management functions other than the logical node management functions, wherein the management subsystem communicates with the node registrar subsystem in performing the other management functions; and
a job scheduler subsystem, implemented on the at least one processor, that performs job scheduling functions, wherein the job scheduler subsystem communicates with the node registrar subsystem in performing the job scheduling functions, wherein the management subsystem, the job scheduler subsystem, and the node registrar subsystem are separate, heterogeneous subsystems, and wherein the node registrar subsystem provides communication connections that allow direct communication interaction between the management subsystem, the job scheduler subsystem, and the node registrar subsystem.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A node registrar subsystem for a high-performance computing (HPC) system, the node registrar subsystem comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
a database, implemented on the memory, that stores node information for nodes in the HPC system; and
a stateless node registrar service, implemented on the at least one processor, that performs logical node management functions, communicates with a management subsystem to facilitate access to the database by the management subsystem in response to the management subsystem performing other management functions other than the logical node management functions, and communicates with a job scheduler subsystem to facilitate access to the database by the job scheduler subsystem in response to the job scheduler subsystem performing job scheduling functions, the stateless node registrar service providing high-availability through scale-out of multiple services rather than relying on failover, the logical node management functions including at least one of handling state transitions of the nodes, adding the nodes, removing the nodes, and updating node properties, wherein the management subsystem and the job scheduler subsystem are separate heterogeneous subsystems from the node registrar subsystem, and wherein the node registrar subsystem provides communication connections between the management subsystem, the job scheduler subsystem, and the node registrar subsystem.
Dependent claims: 15, 16, 17
18. A method of managing nodes in a high-performance computing (HPC) system, the method comprising:
performing, by a node registrar subsystem implemented on at least one processor, logical node management functions in the HPC system, the logical node management functions including at least one of handling state transitions of nodes, adding the nodes, removing the nodes, and updating node properties, wherein the node registrar subsystem comprises a stateless node registrar service and a database for storing node information for the nodes in the HPC system, the stateless node registrar service providing high-availability through scale-out of multiple services rather than relying on failover;
providing, by the node registrar subsystem, communication connections between a management subsystem, a job scheduler subsystem, and the node registrar subsystem, the management subsystem and the job scheduler subsystem being separate heterogeneous subsystems from the node registrar subsystem;
performing other management functions other than the logical node management functions to be performed within the HPC system with the management subsystem, the management subsystem directly communicating with the node registrar subsystem via the communication connections provided by the node registrar subsystem to perform the other management functions; and
performing job scheduling functions within the HPC system with the job scheduler subsystem, the job scheduler subsystem directly communicating with the node registrar subsystem via the communication connections to perform the job scheduling functions.
Dependent claims: 19, 20
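Each independent claim recites a stateless node registrar service backed by a database, with high availability obtained by running multiple services rather than by failover. The Python sketch below shows one way such an arrangement could look, under stated assumptions: SQLite stands in for the database, and the names `RegistrarService`, `init_db`, `set_state`, and `get_state` are hypothetical. The patent does not name a database product or an interface.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def init_db(path):
    # Create the shared node table once; all service instances use it.
    with closing(sqlite3.connect(path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS nodes ("
            "name TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )


class RegistrarService:
    """Hypothetical stateless service: it keeps no node data in memory.

    Every call opens the shared database, so any instance can serve
    any request and instances can be added or removed freely.
    """

    def __init__(self, db_path):
        self.db_path = db_path  # configuration only, not node state

    def set_state(self, name, state):
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT OR REPLACE INTO nodes (name, state) VALUES (?, ?)",
                (name, state),
            )

    def get_state(self, name):
        with closing(sqlite3.connect(self.db_path)) as conn:
            row = conn.execute(
                "SELECT state FROM nodes WHERE name = ?", (name,)
            ).fetchone()
        return row[0] if row else None


# Two independent service instances share one database file.
db_path = os.path.join(tempfile.mkdtemp(), "nodes.db")
init_db(db_path)
service_a = RegistrarService(db_path)
service_b = RegistrarService(db_path)

service_a.set_state("node001", "online")   # e.g. a management subsystem call
print(service_b.get_state("node001"))      # e.g. a job scheduler query
```

Because neither instance caches node data, a write through `service_a` is visible to a read through `service_b`, and losing one instance does not lose node information. That is the property the sketch is meant to show; a production design would also need a database that is itself highly available, which this example does not address.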
Specification