System and method for detecting and managing HPC node failure
Abstract
A method for managing HPC node failure includes determining that one of a plurality of HPC nodes has failed, with each HPC node comprising an integrated fabric. The failed node is then removed from a virtual list of HPC nodes, with the virtual list comprising one logical entry for each of the plurality of HPC nodes.
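The two steps in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `VirtualNodeList` class, the `has_failed` heartbeat-timeout test, and the `manage_failures` helper. In particular, this text does not say how a failure is determined, so the heartbeat timeout is only an assumed stand-in for that step.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class VirtualNodeList:
    """Hypothetical virtual list: one logical entry per HPC node."""
    entries: Dict[str, str] = field(default_factory=dict)  # node id -> status

    @classmethod
    def for_nodes(cls, node_ids: Iterable[str]) -> "VirtualNodeList":
        # Create exactly one logical entry for each node.
        return cls({node_id: "available" for node_id in node_ids})

    def remove(self, node_id: str) -> bool:
        # Drop the node's logical entry; return True if it was present.
        return self.entries.pop(node_id, None) is not None


def has_failed(node_id: str, heartbeats: Dict[str, float],
               now: float, timeout: float = 5.0) -> bool:
    """Assumed failure test: no heartbeat seen within `timeout` seconds."""
    last_seen = heartbeats.get(node_id)
    return last_seen is None or now - last_seen > timeout


def manage_failures(vlist: VirtualNodeList, heartbeats: Dict[str, float],
                    now: float, timeout: float = 5.0) -> List[str]:
    # Step 1: determine which nodes have failed.
    failed = [n for n in list(vlist.entries)
              if has_failed(n, heartbeats, now, timeout)]
    # Step 2: remove each failed node from the virtual list.
    for node_id in failed:
        vlist.remove(node_id)
    return failed


vlist = VirtualNodeList.for_nodes(["n0", "n1", "n2"])
heartbeats = {"n0": 10.0, "n1": 2.0, "n2": 9.5}
failed = manage_failures(vlist, heartbeats, now=10.0)
```

In this run, only `n1` has gone more than five seconds without a heartbeat, so it is the only entry removed, and the list keeps one logical entry for each remaining node.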
30 Claims
1. A method for managing HPC node failure comprising:
determining that one of a plurality of HPC nodes has failed, each HPC node comprising an integrated fabric; and
removing the failed node from a virtual list of HPC nodes, the virtual list comprising one logical entry for each of the plurality of HPC nodes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. Software for managing HPC node failure operable to:
determine that one of a plurality of HPC nodes has failed, each node comprising an integrated fabric; and
remove the failed node from a virtual list of HPC nodes, the virtual list comprising one logical entry for each of the plurality of HPC nodes.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system for managing HPC node failure comprising:
a plurality of HPC nodes, each node including an integrated fabric; and
a management node operable to:
determine that one of the plurality of HPC nodes has failed, each node comprising an integrated fabric; and
remove the failed node from a virtual list of HPC nodes, the virtual list comprising one logical entry for each of the plurality of HPC nodes.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification