High density compute center resilient booting
First Claim
1. A method, comprising:
initializing a plurality of processing systems;
communicating status information about the operational health of each of the processing systems capable of operation to a management module responsible for managing the processing systems;
reinitializing one or more of the processing systems, if the management module determines that the one or more of the processing systems is operating in a degraded state based on the status information communicated to the management module from each of the processing systems capable of operation;
resetting one or more of the processing systems, if the management module determines that the one or more of the processing systems is incapable of operation;
maintaining a failure count of a number of the resets for each of the processing systems incapable of operation; and
alerting an administrator system about any of the processing systems having a failure count greater than a predefined number.
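The steps of claim 1 can be sketched as a single management pass. This is only an illustrative sketch, not the patented implementation: the names `Health`, `Blade`, `manage_pass`, and `max_failures` are hypothetical, and the pass returns a list of actions rather than driving real hardware.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"    # status information indicates degraded operation
    INCAPABLE = "incapable"  # processing system is incapable of operation


@dataclass
class Blade:
    # Hypothetical record the management module keeps per processing system.
    name: str
    health: Health = Health.HEALTHY
    failure_count: int = 0   # number of resets performed so far


def manage_pass(blades, max_failures):
    """Run one management pass and return the actions taken, in order."""
    actions = []
    for blade in blades:
        if blade.health is Health.DEGRADED:
            # Degraded but still reporting status: reinitialize.
            actions.append(("reinitialize", blade.name))
        elif blade.health is Health.INCAPABLE:
            # Incapable of operation: reset and count the reset.
            blade.failure_count += 1
            actions.append(("reset", blade.name))
            # Alert only when the count exceeds the predefined number.
            if blade.failure_count > max_failures:
                actions.append(("alert", blade.name))
    return actions


blades = [
    Blade("blade-a"),
    Blade("blade-b", Health.DEGRADED),
    Blade("blade-c", Health.INCAPABLE, failure_count=2),
]
actions = manage_pass(blades, max_failures=2)
```

In this sketch the alert fires only when the failure count is strictly greater than `max_failures`, matching the "greater than a predefined number" wording of the claim.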
Abstract
A system and method to implement a resilient compute center. A plurality of processing systems is initialized. Each of the processing systems capable of operation communicates status information about its operational health to a management module responsible for managing the processing systems. The management module reinitializes any of the processing systems if it determines, based on the status information communicated to it, that the processing system is operating in a degraded state.
15 Citations
20 Claims
9. A processing blade, comprising:
at least one processor to execute instructions;
system memory coupled to the at least one processor;
a communication link to communicatively couple to a management module for managing a rack of processing blades including the processing blade;
an error module configured to generate status information about the operational health of the processing blade and to communicate the status information to the management module via the communication link; and
a surrogate management module, the surrogate management module configured to query the management module for managing the rack of processing blades and to assume tasks for managing the rack of processing blades from the management module if the management module is disabled.
16. A system, comprising:
a chassis;
a management module supported by the chassis;
a communication plane coupled to the management module; and
a plurality of processing blades supported by the chassis and coupled to the communication plane, one of the processing blades including:
at least one processor to execute instructions;
system memory coupled to the at least one processor; and
a surrogate management module, the surrogate management module coupled to query the management module to determine an operational health of the management module and to assume management duties of the management module if the management module is disabled, the management module coupled to the communication plane to manage the plurality of processing blades.
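The surrogate management module recited in claims 9 and 16 queries the management module and takes over its duties if the management module is disabled. A minimal sketch of that behavior follows. The class names, the `respond_to_query` method, and the `missed_limit` threshold are all assumptions made for illustration; the claims do not say how a disabled management module is detected.

```python
class ManagementModule:
    """Hypothetical stand-in for the chassis management module."""

    def __init__(self):
        self.enabled = True

    def respond_to_query(self) -> bool:
        # A disabled module fails to answer health queries.
        return self.enabled


class SurrogateManagementModule:
    """Runs on one blade and takes over if the management module is disabled."""

    def __init__(self, primary, missed_limit=3):
        self.primary = primary
        self.missed_limit = missed_limit  # assumed threshold, not from the claims
        self.missed = 0
        self.active = False               # True once management duties are assumed

    def poll(self) -> bool:
        """Query the management module once; return whether the surrogate is active."""
        if self.active:
            return True
        if self.primary.respond_to_query():
            self.missed = 0               # healthy answer clears the miss counter
        else:
            self.missed += 1
            if self.missed >= self.missed_limit:
                self.active = True        # assume management duties
        return self.active


primary = ManagementModule()
surrogate = SurrogateManagementModule(primary, missed_limit=3)
before = surrogate.poll()                 # management module still answers
primary.enabled = False
after = [surrogate.poll() for _ in range(3)]
```

Requiring several consecutive missed queries before takeover is one possible design choice that avoids reacting to a single dropped query; a real system might use a different criterion.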
Specification