Health monitoring and recovery for infrastructure devices
First Claim
1. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising:
triggering a health monitoring event for an infrastructure device supporting one or more server devices in a data center;
identifying device information for the infrastructure device;
determining an operational context of the infrastructure device in supporting the one or more server devices in the data center;
determining a health monitoring process for the infrastructure device based on the device information for the infrastructure device and the operational context of the infrastructure device in supporting the one or more server devices in the data center;
performing the determined health monitoring process for the infrastructure device to assess health of the infrastructure device;
determining to perform an automated recovery operation for the infrastructure device based on the health of the infrastructure device;
in response to determining to perform the automated recovery operation for the infrastructure device, determining one or more recovery actions for the automated recovery operation based on the device information for the infrastructure device, the operational context of the infrastructure device in supporting the one or more server devices in the data center, and a failure context of the infrastructure device determined from the health monitoring process for the infrastructure device; and
performing at least a portion of the one or more recovery actions for the infrastructure device.
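The steps recited in this claim can be read as a small decision pipeline: select checks from device information and operational context, run them to obtain a failure context, then select recovery actions from all three inputs. The following is a minimal sketch under stated assumptions, not the patented implementation. Every name in it (`DeviceInfo`, `OperationalContext`, the check names, and the action names) is hypothetical and chosen only for illustration; the claim does not specify any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DeviceInfo:
    # Hypothetical device attributes; the claim only says "device information".
    kind: str
    supports_remote_reset: bool


@dataclass(frozen=True)
class OperationalContext:
    # Hypothetical view of how the device supports servers.
    servers_supported: int
    has_redundant_peer: bool


def select_checks(info: DeviceInfo, ctx: OperationalContext) -> List[str]:
    """Choose a health monitoring process from device info and context."""
    checks = ["ping"]
    if info.kind == "switch":
        checks.append("port_status")
    # Assumed policy: a device that is the sole support for servers gets deeper checks.
    if ctx.servers_supported > 0 and not ctx.has_redundant_peer:
        checks.append("deep_diagnostics")
    return checks


def run_checks(checks: List[str], probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each probe; the names of failed checks act as the failure context."""
    return [name for name in checks if not probes[name]()]


def plan_recovery(info: DeviceInfo, ctx: OperationalContext, failed: List[str]) -> List[str]:
    """Pick recovery actions from device info, context, and failure context."""
    if not failed:
        return []  # healthy: no automated recovery operation
    actions: List[str] = []
    if ctx.has_redundant_peer:
        actions.append("shift_load_to_peer")
    # Assumed policy: only reset when no server would lose its only support.
    safe_to_reset = ctx.has_redundant_peer or ctx.servers_supported == 0
    if info.supports_remote_reset and safe_to_reset:
        actions.append("remote_reset")
    else:
        actions.append("open_repair_ticket")
    return actions


def handle_health_event(info: DeviceInfo, ctx: OperationalContext,
                        probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """End-to-end flow triggered by a health monitoring event."""
    failed = run_checks(select_checks(info, ctx), probes)
    return plan_recovery(info, ctx, failed)
```

The sketch stops at planning. A real system would go on to perform at least a portion of the returned actions, as the final step of the claim recites.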
Abstract
Automated health monitoring and recovery is provided for infrastructure devices supporting server devices in a data center. Health analysis operations may be selected to be performed on an infrastructure device based on the capabilities of the infrastructure device and/or how the infrastructure device is being used to support server devices in the data center. If the infrastructure device is unhealthy, an automated recovery operation may be performed. The automated recovery operation may include recovery actions selected based on the capabilities of the infrastructure device, the failure mode of the infrastructure device, and/or how the infrastructure device is being used to support server devices in the data center.
20 Claims
1. One or more computer storage media, as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A system comprising:
one or more processors; and
one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to:
trigger performance of an automated recovery operation for an infrastructure device supporting one or more server devices in a data center;
determine one or more recovery actions for the automated recovery operation based on at least one selected from the following: device information for the infrastructure device, an operational context of the infrastructure device in supporting the one or more server devices in the data center, and a failure context of the infrastructure device, the one or more recovery actions comprising one or more actions attempting to recover the infrastructure device;
perform at least a portion of the one or more recovery actions for the infrastructure device;
monitor operation of at least one of the one or more server devices when the one or more recovery actions are performed;
detect an adverse change in the operation of at least one of the one or more server devices; and
in response to detecting the adverse change, modify the one or more recovery actions.
Dependent claims: 12, 13, 14.
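The last three elements of claim 11 describe a feedback loop: recovery actions are performed while the supported servers are watched, and the plan is changed if the servers get worse. A minimal sketch of that loop follows. The function `run_recovery`, its callback parameters, and the idea of swapping the remaining actions for a fallback list are all assumptions made for illustration; the claim says only that the recovery actions are modified.

```python
from typing import Callable, List


def run_recovery(actions: List[str],
                 perform: Callable[[str], None],
                 servers_healthy: Callable[[], bool],
                 fallback: List[str]) -> List[str]:
    """Perform actions in order while monitoring the supported servers.

    If an adverse change is detected after an action, the remaining actions
    are replaced once with `fallback` (an assumed way to "modify" the plan).
    Returns the actions actually performed, in order.
    """
    performed: List[str] = []
    queue = list(actions)
    modified = False
    while queue:
        action = queue.pop(0)
        perform(action)
        performed.append(action)
        # Monitor server operation after each recovery action.
        if not modified and not servers_healthy():
            queue = list(fallback)  # modify the recovery actions
            modified = True
    return performed
```

Replacing the plan only once keeps the sketch from looping indefinitely when the fallback itself does not restore server health; a production system would need a richer policy.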
15. A computer-implemented method comprising:
triggering a health monitoring event for an infrastructure device supporting one or more server devices in a data center;
identifying device information for the infrastructure device;
determining an operational context of the infrastructure device in supporting the one or more server devices in the data center;
determining a health monitoring process for the infrastructure device based on the device information for the infrastructure device and the operational context of the infrastructure device in supporting the one or more server devices in the data center;
performing the determined health monitoring process for the infrastructure device to assess health of the infrastructure device;
determining to perform an automated recovery operation for the infrastructure device based on the health of the infrastructure device;
in response to determining to perform the automated recovery operation for the infrastructure device, determining one or more recovery actions for the automated recovery operation based on the device information for the infrastructure device, the operational context of the infrastructure device in supporting the one or more server devices in the data center, and a failure context of the infrastructure device determined from the health monitoring process for the infrastructure device; and
performing at least a portion of the one or more recovery actions for the infrastructure device.
Dependent claims: 16, 17, 18, 19, 20.
Specification