Automatic repair of computing devices in a data center
First Claim
1. A management device for managing a plurality of computing devices in a data center, wherein the management device comprises:
- a network interface for communicating with the plurality of computing devices,
- a first module that sends a first health status query for a selected one of the computing devices,
- a second module configured to receive and process any responses to the first health status query, and
- a third module configured to create support tickets,
wherein the second module is configured to, in response to not receiving an acceptable response to the first health status query within a first predetermined time;
(a) send a first repair instruction to the selected computing device,
(b) wait at least enough time for the first repair instruction to complete,
(c) cause the first module to send a second health status query to the selected computing device; and
(d) in response to not receiving an acceptable response to the second health status query within a second predetermined time;
(i) cause the first module to send a second repair instruction to the selected computing device,
(ii) wait at least enough time for the second repair instruction to complete,
(iii) send a third health status query to the selected computing device; and
(iv) in response to not receiving an acceptable response to the third health status query, cause the third module to create a support ticket identifying the second computing device and the second computing device's health status.
12 Assignments
0 Petitions
Abstract
A system and method for automating management and repair of a plurality of computing devices located in a data center is disclosed. Health status queries are issued for one or more of the computing devices. If responses not indicative of good device health are received, one or more repair instructions are automatically sent to the unhealthy computing device to repair the computing device by moving it to an acceptable state. If the repair instructions are not successful, a support ticket is automatically generated for the corresponding computing device or devices. Problematic statuses across areas of the data center may be detected and ticketed in addition to individual problematic devices. So-called repeat offender devices may be detected and ticketed even if the repair instructions are successful.
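The last two sentences of the abstract (area-wide problems and repeat offenders) can be illustrated with a minimal Python sketch. Nothing here comes from the patent itself: the class and function names, the threshold of three repairs, the 24-hour window, and the 50% area fraction are all hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Set


class RepeatOffenderTracker:
    """Flags a device that needed too many repairs within a sliding time window.

    max_repairs and window_seconds are illustrative assumptions, not values
    taken from the patent.
    """

    def __init__(self, max_repairs: int = 3, window_seconds: float = 86400.0) -> None:
        self.max_repairs = max_repairs
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def record_repair(self, device_id: str, timestamp: float) -> bool:
        """Record one repair (successful or not); return True if the device
        should be ticketed as a repeat offender."""
        history = self._history[device_id]
        history.append(timestamp)
        # Drop repairs that fall outside the sliding window.
        while history and timestamp - history[0] > self.window_seconds:
            history.popleft()
        return len(history) >= self.max_repairs


def areas_to_ticket(
    area_of: Dict[str, str], unhealthy: Set[str], min_fraction: float = 0.5
) -> List[str]:
    """Return the areas (for example racks) whose share of unhealthy devices
    is at least min_fraction. The fraction is a hypothetical threshold."""
    totals = Counter(area_of.values())
    bad = Counter(area_of[d] for d in unhealthy if d in area_of)
    return sorted(area for area, n in totals.items() if bad[area] / n >= min_fraction)
```

In this sketch, a caller would invoke `record_repair` each time a repair instruction is sent, and call `areas_to_ticket` after a polling round to decide whether an area-level ticket is warranted instead of, or in addition to, per-device tickets.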
29 Citations
20 Claims
1. A management device for managing a plurality of computing devices in a data center, wherein the management device comprises:
- a network interface for communicating with the plurality of computing devices,
- a first module that sends a first health status query for a selected one of the computing devices,
- a second module configured to receive and process any responses to the first health status query, and
- a third module configured to create support tickets,
wherein the second module is configured to, in response to not receiving an acceptable response to the first health status query within a first predetermined time;
(a) send a first repair instruction to the selected computing device,
(b) wait at least enough time for the first repair instruction to complete,
(c) cause the first module to send a second health status query to the selected computing device; and
(d) in response to not receiving an acceptable response to the second health status query within a second predetermined time;
(i) cause the first module to send a second repair instruction to the selected computing device,
(ii) wait at least enough time for the second repair instruction to complete,
(iii) send a third health status query to the selected computing device; and
(iv) in response to not receiving an acceptable response to the third health status query, cause the third module to create a support ticket identifying the second computing device and the second computing device's health status.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15)
8. A method for managing a plurality of computing devices in a data center, the method comprising:
issuing from a first computing device a first health status query for a second computing device of said plurality of computing devices;
in response to not receiving an acceptable response to the first health status query within a first predetermined time;
(i) issuing from the first computing device to the second computing device, a first repair instruction,
(ii) waiting at least enough time for the first repair instruction to complete,
(iii) issuing from the first computing device a second health status query for the second computing device, and
(iv) in response to not receiving an acceptable response to the second health status query within a second predetermined time;
(i) issuing from the first computing device to the second computing device, a second repair instruction,
(ii) waiting at least enough time for the second repair instruction to complete,
(iii) issuing from the first computing device a third health status query for the second computing device, and
(iv) in response to not receiving an acceptable response to the third health status query within a second predetermined time, issuing from the first computing device a repair ticket.
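The query, repair, wait, re-query sequence recited in the method claim can be sketched as a small escalation loop. This is an illustrative Python sketch under stated assumptions, not the patented implementation: `run_repair_escalation`, the callback signatures, the repair names (`restart_service`, `reboot`), the wait durations, and the `"ok"` status value are all hypothetical.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def run_repair_escalation(
    device_id: str,
    query_health: Callable[[str, float], Optional[str]],
    send_repair: Callable[[str, str], None],
    create_ticket: Callable[[str, Optional[str]], str],
    repairs: Sequence[Tuple[str, float]] = (
        ("restart_service", 30.0),  # hypothetical first repair and wait time
        ("reboot", 300.0),          # hypothetical second repair and wait time
    ),
    query_timeout: float = 10.0,
    is_acceptable: Callable[[Optional[str]], bool] = lambda status: status == "ok",
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return a ticket id if every repair fails, or None if the device recovers.

    query_health(device_id, timeout) is assumed to return a status string,
    or None when no response arrives within the timeout.
    """
    # First health status query.
    status = query_health(device_id, query_timeout)
    if is_acceptable(status):
        return None

    # Each pass: send a repair, wait for it to complete, then query again.
    for instruction, wait_seconds in repairs:
        send_repair(device_id, instruction)
        sleep(wait_seconds)
        status = query_health(device_id, query_timeout)
        if is_acceptable(status):
            return None

    # All repairs exhausted: issue a ticket carrying the last observed status.
    return create_ticket(device_id, status)
```

With the two default repairs, the loop issues at most three health queries and two repair instructions before a ticket is created, which mirrors the sequence in the claim. Injecting `sleep` as a parameter is a design choice for this sketch so that the waits can be skipped in tests.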
16. A non-transitory, computer-readable storage medium storing instructions executable by a processor of a computational device, which when executed cause the computational device to:
send a first health status query from a first computing device for a second computing device, and
in response to not receiving an acceptable response to the first health status query within a first predetermined time;
(v) send from the first computing device to the second computing device, a first repair instruction,
(vi) wait at least enough time for the first repair instruction to complete,
(vii) send a second health status query from the first computing device to the second computing device; and
(viii) in response to not receiving an acceptable response to the second health status query within a second predetermined time;
(v) send from the first computing device to the second computing device, a second repair instruction,
(vi) wait at least enough time for the second repair instruction to complete,
(vii) send a third health status query from the first computing device to the second computing device,
(viii) in response to not receiving an acceptable response to the third health status query from the second computing device within a second predetermined time, send from the first computing device a repair ticket.
- View Dependent Claims (17, 18, 19, 20)
Specification