Identifying troubleshooting options for resolving network failures
First Claim
1. A method comprising:
receiving an alarm generated by a device in a data center, the alarm comprising a failure condition that is indicative of a network failure in the data center;
responsive to receiving the alarm and based upon the alarm, identifying a failing device that causes the network failure;
responsive to identifying the failing device, and based upon historical data retained in a data repository, identifying a failure symptom of the network failure based upon the failure condition and the failing device, wherein the historical data comprises:
at least one failure symptom;
troubleshooting options previously undertaken with respect to the failing device in the data center to mitigate the network failure; and
a failure history table that comprises historical failure frequency of the failing device;
outputting data that is indicative of the historical failure frequency of the failing device responsive to the failing device being identified;
responsive to identifying the failing device, identifying the troubleshooting options;
assigning labels to the troubleshooting options, the labels indicative of probabilities that the troubleshooting options, when undertaken with respect to the failing device, will mitigate the failure symptom;
outputting the plurality of troubleshooting options and their labels;
receiving feedback from an operator that the failure symptom has been mitigated and that a troubleshooting option in the troubleshooting options was employed to mitigate the failure symptom; and
updating the failure history table based upon the feedback.
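The acts of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `FailureHistory` class, its field names, and the device and option strings are all hypothetical, and the label is assumed here to be a simple empirical success rate, which is only one way to produce a value "indicative of" a probability.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FailureHistory:
    """Hypothetical in-memory stand-in for the failure history table."""

    # (device, symptom, option) -> [times tried, times the symptom was mitigated]
    outcomes: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))
    # device -> number of recorded failures (the historical failure frequency)
    failure_counts: dict = field(default_factory=lambda: defaultdict(int))

    def record_failure(self, device):
        """Count one more failure against the identified failing device."""
        self.failure_counts[device] += 1

    def record_feedback(self, device, symptom, option, mitigated):
        """Update the table from operator feedback on one troubleshooting option."""
        tally = self.outcomes[(device, symptom, option)]
        tally[0] += 1
        if mitigated:
            tally[1] += 1

    def labeled_options(self, device, symptom):
        """Return (option, label) pairs, most promising first.

        Assumption: the label is the fraction of past attempts that mitigated
        the symptom on this device.
        """
        labeled = [
            (option, fixed / tried)
            for (dev, sym, option), (tried, fixed) in self.outcomes.items()
            if dev == device and sym == symptom and tried > 0
        ]
        return sorted(labeled, key=lambda pair: pair[1], reverse=True)


history = FailureHistory()
history.record_failure("tor-switch-17")
history.record_feedback("tor-switch-17", "link_flap", "reseat cable", True)
history.record_feedback("tor-switch-17", "link_flap", "reseat cable", False)
history.record_feedback("tor-switch-17", "link_flap", "reboot switch", True)
options = history.labeled_options("tor-switch-17", "link_flap")
```

Each call to `record_feedback` corresponds to the final two acts of the claim: the operator reports which option was used and whether it worked, and the table is updated so that later labels reflect that outcome.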
Abstract
Described herein are various technologies pertaining to providing assistance to an operator in a data center with respect to failures in the data center. An alarm is received, and a failing device is identified based upon content of the alarm. Failure conditions of the alarm are mapped to a failure symptom that may be exhibited by the failing device, and troubleshooting options previously employed to mitigate the failure symptom are retrieved from historical data. Labels are respectively assigned to the troubleshooting options, where a label is indicative of a probability that a troubleshooting option to which the label has been assigned will mitigate the failure symptom.
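As a rough illustration of the first two steps in the abstract (identifying the failing device from the alarm content, then mapping the failure condition to a symptom), the sketch below uses a plain lookup table. The `Alarm` fields, the condition strings, and the symptom names are hypothetical; the abstract does not say how the mapping is stored or computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source_device: str      # device that raised the alarm
    suspect_device: str     # device the alarm content points at (assumed field)
    failure_condition: str  # raw condition reported in the alarm


# Hypothetical mapping from failure conditions to failure symptoms.
CONDITION_TO_SYMPTOM = {
    "ifOperStatus=down": "link_down",
    "crc_errors_high": "packet_corruption",
    "bgp_session_reset": "routing_instability",
}


def triage(alarm):
    """Return (failing device, failure symptom) for one alarm."""
    symptom = CONDITION_TO_SYMPTOM.get(alarm.failure_condition, "unknown")
    return alarm.suspect_device, symptom


result = triage(Alarm("monitor-1", "agg-router-3", "crc_errors_high"))
```

The returned symptom would then serve as the key for retrieving previously employed troubleshooting options from the historical data.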
20 Claims
1. A method comprising the acts recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 18
10. A resolution system that facilitates resolving network failures in a data center, the resolution system comprising:
at least one processor; and
memory that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform acts comprising:
receiving an alarm generated by a device in the data center, wherein the alarm is indicative of a network failure in the data center;
responsive to receiving the alarm, identifying a failure symptom of the network failure based upon a failure condition in the received alarm;
identifying a failing device that causes the network failure, wherein the failing device is identified based upon the alarm;
outputting data that is indicative of historical failure frequency of the failing device relative to historic failure frequencies of other devices in the data center responsive to the failing device being identified; and
outputting troubleshooting options for resolving the network failure based upon the failure symptom, wherein the troubleshooting options have labels assigned thereto that are indicative of confidences that the troubleshooting options, when performed by an operator of the data center, will resolve the network failure.
Dependent claims: 11, 12, 13, 14, 15, 19, 20
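Claim 10 calls for outputting the failing device's historical failure frequency relative to that of other devices. One possible reading, sketched below under stated assumptions, reports the device's share of all recorded failures and its rank among devices. The function name, the returned fields, and the sample counts are hypothetical; the claim does not prescribe a particular measure.

```python
def relative_failure_frequency(failure_counts, device):
    """Summarize how often `device` failed compared with the other devices.

    `failure_counts` maps device name -> number of recorded failures.
    Rank 1 means no other device has failed more often.
    """
    total = sum(failure_counts.values())
    count = failure_counts.get(device, 0)
    share = count / total if total else 0.0
    rank = 1 + sum(1 for other in failure_counts.values() if other > count)
    return {"device": device, "failures": count, "share": share, "rank": rank}


counts = {"agg-router-3": 6, "tor-switch-17": 3, "load-balancer-2": 1}
summary = relative_failure_frequency(counts, "tor-switch-17")
```

An operator-facing display could show this summary next to the troubleshooting options so that a frequently failing device stands out.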
16. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
receiving an alarm, the alarm comprises a failure condition that is indicative of a network failure in a data center;
responsive to receiving the alarm, identifying a failing device that causes the network failure, the failing device identified based upon the failure condition;
outputting data that is indicative of historical failure frequency of the failing device responsive to the failing device being identified;
responsive to identifying the failing device, identifying a failure symptom in a failure history table based upon the failure condition, wherein the failure history table indicates that the failure symptom was previously exhibited by the failing device, and further wherein the failure history table comprises troubleshooting options previously employed to mitigate the failure symptom;
retrieving the troubleshooting options responsive to identifying the failure symptom; and
outputting the troubleshooting options and labels for the troubleshooting options, the labels indicative of confidences that the troubleshooting options, when employed by an operator, will mitigate the failure symptom, wherein the labels are determined based upon operator feedback pertaining to the troubleshooting options, and further wherein the operator feedback is indicative as to whether or not the troubleshooting options previously resolved the network failure.
Dependent claims: 17
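Claim 16 states that the labels are determined from operator feedback on whether an option previously resolved the failure. The sketch below shows one way such feedback could become a confidence label. The smoothing prior and the 0.7 and 0.4 thresholds are assumptions chosen for illustration; nothing in the claim specifies them.

```python
def confidence_label(resolved, attempts, prior_successes=1, prior_attempts=2):
    """Turn operator feedback counts into a (label, score) pair.

    `resolved` is how many times operators reported that the option fixed
    the failure, out of `attempts` tries. The prior (an assumption) keeps
    options with very little feedback from being labeled too confidently.
    """
    score = (resolved + prior_successes) / (attempts + prior_attempts)
    if score >= 0.7:
        label = "high"
    elif score >= 0.4:
        label = "medium"
    else:
        label = "low"
    return label, score


well_tested = confidence_label(resolved=9, attempts=10)
untested = confidence_label(resolved=0, attempts=0)
rarely_works = confidence_label(resolved=0, attempts=5)
```

With these assumed thresholds, an option that has never been tried lands in the middle bucket rather than at either extreme, and its label moves as feedback accumulates.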
Specification