Automated datacenter network failure mitigation
Abstract
The subject disclosure is directed towards a technology that automatically mitigates datacenter failures, instead of relying on human intervention to diagnose and repair the network. Via a mitigation pipeline, when a network failure is detected, a candidate set of components that are likely to be the cause of the failure is identified, with mitigation actions iteratively targeting each component to attempt to alleviate the problem. The impact to the network is estimated to ensure that the redundancy present in the network will be able to handle the mitigation action without adverse disruption to the network.
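The iterative pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` type, the `redundant_paths` field, the `min_paths` threshold, and the two callbacks are all hypothetical names introduced here, and "acceptable impact" is assumed to mean that enough redundant paths remain after the action.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    name: str             # hypothetical component identifier
    redundant_paths: int  # assumed: paths still available if this component is taken out


def impact_is_acceptable(candidate: Candidate, min_paths: int = 1) -> bool:
    # Assumption: the network can absorb the action if enough redundancy remains.
    return candidate.redundant_paths >= min_paths


def mitigate(
    candidates: Iterable[Candidate],
    failure_persists: Callable[[], bool],
    apply_action: Callable[[Candidate], None],
) -> list[str]:
    """Target candidates one at a time until the failure clears."""
    attempted = []
    for candidate in candidates:
        if not impact_is_acceptable(candidate):
            continue  # skip: redundancy is too low to act on this component safely
        apply_action(candidate)
        attempted.append(candidate.name)
        if not failure_persists():
            break  # the problem is alleviated; stop iterating
    return attempted
```

The loop skips any candidate whose removal would leave too little redundancy, which mirrors the abstract's requirement that the impact be estimated before a mitigation action is taken.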
20 Claims
1. A method performed at least in part by at least one processor, comprising:
monitoring a network;
determining a network component set corresponding to abnormal behavior, in which the network component set comprises a plurality of links; and
taking automated action on the network component set to mitigate the abnormal behavior by:
identifying a plurality of proposed actions;
estimating an impact on the network for executing each of the proposed actions;
estimating a success rate for each of the proposed actions;
selecting one or more of the proposed actions based at least on the estimated impact and the estimated success rate; and
applying the one or more of the proposed actions based at least on the estimated impact and the estimated success rate.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

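One way to picture the selection step of claim 1 is a filter followed by a ranking. The sketch below is an assumption-laden illustration: the claim only requires that selection be based at least on the estimated impact and the estimated success rate, so the `max_impact` threshold, the `top_k` parameter, the tie-breaking rule, and the `ProposedAction` fields are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    name: str
    est_impact: float        # assumed metric: fraction of capacity lost, 0.0 to 1.0
    est_success_rate: float  # assumed metric: probability the action resolves the fault


def select_actions(proposals, max_impact=0.2, top_k=1):
    """Drop actions whose impact is too high, then rank the rest.

    Ranking (an assumption): highest success rate first, lowest impact as tie-breaker.
    """
    safe = [p for p in proposals if p.est_impact <= max_impact]
    ranked = sorted(safe, key=lambda p: (-p.est_success_rate, p.est_impact))
    return ranked[:top_k]


proposals = [
    ProposedAction("deactivate_link", est_impact=0.05, est_success_rate=0.7),
    ProposedAction("restart_switch", est_impact=0.50, est_success_rate=0.9),
    ProposedAction("reroute_traffic", est_impact=0.01, est_success_rate=0.6),
]
chosen = select_actions(proposals)
```

Here `restart_switch` has the best estimated success rate but is excluded because its estimated impact exceeds the threshold, so `deactivate_link` is chosen.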
11. A system comprising:
a detector configured to process network state data for a network to determine a state indicative of abnormal behavior in a network component set comprising a plurality of links;
a planner configured to determine a plan for mitigating the abnormal behavior by identifying a plurality of proposed actions, the planner coupled to an impact estimator configured to:
estimate an impact on the network for executing each of the proposed actions; and
estimate a success rate for each of the proposed actions; and
a plan executor, the plan executor configured to access the plan and take one or more actions identified in the plan on one of the plurality of links to mitigate the abnormal behavior by applying one or more of the proposed actions based at least on the estimated impact and the estimated success rate.
Dependent claims: 12, 13, 14, 15, 16

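The components named in claim 11 (detector, planner, impact estimator, plan executor) could be wired together roughly as shown below. Every class body here is a placeholder assumption: the loss-rate threshold, the fixed estimate table, the action kinds, and the `max_impact` limit are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str
    link: str
    impact: float = 0.0
    success_rate: float = 0.0


class Detector:
    """Flags links whose loss rate exceeds a threshold (assumed health signal)."""

    def __init__(self, loss_threshold: float = 0.01):
        self.loss_threshold = loss_threshold

    def detect(self, state: dict[str, float]) -> list[str]:
        return [link for link, loss in state.items() if loss > self.loss_threshold]


class ImpactEstimator:
    """Placeholder estimates: a fixed (impact, success rate) pair per action kind."""

    TABLE = {"deactivate": (0.10, 0.8), "restart": (0.40, 0.9)}

    def estimate(self, action: Action) -> Action:
        action.impact, action.success_rate = self.TABLE[action.kind]
        return action


class Planner:
    def __init__(self, estimator: ImpactEstimator, max_impact: float = 0.25):
        self.estimator = estimator
        self.max_impact = max_impact

    def plan(self, links: list[str]) -> list[Action]:
        plan = []
        for link in links:
            # Propose every known action kind for the link and estimate each one.
            proposals = [self.estimator.estimate(Action(kind, link))
                         for kind in self.estimator.TABLE]
            acceptable = [a for a in proposals if a.impact <= self.max_impact]
            if acceptable:
                plan.append(max(acceptable, key=lambda a: a.success_rate))
        return plan


class PlanExecutor:
    def __init__(self, apply: Callable[[Action], None]):
        self.apply = apply

    def execute(self, plan: list[Action]) -> None:
        for action in plan:
            self.apply(action)
```

Keeping the estimator separate from the planner follows the claim's wording, in which the planner is "coupled to" an impact estimator instead of computing the estimates itself.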
17. One or more computer-readable devices having computer-executable instructions, which when executed by at least one computer perform operations comprising:
monitoring a network;
determining a network component set corresponding to abnormal behavior, in which the network component set comprises a plurality of links; and
taking automated action on the network component set to mitigate the abnormal behavior by:
identifying a plan comprising a plurality of proposed actions;
estimating an impact on the network for executing each of the proposed actions;
estimating a success rate for each of the proposed actions; and
applying one or more of the proposed actions based at least on the estimated impact and the estimated success rate.
Dependent claims: 18, 19, 20

Specification