Detection of failures in network devices
Abstract
In one embodiment, failures in a network device can be detected by obtaining the last hops of partial traceroutes. For example, the last hops of the partial traceroutes detected in a time period can be sorted by frequency of occurrence. Multiple clusters can be generated, one with the last hops that are most frequent and at least one other with the last hops that are less frequent. The last hop with the highest frequency and the last hop with the second highest frequency can be compared. If the highest frequency exceeds the second highest frequency by a predetermined amount, then the last hop with the highest frequency is alarmable.
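The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `alarmable_last_hop`, the hop labels, and the reading of the "predetermined amount" as an absolute count (`margin`) are all assumptions.

```python
from collections import Counter


def alarmable_last_hop(last_hops, margin):
    """Return the most frequent last hop if it leads the runner-up by at least `margin`.

    `last_hops` holds one entry per partial traceroute seen in the time period.
    `margin` is an assumed absolute count standing in for the "predetermined amount".
    """
    # Sort last hops by frequency of occurrence, most frequent first.
    ranked = Counter(last_hops).most_common()
    if len(ranked) < 2:
        # Fewer than two distinct last hops: nothing to compare against.
        return None
    (top_hop, top_count), (_, second_count) = ranked[0], ranked[1]
    return top_hop if top_count - second_count >= margin else None


# Hypothetical last hops collected from partial traceroutes.
observed = ["r7", "r7", "r2", "r7", "r7", "r5", "r7", "r2"]
print(alarmable_last_hop(observed, margin=2))
```

Raising `margin` makes the check more conservative: with the sample data above, the leading hop is only three occurrences ahead of the runner-up, so any margin above 3 yields no alarm.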
20 Claims
1. A method of monitoring network devices for failures, the method comprising:
transmitting packets in a network using a network monitoring agent executing on a server computer;
determining that multiple packets transmitted by the network monitoring agent did not reach respective destination addresses of the multiple packets;
transmitting traceroute packets in order to determine why the multiple packets did not reach the destination addresses;
determining a set of frequencies of last hops associated with the traceroute packets that did not reach the destination addresses;
clustering the set of frequencies into first and second groups, wherein the first group is a high-frequency group of last hops and the second group is a lower-frequency group of last hops;
comparing a most frequent last hop in the first group with another last hop in the second group within the set of frequencies; and
determining a network device associated with the most frequent last hop is defective when its frequency exceeds the other last hop frequencies by a predetermined amount; and
re-routing network traffic around the network device.
Dependent claims: 2, 3, 4, 5

6. One or more computing devices, comprising:
one or more processing units; and
one or more network interfaces;
wherein the one or more computing devices are configured to perform operations for monitoring a plurality of network devices in a computer network, the operations comprising;
transmitting traceroutes through the plurality of network devices;
for traceroutes that do not reach a destination, performing frequency analysis on last hops associated with the traceroutes, wherein the frequency analysis includes determining a set of frequencies of last hops associated with the traceroutes that did not reach the destination, clustering the set of frequencies into first and second groups of different frequencies and comparing a most frequent last hop in the first group with another last hop in the second group within the set of frequencies; and
determining that at least one network device is defective based on the frequency analysis including that the most frequent last hop in the first group exceeds a frequency of the last hop in the second group by a predetermined amount.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14

15. A computer-readable storage medium, which is non-transitory, including instructions that upon execution cause a computer system to:
transmit traceroutes through a network fabric;
determine partial traceroutes;
determine last hops associated with the partial traceroutes;
calculate a frequency associated with each of the last hops;
clustering the frequency associated with each of the last hops into first and second groups including a high-frequency group and low-frequency group; and
determine whether a network device within the network fabric is defective based on the frequency by comparing a most frequent last hop in the first group with another last hop in the second group.
Dependent claims: 16, 17, 18, 19, 20
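
The sequence recited in claim 15 (partial traceroutes, last hops, frequencies, two groups, comparison) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. A traceroute is modeled as a destination plus the list of hops that responded, and it is treated as partial when its final responding hop is not the destination. The claims do not name a clustering technique, so splitting the sorted frequencies at their largest gap is a swapped-in choice, and the helper names and `min_lead` are hypothetical.

```python
from collections import Counter


def last_hops_of_partial(traceroutes):
    """Collect the last responding hop of each traceroute that missed its destination.

    Each traceroute is an assumed (destination, hops) pair, where `hops` lists
    the devices that responded, in order.
    """
    return [hops[-1] for dest, hops in traceroutes if hops and hops[-1] != dest]


def split_by_largest_gap(freqs):
    """Split {hop: count} into (high, low) groups at the largest gap between sorted counts."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return ranked, []
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut], ranked[cut:]


def defective_device(traceroutes, min_lead):
    """Return the most frequent last hop if it leads the top of the low group by at least `min_lead`."""
    freqs = Counter(last_hops_of_partial(traceroutes))
    high, low = split_by_largest_gap(freqs)
    if not high or not low:
        return None
    (top_hop, top_count), (_, other_count) = high[0], low[0]
    return top_hop if top_count - other_count >= min_lead else None


# Hypothetical probes: (destination, hops that responded).
probes = [
    ("h9", ["r1", "r4"]),
    ("h9", ["r1", "r4"]),
    ("h8", ["r2", "r4"]),
    ("h8", ["r2", "r4"]),
    ("h7", ["r3", "r5"]),
    ("h6", ["r1", "r6", "h6"]),  # reached its destination, so not partial
]
print(defective_device(probes, min_lead=2))
```

In this sample, four partial traceroutes end at `r4` and one ends at `r5`, so `r4` forms the high-frequency group on its own and is reported when its lead of 3 meets `min_lead`.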
Specification