Distributed network monitoring system for monitoring node and link status
Abstract
A distributed network monitor system distributes the network monitoring function among each of the nodes of a multiple network system, such that monitor software resident in each node is responsible for providing status information about that node and its communications links. At predetermined monitoring intervals, a circulating status table (CST) (FIG. 4) is circulated to all of the on-line nodes, with each node updating the CST with its link and status information. The monitor software (FIG. 1) includes a servicer task (22), a node monitor task (24), and a packet manager task (26), with intertask data transfers being implemented through a monitor region (28) in memory. In addition to link and node status information, the CST includes information about links that are in an intermittent condition (i.e., links with significantly degraded statistical performance). Intermittent link conditions are determined by a voting procedure in which each node votes on the condition of its links with other nodes, with the results of the votes being distributed in the CST and used by each node to determine those nodes with links in an intermittent condition. For those links without sufficient message traffic to make a clear determination of condition, volunteer nodes send additional link test messages until sufficient statistical information is available.
33 Claims
1. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes.
Dependent claims: 2, 3, 4
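The three steps of claim 1 (dispatch, circulate with write/read at each node, return) can be illustrated with a minimal sketch. The `CST` layout, the `circulate` function, and the fixed visiting order are hypothetical choices made for illustration only; the claim does not prescribe any of them, and having the dispatching node write its own entry first is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CST:
    """Circulating status table; this field layout is assumed, not from the claims."""
    dispatcher: str
    entries: dict = field(default_factory=dict)  # node id -> status written by that node
    visited: list = field(default_factory=list)  # order in which nodes held the table


def circulate(dispatcher, online_nodes, local_status):
    """Dispatch a CST, visit each on-line node once, and return it to the dispatcher."""
    cst = CST(dispatcher=dispatcher)
    views = {}  # what each node read about the other nodes while holding the table
    route = [dispatcher] + [n for n in online_nodes if n != dispatcher]
    for node in route:
        cst.entries[node] = local_status[node]  # write this node's own status
        views[node] = {k: v for k, v in cst.entries.items() if k != node}  # read others
        cst.visited.append(node)
    cst.visited.append(dispatcher)  # the table comes back to the dispatching node
    return cst, views
```

In this sketch a node only sees the entries written before it in the current pass; in a repeating scheme, entries from the previous monitoring interval would presumably still be present in the table.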
5. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node;
wherein the return of the CST to the dispatching node is accomplished by the step of: at a last on-line node, indicating in the CST that the dispatching node has not received the CST, and passing the CST, such that it circulates back to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes.
6. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the CST indicates whether each node is on-line or off-line, further comprising the steps: if a node fails in an attempt to pass the CST to an on-line node, indicating in the CST such failure and passing the CST to another node; and when the CST returns to the dispatching node, indicating in the CST that any previously on-line node that has not received the CST is off-line.
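The failure handling in claim 6 can be sketched as follows. The function name `circulate_and_mark`, the `can_pass` callback standing in for a real pass attempt, and the lowest-identifier-first ordering are all hypothetical; the sketch also assumes every node can attempt a pass to every other node, which a real multi-network topology would not guarantee.

```python
def circulate_and_mark(dispatcher, online, can_pass):
    """Circulate a CST, log failed pass attempts, and mark unreached nodes off-line.

    online   -- set of node ids believed to be on-line before this pass
    can_pass -- can_pass(src, dst) -> bool, a stand-in for an actual pass attempt
    """
    status = {n: "on-line" for n in online}
    failures = set()          # (holder, target) pairs recorded in the CST
    received = {dispatcher}
    holder = dispatcher
    while True:
        next_holder = None
        for target in sorted(n for n in online if n not in received):
            if can_pass(holder, target):
                next_holder = target
                break
            failures.add((holder, target))  # note the failure, then try another node
        if next_holder is None:
            break  # nobody left to reach; the table goes back to the dispatcher
        received.add(next_holder)
        holder = next_holder
    # Back at the dispatching node: any previously on-line node that never
    # received the table is marked off-line.
    for node in online:
        if node not in received:
            status[node] = "off-line"
    return status, failures
```

Note that a single failed attempt does not mark a node off-line here; that decision is deferred until the table has returned, so later holders still get a chance to reach the node.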
7. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the CST indicates whether each node is on-line or off-line, further comprising the steps of: at predetermined polling intervals, polling off-line nodes to determine if they are no longer off-line; and for each polled node discovered to be on-line, indicating in the CST that such node is on-line.
Dependent claims: 8
9. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the network nodes are on multiple networks with at least one bridge node between each network, further comprising the steps of: if each bridge node between two networks goes off-line, designating a dispatching node for each such network; and circulating a respective CST among the on-line nodes of each network.
Dependent claims: 10
11. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes; said selected status information associated with each node comprises link status information about the status of links associated with the node, and node status information about the status of the node, wherein the link status information comprises an indication of whether a link is up or down, and the node status information comprises an indication of whether the node is on-line or off-line.
12. A distributed network monitoring method for monitoring the status of network nodes and communication links, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line; circulating the CST to each on-line node, and then returning the CST to the dispatching node; and at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the CST indicates that, for each node, the condition of a link is intermittent if the performance of that link is significantly degraded.
Dependent claims: 13, 14, 15, 16, 17
18. A distributed network monitoring method for monitoring the status of network nodes and communication links in a network system of multiple networks with at least one bridge node between each network, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line, and then returning the CST to the dispatching node; at each on-line node with the CST, passing the CST to another on-line node that has not received the CST and for which the passing node has a link; or if all on-line nodes for which the node passing the CST has a link have received the CST, passing the CST to an on-line node that can pass the CST to an on-line node that has not received the CST; at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes.
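The passing rule of claim 18 (prefer a directly linked on-line node that has not yet received the CST; otherwise hand the CST to a node that can forward it toward one) can be sketched with a breadth-first search. The names `next_hop` and `circulate`, the adjacency-dict representation of links, and the tie-breaking by sorted identifier are assumptions for illustration. The return trip borrows the idea recited in claim 5: the last node marks the dispatching node as not having received the CST, so the same rule carries the table home.

```python
from collections import deque


def next_hop(holder, links, online, received):
    """Pick the neighbour to pass the CST to, or None if every on-line node has it."""
    # Rule 1: a linked on-line node that has not yet received the table.
    for n in sorted(links.get(holder, ())):
        if n in online and n not in received:
            return n
    # Rule 2: otherwise, search through on-line nodes for the nearest node that
    # has not received the table, and return the first hop on the way to it.
    parent = {holder: None}
    queue = deque([holder])
    while queue:
        cur = queue.popleft()
        for n in sorted(links.get(cur, ())):
            if n in online and n not in parent:
                parent[n] = cur
                if n not in received:
                    hop = n
                    while parent[hop] != holder:
                        hop = parent[hop]
                    return hop
                queue.append(n)
    return None


def circulate(dispatcher, links, online):
    """Return the sequence of nodes that hold the CST during one circulation."""
    received = {dispatcher}
    path = [dispatcher]
    holder = dispatcher
    returning = False
    while True:
        hop = next_hop(holder, links, online, received)
        if hop is None:
            if returning or holder == dispatcher:
                break
            # Last node: flag the dispatcher as "not received" to route back to it.
            received.discard(dispatcher)
            returning = True
            continue
        received.add(hop)
        path.append(hop)
        holder = hop
    return path
```

For a chain A-B-C-D, where B and C act as relays, the table travels out to D and then back through the same intermediate nodes, since those are the only nodes able to pass it toward the dispatcher.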
19. A distributed network monitoring method for monitoring the status of network nodes and communication links in a network system of multiple networks with at least one bridge node between each network, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line, and then returning the CST to the dispatching node; at each on-line node with the CST, passing the CST to another on-line node that has not received the CST and for which a node passing the CST has a link; or if all on-line nodes for which a passing node passing the CST has a link have received the CST, passing the CST to an on-line node that can pass the CST to an on-line node that has not received the CST; at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the CST indicates whether each node is on-line or off-line, and wherein each node is assigned a unique identification number, further comprising the steps of: at predetermined polling intervals, polling each off-line node with an identification number that falls between the identification number of the polling node and the on-line node with the next higher identification number to determine if such polled nodes are no longer off-line; and for each polled node discovered to be on-line, indicating in the CST that such node is on-line.
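The polling assignment in claim 19 partitions the off-line nodes among the on-line nodes by identification number. A small sketch of that partition follows; `nodes_to_poll` is a hypothetical helper, and the wrap-around branch for the highest-numbered on-line node is an assumption added so that every off-line node has a poller, since the claim text does not say how that case is handled.

```python
def nodes_to_poll(poller_id, online_ids, offline_ids):
    """Off-line ids that fall between the poller and the next higher on-line id."""
    higher = [i for i in online_ids if i > poller_id]
    if higher:
        upper = min(higher)
        return sorted(i for i in offline_ids if poller_id < i < upper)
    # Assumption (not in the claim): the highest on-line node wraps around and
    # also covers off-line ids below the lowest on-line id.
    lowest = min(set(online_ids) | {poller_id})
    return sorted(i for i in offline_ids if i > poller_id or i < lowest)
```

With on-line nodes 1, 4 and 9, node 1 polls the off-line nodes numbered 2 and 3, node 4 polls those between 4 and 9, and node 9 takes the remainder, so each off-line node is polled by exactly one on-line node.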
20. A distributed network monitoring method for monitoring the status of network nodes and communication links in a network system of multiple networks with at least one bridge node between each network, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line, and then returning the CST to the dispatching node; at each on-line node with the CST, passing the CST to another on-line node that has not received the CST and for which a node passing the CST has a link; or if all on-line nodes for which the node passing the CST has a link have received the CST, passing the CST to an on-line node that can pass the CST to an on-line node that has not received the CST; at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes; if each bridge node between two networks goes off-line, designating a dispatching node for each such network; and circulating a respective CST among the on-line nodes of each network; and
then, when an off-line bridge node is brought back on-line, designating one of the respective dispatching nodes as a new dispatching node, and eliminating any CST not dispatched by such new dispatching node.
21. A distributed network monitoring method for monitoring the status of network nodes and communication links in a network system of multiple networks with at least one bridge node between each network, comprising the steps:
at predetermined monitoring intervals, dispatching a circulating status table (CST) from a node designated as a dispatching node to other nodes that are on-line, and then returning the CST to the dispatching node; at each on-line node with the CST, passing the CST to another on-line node that has not received the CST and for which a node passing the CST has a link; or if all on-line nodes for which the node passing the CST has a link have received the CST, passing the CST to an on-line node that can pass the CST to an on-line node that has not received the CST; at each node that receives the CST, writing selected status information about such node into the CST and reading selected status information about the other nodes;
wherein the CST indicates that, for each node, the condition of a link is intermittent if the performance of that link is significantly degraded, and wherein intermittent link conditions are determined by a voting procedure comprising the steps: at each node, logging communications attempts and errors that occur between such node and the other nodes; at predetermined voting intervals, circulating from the dispatching node a request for each on-line node to vote on the condition of its link with other on-line nodes; at each on-line node, responding to the voting request by voting on whether its link to another on-line node is intermittent according to predetermined statistical criteria based on such node's log of attempts/errors over such link, but only if such on-line node has had a predetermined minimum number of communication attempts, otherwise indicating no opinion in response to a voting request; and compiling the votes from each of the on-line nodes, and assigning an intermittent link condition to each link having a predetermined minimum number of intermittent link votes.
Dependent claims: 22
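The voting procedure of claim 21 can be sketched in two parts: each node turns its attempt/error log into a ballot, and the ballots are then compiled. The claim leaves the "predetermined statistical criteria" and the minimum counts unspecified, so the error-rate threshold, `min_attempts`, and `min_votes` below are placeholder values, and `cast_ballot` and `tally` are hypothetical names. The sketch also assumes a vote is keyed by the peer node at the far end of the link.

```python
from collections import Counter

INTERMITTENT, OK, NO_OPINION = "intermittent", "ok", "no-opinion"


def cast_ballot(log, min_attempts=20, max_error_rate=0.10):
    """Turn one node's log {peer: (attempts, errors)} into votes {peer: vote}."""
    ballot = {}
    for peer, (attempts, errors) in log.items():
        if attempts < min_attempts:
            ballot[peer] = NO_OPINION  # too little traffic to judge this link
        elif errors / attempts > max_error_rate:
            ballot[peer] = INTERMITTENT
        else:
            ballot[peer] = OK
    return ballot


def tally(ballots, min_votes=2):
    """Compile {voter: ballot} and return the peers whose links are intermittent."""
    counts = Counter(
        peer
        for ballot in ballots.values()
        for peer, vote in ballot.items()
        if vote == INTERMITTENT
    )
    return {peer for peer, n in counts.items() if n >= min_votes}
```

A simple error-rate cutoff is only one possible criterion; any statistic computed from the same log could be substituted in `cast_ballot` without changing the tallying step.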
23. A distributed network monitoring software system for monitoring the status of network nodes and communication links, where each node includes communications driver software for communicating message packets among the nodes of the network, comprising:
at each node, monitor software including at least a servicer task and a node monitor task, and further including a monitor region of memory for implementing intertask data transfers; a circulating status table (CST) including selected status information about each node and the associated communications links; at each node, said node monitor task continuously sends link test packets to other on-line nodes for testing the status of such other nodes and associated communications links, and provides corresponding status data into said monitor region; at predetermined monitoring intervals, said servicer task of a node designated as a dispatching node circulates the CST to each on-line node, such that the CST returns to the dispatching node after being passed to each on-line node; at each node that receives the CST, said servicer task reads selected status information from said monitor region and correspondingly updates the status information in the CST, and reads selected status information about the other nodes from the CST.
24. A distributed network monitoring software system for monitoring the status of network nodes and communication links, where each node includes communications driver software for communicating message packets among the nodes of the network, comprising:
at each node, monitor software including at least a CST servicer task and a node monitor task, and further including a monitor region of memory for implementing intertask data transfers; a circulating status table (CST) including selected status information about each node and the associated communications links; at each node, said node monitor task continuously sends link test packets to other on-line nodes for testing the status of such other nodes and associated communications links, and provides corresponding status data into said monitor region; at predetermined monitoring intervals, said CST servicer task of a node designated as a dispatching node circulates the CST to each on-line node, such that the CST returns to the dispatching node after being passed to each on-line node; at each node that receives the CST, said CST servicer task reads selected status information from said monitor region and correspondingly updates the status information in the CST, and reads selected status information about the other nodes from the CST;
wherein the network nodes are on multiple networks with at least one bridge node between each network, and wherein CST circulation is accomplished by: at each on-line node with the CST, said CST servicer task passes the CST to another on-line node that has not received the CST and for which the passing node has a link; or if all on-line nodes for which the node passing the CST has a link have received the CST, said CST servicer task passes the CST to an on-line node that can pass the CST to an on-line node that has not received the CST.
Dependent claims: 25
26. A distributed network monitoring software system for monitoring the status of network nodes and communication links, where each node includes communications driver software for communicating message packets among the nodes of the network, comprising:
at each node, monitor software including at least a CST servicer task and a node monitor task, and further including a monitor region of memory for implementing intertask data transfers; a circulating status table (CST) including selected status information about each node and the associated communications links; at each node, said node monitor task continuously sends link test packets to other on-line nodes for testing the status of such other nodes and associated communications links, and provides corresponding status data into said monitor region; at predetermined monitoring intervals, said CST servicer task of a node designated as a dispatching node circulates the CST to each on-line node, such that the CST returns to the dispatching node after being passed to each on-line node; at each node that receives the CST, said CST servicer task reads selected status information from said monitor region and correspondingly updates the status information in the CST, and reads selected status information about the other nodes from the CST, wherein the CST indicates whether each node is on-line or off-line, and wherein: at predetermined polling intervals, said node monitor task polls off-line nodes to determine if they are no longer off-line, and, for each polled node discovered to be on-line, indicates in the monitor region that such node is on-line.
27. A distributed network monitoring software system for monitoring the status of network nodes and communication links, where each node includes communications driver software for communicating message packets among the nodes of the network, comprising:
at each node, monitor software including at least a CST servicer task and a node monitor task, and further including a monitor region of memory for implementing intertask data transfers; a circulating status table (CST) including selected status information about each node and the associated communications links; at each node, said node monitor task continuously sends link test packets to other on-line nodes for testing the status of such other nodes and associated communications links, and provides corresponding status data into said monitor region; at predetermined monitoring intervals, said CST servicer task of a node designated as a dispatching node circulates the CST to each on-line node, such that the CST returns to the dispatching node after being passed to each on-line node; at each node that receives the CST, said CST servicer task reads selected status information from said monitor region and correspondingly updates the status information in the CST, and reads selected status information about the other nodes from the CST;
wherein the CST indicates that, for each node, the condition of a link is intermittent if the performance of that link is significantly degraded, wherein the communications driver at each node logs communications attempts and errors that occur between such node and the other nodes, and wherein: at predetermined voting intervals, said CST servicer task at the dispatching node circulates a request for each on-line node to vote on the condition of its link with other on-line nodes; at each on-line node, said CST servicer task responds by voting on whether its link to another on-line node is intermittent according to predetermined statistical criteria based on such node's log of attempts/errors over such link; and said CST servicer task at each on-line node compiles the votes from each of the on-line nodes, and assigns an intermittent link condition to each link having a predetermined minimum number of intermittent link votes.
Dependent claims: 28, 29
30. A link-condition voting method for determining when the condition of a link between nodes of a network system is intermittent because the performance of that link is significantly degraded, comprising the steps:
at each node, logging communications attempts and errors that occur between such node and the other nodes; at predetermined voting intervals, circulating from a dispatching node a request for each on-line node to vote on the condition of its link with other on-line nodes; at each on-line node, responding to the voting request by voting on whether its link to another on-line node is intermittent according to predetermined statistical criteria based on such node's log of attempts/errors over such link; and compiling the votes from each of the on-line nodes, and assigning an intermittent link condition to each link having a predetermined minimum number of intermittent link votes.
Dependent claims: 31
32. A link-condition voting method for determining when the condition of a link between nodes of a network system is intermittent because the performance of that link is significantly degraded, comprising the steps:
at each node, logging communications attempts and errors that occur between such node and the other nodes; at predetermined voting intervals, circulating from a dispatching node a request for each on-line node to vote on the condition of its link with other on-line nodes; at each on-line node, responding to the voting request by voting on whether its link to another on-line node is intermittent according to predetermined statistical criteria based on such node's log of attempts/errors over such link; and compiling the votes from each of the on-line nodes, and assigning an intermittent link condition to each link having a predetermined minimum number of intermittent link votes; wherein each on-line node only votes on those links to other on-line nodes for which such node has had a predetermined minimum number of communication attempts, otherwise indicating no opinion in response to a voting request; for each on-line node, assigning an indeterminate condition to those links to other on-line nodes for which not enough votes are available to make a determination about whether such link should be assigned an intermittent link condition; and at each of a selected number of on-line volunteer nodes able to communicate with another on-line node over an indeterminate condition link, sending a predetermined number of test messages to such other node over such indeterminate condition link such that the volunteer node is able to vote on the condition of such link during a subsequent voting interval.
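Claim 32 adds an indeterminate condition and volunteer test traffic on top of the basic vote. The sketch below shows one way these pieces could fit together. All names (`classify`, `pick_volunteers`, `run_link_tests`), the thresholds, and the lowest-identifier volunteer selection are hypothetical, because the claim only says that a "selected number" of volunteers send a "predetermined number" of test messages.

```python
INTERMITTENT, OK, NO_OPINION = "intermittent", "ok", "no-opinion"


def classify(votes, min_votes=2, min_opinions=2):
    """Classify one link from the list of votes cast on it."""
    opinions = [v for v in votes if v != NO_OPINION]
    if opinions.count(INTERMITTENT) >= min_votes:
        return INTERMITTENT
    if len(opinions) < min_opinions:
        return "indeterminate"  # not enough votes to decide either way
    return OK


def pick_volunteers(candidates, count=2):
    """Choose volunteers among nodes able to use the link (lowest ids first, assumed)."""
    return sorted(candidates)[:count]


def run_link_tests(log, peer, send_test, n_tests=20):
    """Send test messages to `peer` and fold the results into the attempt/error log.

    send_test -- send_test(peer) -> bool, a stand-in for a real link test message
    """
    attempts, errors = log.get(peer, (0, 0))
    for _ in range(n_tests):
        attempts += 1
        if not send_test(peer):
            errors += 1
    log[peer] = (attempts, errors)
    return log[peer]
```

After `run_link_tests`, the volunteer's log holds enough attempts for it to cast a real vote on that link at the next voting interval instead of reporting no opinion.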
33. A link-condition voting method for determining when the condition of a link between nodes of a network system is intermittent because the performance of that link is significantly degraded, comprising the steps:
at each node, logging communications attempts and errors that occur between such node and the other nodes; at predetermined voting intervals, circulating from a dispatching node a request for each on-line node to vote on the condition of its link with other on-line nodes; at each on-line node, responding to the voting request by voting on whether its link to another on-line node is intermittent according to predetermined statistical criteria based on such node's log of attempts/errors over such link; and compiling the votes from each of the on-line nodes, and assigning an intermittent link condition to each link having a predetermined minimum number of intermittent link votes; and at an on-line node with an indeterminate condition link, responding to a voting request by forcing the vote on that link to indicate an intermittent condition until such node is able to communicate over such link with a predetermined minimum number of errors over a voting interval.
Specification