Failure recovery system and server
Abstract
A server 200 includes a scenario table in which object to-be-monitored information that indicates one or more network devices A, B and C being objects for failure recovery, failure information for identifying contents of failures, countermeasure information against failures, and frequency information that indicates the number of times of the recoveries from the failures based on the countermeasure information are correspondingly stored. The network device A 300 detects the failure of the network device itself and transmits a failure event to the server 200. The server 200 selects the countermeasure information item in descending order of the frequency information items, by reference to the scenario table, and transmits the selected countermeasure information item to the network device A 300. The server 200 repeats the selections and transmissions of the pertinent information item until the reception of the failure event from the network device A 300 stops.
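The selection loop described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Scenario` record, the `recover` function, and the `apply_and_check` callback (which stands in for "transmit the countermeasure and see whether the failure event stops") are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One row of a scenario table (field names are assumptions)."""
    device: str          # object to-be-monitored information
    failure: str         # failure information
    countermeasure: str  # countermeasure information
    frequency: int = 0   # number of recoveries achieved with this countermeasure


def recover(table, device, failure, apply_and_check):
    """Try matching countermeasures in descending order of frequency.

    apply_and_check(countermeasure) is a hypothetical callback that sends the
    countermeasure to the device and returns True when no further failure
    event arrives. Returns the successful row, or None if every candidate
    was tried without success.
    """
    candidates = sorted(
        (row for row in table if row.device == device and row.failure == failure),
        key=lambda row: row.frequency,
        reverse=True,
    )
    for row in candidates:
        if apply_and_check(row.countermeasure):
            row.frequency += 1  # record the successful recovery
            return row
    return None
```

Because `sorted` is stable, rows with equal frequency are tried in table order.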
6 Claims
1. A failure recovery system comprising:
one or more network devices constituting a network; and
a server connected to said network devices, and including a memory storing a scenario table in which object to-be-monitored information that indicates said network device or devices being objects for failure recovery, failure information for identifying contents of failures, and countermeasure information against the failures, and frequency information that indicates the number of times of recoveries from the failures based on the countermeasure information are correspondingly stored;
wherein:
said network device detects the failure of said network device itself, and transmits to said server, a failure event which contains the object to-be-monitored information indicating said device itself, and the failure information for identifying content of the failure;
said server receives the failure event, searches for one or more countermeasure information items corresponding to the object to-be-monitored information and the failure information which are contained in the failure event, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the selected countermeasure information item to said network device;
said network device receives the countermeasure information item, and reflects the countermeasure information item or alters its setting on the basis of the countermeasure information item; and
when said server judges that the failure event is not received again within a predetermined time period since the transmission of the selected countermeasure information item, said server increases the frequency information item corresponding to the selected countermeasure information item, by reference to the scenario table,
further wherein the transmission to said server includes that:
said network device monitors logs output beforehand and containing logs of route computations, so as to evaluate the number of times route computations have occurred within a fixed time period,
said network device detects the failure of frequent occurrences of route learning, because an evaluated number of times of the route computations is larger than a preset threshold value, and
said network device transmits to said server, the failure event which contains the object to-be-monitored information indicating the device itself, and the failure information for identifying the frequent occurrences of the route learning.
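The device-side detection step in claim 1 (counting route computations in a log over a fixed period and comparing against a threshold) might look roughly like the sketch below. The log line format (an ISO timestamp, a space, then a message containing "route computation"), the function names, and the event dictionary keys are all assumptions made for illustration; the claim does not specify them.

```python
from datetime import datetime, timedelta


def count_route_computations(log_lines, now, window):
    """Count route-computation log entries within `window` before `now`.

    Assumes each line looks like '2024-01-01T00:00:10 route computation started'.
    """
    count = 0
    for line in log_lines:
        timestamp, _, message = line.partition(" ")
        if "route computation" not in message:
            continue
        if now - window <= datetime.fromisoformat(timestamp) <= now:
            count += 1
    return count


def detect_frequent_route_learning(log_lines, now, window, threshold, device_id):
    """Return a failure event when the count exceeds the threshold, else None."""
    if count_route_computations(log_lines, now, window) > threshold:
        # Hypothetical event layout: device identity plus failure identifier.
        return {"device": device_id, "failure": "frequent route learning"}
    return None
```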
2. A failure recovery system comprising:
a first network device constituting a network;
a second network device connected to said first network device, and constituting the network; and
a server connected to said first and second network devices, and including a memory storing a scenario table in which object to-be-monitored information that indicates said first and second network devices being objects for recovery from a failure, failure information that is determined by a combination of a status of said first network device and a status of said second network device, countermeasure information against the failure, and frequency information that indicates the number of times of the recoveries from the failure based on the countermeasure information are correspondingly stored;
wherein:
said first network device transmits to said server, a first event which contains first object to-be-monitored information indicating said device itself, and first status information indicating the status of said device itself;
said second network device transmits to said server, a second event which contains second object to-be-monitored information indicating said device itself, and second status information indicating the status of said device itself;
said server receives the first and second events, judges existence or nonexistence of the failure on the basis of the first status information and the second status information, and finds failure information;
said server searches for one or more countermeasure information items which correspond to the first and second object to-be-monitored information items respectively contained in the first and second events, and the found failure information, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the selected countermeasure information item to said first and second network devices, respectively;
said first and second network devices receive the countermeasure information item, and reflect said countermeasure information item or alter their settings on the basis of the countermeasure information item, respectively; and
when the failure is avoided, said server increases the frequency information item corresponding to the selected countermeasure information item, by reference to the scenario table,
further wherein:
said first and second network devices operate in a redundant configuration;
the finding of the failure information contains that, in a case where both the first status information of said first network device and the second status information of said second network device indicate masters, or where both of them indicate backups, said server judges the existence of the failure, and that said server finds the failure information indicating “double master” or “double backup”; and
the countermeasure information contains information for bringing one of said first network device and said second network device into the master, and the other into the backup.
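The server-side judgment in claim 2 combines two status reports into one failure identifier. A minimal sketch, assuming statuses arrive as the strings "master" and "backup" and using hypothetical function names, could be:

```python
def judge_redundancy_failure(first_status, second_status):
    """Return 'double master', 'double backup', or None when the pair is healthy."""
    if first_status == second_status == "master":
        return "double master"
    if first_status == second_status == "backup":
        return "double backup"
    return None


def make_role_countermeasure(first_id, second_id):
    """Build a countermeasure assigning one device as master, the other as backup.

    Which device becomes master is an arbitrary choice in this sketch.
    """
    return {first_id: "master", second_id: "backup"}
```

In a fuller system the returned failure string would serve as the key for the scenario-table lookup described earlier in the claim.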
3. A failure recovery system comprising:
a first network device constituting a network;
a second network device constituting the network;
a third network device connected to the network through said first network device, and connected to the network through said second network device; and
a server connected to said first and second network devices, and including a scenario table in which object to-be-monitored information that indicates the network device or devices being objects for failure recovery, failure information for identifying contents of failures, countermeasure information against the failures, and frequency information that indicates the number of times of recoveries from the failures based on the countermeasure information are correspondingly stored;
wherein:
when said third network device detects that a failure has occurred in a transfer to the network, due to the failure of said first or second network device, said third network device transmits to said server, a failure event which contains the object to-be-monitored information indicating said device itself, and failure information for identifying the failure of a transfer function;
said server receives the failure event, searches for one or more countermeasure information items corresponding to the object to-be-monitored information and the failure information which are contained in the failure event, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the countermeasure information item to said first and second network devices in conformity with the selected countermeasure information item;
said first and second network devices receive the countermeasure information item, and reflect the countermeasure information item or alter their settings on the basis of the countermeasure information item; and
when said server does not receive the failure event again within a predetermined time period since the transmission of the selected countermeasure information item, said server increases the frequency information corresponding to the selected countermeasure information item, by reference to the scenario table,
further wherein:
the first network device is set as a rendezvous point of multicast routing;
the transmission to said server includes that said third network device detects occurrence of a failure in a multicast routing function of said device itself, due to a failure of a rendezvous point function of said first network device, and transmits the failure event to said server; and
the countermeasure information item contains information for setting the second network device as a rendezvous point of the multicast routing.
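The rendezvous-point switchover in claim 3 can be illustrated with the short sketch below. The event keys, the failure string, and the `rendezvous_point` setting name are hypothetical; the claim only states that the countermeasure sets the second device as the rendezvous point and is sent to the first and second devices.

```python
def plan_rendezvous_failover(event, first_id, second_id):
    """Return per-device settings to send in response to a failure event.

    The event is assumed to come from the third device, which reports a
    multicast routing failure. Both the first and second devices receive
    the same instruction: the second device is the new rendezvous point.
    """
    if event.get("failure") != "multicast routing failure":
        return {}
    setting = {"rendezvous_point": second_id}
    return {first_id: dict(setting), second_id: dict(setting)}
```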
4. A failure recovery system as defined comprising:
one or more network devices constituting a network;
a server connected to said network devices, and storing a scenario table in which object to-be-monitored information that indicates said network device or devices being objects for failure recovery, failure information for identifying contents of failures, and a plurality of countermeasure information items against each of the failures, and frequency information that indicates the number of times of recoveries from the failures based on the countermeasure information are correspondingly stored; and
a client device which is connected to said server;
wherein:
said network device detects the failure of said network device itself, and transmits to said server, a failure event which contains the object to-be-monitored information indicating said device itself, and the failure information for identifying content of the failure;
said server receives the failure event, searches for one or more countermeasure information items corresponding to the object to-be-monitored information and the failure information which are contained in the failure event, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the selected countermeasure information item to said network device;
said network device receives the countermeasure information item, and reflects the countermeasure information item or alters its setting on the basis of the countermeasure information item;
when said server judges that the failure event is not received again within a predetermined time period since the transmission of the selected countermeasure information item, said server increases the frequency information item corresponding to the selected countermeasure information item, by reference to the scenario table;
said server notifies to said client device in a case where an unselected countermeasure information item has become nonexistent by repeatedly selecting pertinent countermeasure information items in the scenario table; and
said client device displays on a display unit or outputs to an output unit, nonexistence of the countermeasure information.
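The exhaustion handling added in claim 4 (notifying a client once no unselected countermeasure remains) could be sketched as below. The `RecoverySession` class and the `notify_client` callback are hypothetical, and notifying at the moment the next selection finds nothing left is one possible reading of the claim, assumed here for simplicity.

```python
class RecoverySession:
    """Tracks which countermeasures are still unselected for one failure."""

    def __init__(self, candidates, notify_client):
        # candidates: iterable of (countermeasure, frequency) pairs
        self._remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
        self._notify_client = notify_client  # hypothetical client callback

    def next_countermeasure(self):
        """Return the next countermeasure to try, or None after notifying."""
        if not self._remaining:
            self._notify_client("no countermeasure information remains")
            return None
        countermeasure, _frequency = self._remaining.pop(0)
        return countermeasure
```

Each repeated failure event would trigger another `next_countermeasure()` call; the client callback is where a display or output unit would be driven.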
5. A failure recovery system comprising:
a first network device constituting a network;
a second network device connected to said first network device, and constituting the network;
a server connected to said first and second network devices and storing a scenario table in which object to-be-monitored information that indicates said first and second network devices being objects for recovery from a failure, failure information that is determined by a combination of a status of said first network device and a status of said second network device, a plurality of countermeasure information items against each of the failures, and frequency information that indicates the number of times of the recoveries from the failure based on the countermeasure information are correspondingly stored; and
a client device which is connected to said server;
wherein:
said first network device transmits to said server, a first event which contains first object to-be-monitored information indicating said device itself, and first status information indicating the status of said device itself;
said second network device transmits to said server, a second event which contains second object to-be-monitored information indicating said device itself, and second status information indicating the status of said device itself;
said server receives the first and second events, judges existence or nonexistence of the failure on the basis of the first status information and the second status information, and finds failure information;
said server searches for one or more countermeasure information items which correspond to the first and second object to-be-monitored information items respectively contained in the first and second events, and the found failure information, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the selected countermeasure information item to said first and second network devices, respectively;
said first and second network devices receive the countermeasure information item, and reflect said countermeasure information item or alter their settings on the basis of the countermeasure information item, respectively;
when the failure is avoided, said server increases the frequency information item corresponding to the selected countermeasure information item, by reference to the scenario table;
said server notifies to said client device in a case where an unselected countermeasure information item has become nonexistent by repeatedly selecting pertinent countermeasure information items in the scenario table; and
said client device displays on a display unit or outputs to an output unit, nonexistence of the countermeasure information.
6. A failure recovery system comprising:
a first network device constituting a network;
a second network device constituting the network;
a third network device connected to the network through said first network device, and connected to the network through said second network device;
a server connected to said first and second network devices, and including a memory storing a scenario table in which object to-be-monitored information that indicates the network device or devices being objects for failure recovery, failure information for identifying contents of failures, a plurality of countermeasure information items against each of the failures, and frequency information that indicates the number of times of recoveries from the failures based on the countermeasure information are correspondingly stored; and
a client device which is connected to said server;
wherein:
when said third network device detects that a failure has occurred in a transfer to the network, due to the failure of said first or second network device, said third network device transmits to said server, a failure event which contains the object to-be-monitored information indicating said device itself, and failure information for identifying the failure of a transfer function;
said server receives the failure event, searches for one or more countermeasure information items corresponding to the object to-be-monitored information and the failure information which are contained in the failure event, by reference to the scenario table, and selects from among the pertinent countermeasure information items, one countermeasure information item as to which the corresponding frequency information is the largest, or, is equal to or larger than a predetermined value;
said server transmits the countermeasure information item to said first and second network devices in conformity with the selected countermeasure information item;
said first and second network devices receive the countermeasure information item, and reflect the countermeasure information item or alter their settings on the basis of the countermeasure information item;
when said server does not receive the failure event again within a predetermined time period since the transmission of the selected countermeasure information item, said server increases the frequency information corresponding to the selected countermeasure information item, by reference to the scenario table;
said server notifies to said client device in a case where an unselected countermeasure information item has become nonexistent by repeatedly selecting pertinent countermeasure information items in the scenario table; and
said client device displays on a display unit or outputs to an output unit, nonexistence of the countermeasure information.
Specification