Method and system for asymmetrically maintaining system operability
Abstract
A system is provided for asymmetrically maintaining system operability that includes a first processing element and a second processing element coupled to the first processing element by a communication link. The first processing element is operable to perform at least one function. The second processing element is operable to perform at least one function of the first processing element in the event the first processing element fails, and further operable to expect and receive keepalive inquiries at an expected rate from the first processing element and to send responses in response to the inquiries to the first processing element. The second processing element is further operable to take remedial action after not receiving any inquiries within a first predetermined time period. In another embodiment of the present invention, the first processing element is operable to take remedial action after not receiving any response to any inquiries sent within a second predetermined time period, wherein the first predetermined time period is larger than the second predetermined time period. In other embodiments of the present invention, the first and second processing elements are routers.
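The relationship between the two time periods can be sketched in code. This is a minimal illustration, not the patented implementation: the names `KeepaliveTimeouts`, `choose_timeouts`, and `link_failure_order` are hypothetical, and the additive formula (inquirer timeout plus remediation time plus a margin) is only one assumed way of making the first predetermined time period depend on the time the first processing element needs to take remedial action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveTimeouts:
    # "Second predetermined time period": how long the first (inquiring)
    # element waits for a response before taking remedial action.
    inquirer_timeout: float
    # "First predetermined time period": how long the second (responding)
    # element waits for an inquiry before taking remedial action.
    responder_timeout: float


def choose_timeouts(inquirer_timeout, remediation_time, margin):
    """Derive the responder timeout from the inquirer timeout.

    Assumption (not from the source): the responder waits long enough for
    the inquirer to time out AND finish its remedial action, plus a margin.
    """
    if inquirer_timeout <= 0:
        raise ValueError("inquirer_timeout must be positive")
    if remediation_time < 0 or margin <= 0:
        raise ValueError("remediation_time must be >= 0 and margin > 0")
    responder_timeout = inquirer_timeout + remediation_time + margin
    return KeepaliveTimeouts(inquirer_timeout, responder_timeout)


def link_failure_order(timeouts):
    """Order in which the elements act if the link fails right after the
    last successful inquiry/response exchange."""
    events = [
        (timeouts.inquirer_timeout, "first element"),
        (timeouts.responder_timeout, "second element"),
    ]
    return [name for _, name in sorted(events)]


timeouts = choose_timeouts(inquirer_timeout=3.0, remediation_time=2.0, margin=0.5)
print(timeouts.responder_timeout)    # 5.5
print(link_failure_order(timeouts))  # ['first element', 'second element']
```

Because the responder timeout is strictly larger, the two elements never reach their remedial actions at the same instant after a link failure, which is the asymmetry the abstract describes.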
71 Claims
-
1. An asymmetric failure detection system, comprising:
-
a first processing element operable to perform at least one function; and
a second processing element coupled to the first processing element by a communication link, the second processing element operable to perform the at least one function of the first processing element in the event the first processing element fails, the second processing element further operable to expect and receive keepalive inquiries at an expected rate from the first processing element and to send responses in response to the inquiries to the first processing element, the second processing element further operable to take remedial action after not receiving any inquiries within a first predetermined time period; and
wherein the first processing element is operable to take remedial action after not receiving any response to any inquiries sent within a second predetermined time period, the first predetermined time period being larger than the second predetermined time period, and wherein the first predetermined time period is determined based on an amount of time needed by the first processing element to take remedial action after not receiving any response to inquiries, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. An asymmetric failure detection system, comprising:
-
a first processing element, operable to transmit keepalive inquiries at an expected rate and to receive responses;
a second processing element coupled to the first processing element by a communication link, the second processing element operable to expect and receive the inquiries at the expected rate and to send the responses, the second processing element further operable to take remedial action after not receiving any inquiries within a first predetermined time period; and
wherein the remedial action includes transmitting a message operable to disrupt the operation of the other processing element and wherein the first processing element is operable to take remedial action after not receiving any response to any inquiries sent within a second predetermined time period, the first predetermined time period being larger than the second predetermined time period, and wherein the first predetermined time period is determined based on an amount of time needed by the first processing element to take remedial action after not receiving any response to inquiries, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (9, 10, 11, 12, 13)
-
-
14. A method for asymmetric failure detection, comprising:
-
providing a first processing element operable to transmit keepalive inquiries at an expected rate and receive responses in response to the transmitted inquiries;
receiving the inquiries at a second processing element coupled to the first processing element at the expected rate;
sending the responses in response to the transmitted inquiries;
taking remedial action by the second processing element after failing to receive any inquiries within a first predetermined time period; and
wherein taking remedial action includes transmitting a message operable to disrupt the operation of the other processing element; and
wherein the first processing element is operable to take remedial action after not receiving any response to any inquiries sent within a second predetermined time period, the first predetermined time period being larger than the second predetermined time period, and wherein the first predetermined time period is determined based on an amount of time needed for the first processing element to take remedial action, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (15, 16, 17, 18, 19, 22)
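The responding side of the method in claim 14 can be sketched as a small watchdog. This is an illustrative sketch under stated assumptions: the class name `Responder`, the `send` callback, the string messages, and the explicit `now` timestamps are all hypothetical and are not drawn from the specification.

```python
class Responder:
    """Second processing element: answers keepalive inquiries and takes
    remedial action if none arrive within the first predetermined period."""

    def __init__(self, first_period, send):
        self.first_period = first_period  # "first predetermined time period"
        self.send = send                  # hypothetical transmit callback
        self.last_inquiry = 0.0           # time the last inquiry arrived
        self.remediated = False

    def on_inquiry(self, now):
        # Receive an inquiry and send the response.
        self.last_inquiry = now
        self.send("response")

    def poll(self, now):
        # Take remedial action once, after a full period with no inquiries.
        if not self.remediated and now - self.last_inquiry >= self.first_period:
            self.remediated = True
            # Remedial action includes a message meant to disrupt the peer.
            self.send("disrupt")
        return self.remediated


sent = []
responder = Responder(first_period=5.0, send=sent.append)
responder.on_inquiry(now=1.0)
responder.poll(now=5.9)   # still within the period, no action
responder.poll(now=6.0)   # period elapsed, remedial action taken
print(sent)               # ['response', 'disrupt']
```

Passing the current time in explicitly keeps the sketch deterministic; a real element would presumably read a monotonic clock instead.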
-
-
20. An asymmetric failure detector, comprising:
-
a computer-readable storage medium; and
an asymmetric failure detector resident on the computer-readable storage medium and operable to:
provide a first processing element operable to transmit keepalive inquiries at an expected rate and receive responses in response to the transmitted inquiries;
receive the inquiries at a second processing element coupled to the first processing element at the expected rate;
send the responses in response to the transmitted inquiries;
take remedial action by the second processing element after not receiving any inquiries within a first predetermined time period; and
wherein taking remedial action includes transmitting a message operable to disrupt the operation of the other processing element; and
wherein the first processing element is operable to take remedial action after not receiving any response to any inquiries sent within a second predetermined time period, the first predetermined time period being larger than the second predetermined time period, and wherein the first predetermined time period is determined based on an amount of time needed for the first processing element to take remedial action, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (21, 23, 24, 25)
-
-
26. An asymmetric failure detection system, comprising:
-
a first processing element operable to perform at least one function of a second processing element in the event the second processing element fails; and
logic residing within the first processing element operable to expect and to receive keepalive inquiries at an expected rate from the second processing element and to send responses in response to the inquiries to the second processing element, the logic further operable to cause remedial action to be taken after not receiving any inquiries within a first predetermined time period; and
wherein the first predetermined time period is larger than a second predetermined time period after which operation of the first processing element is disrupted if no responses to any inquiries are sent within the second predetermined time period; and
wherein the first predetermined time period is determined based on an amount of time needed for the first processing element to cause remedial action to be taken, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (27, 28, 29, 30, 31)
-
-
32. An asymmetric failure detection system, comprising:
-
a first processing element;
logic within the first processing element operable to expect and to receive keepalive inquiries at an expected rate from a second processing element and to send responses to the inquiries, the first processing element further operable to cause remedial action to be taken after not receiving any inquiries within a first predetermined time period; and
wherein the remedial action includes transmitting a message operable to disrupt the operation of the other processing element; and
wherein the first predetermined time period is larger than a second predetermined time period after which operation of the first processing element is disrupted if no responses to any inquiries are sent within the second predetermined time period, and wherein the first predetermined time period is based on an amount of time needed for the first processing element to cause remedial action to be taken, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (33, 34, 35, 36)
-
-
37. A method for asymmetric failure detection, comprising:
-
receiving keepalive inquiries at an expected rate at a first processing element from a second processing element;
sending responses in response to the transmitted inquiries;
taking remedial action by the first processing element after failing to receive any inquiries within a first predetermined time period, wherein the first predetermined time period is based on an amount of time needed for the second processing element to take remedial action; and
wherein taking remedial action includes transmitting a message operable to disrupt the operation of the other processing element; and
disrupting the operation of the first processing element after a second predetermined time period that is smaller than the first predetermined time period if no responses to any inquiries are sent within the second predetermined time period, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (38, 39, 40, 41)
-
-
42. An asymmetric failure detector, comprising:
-
a computer-readable storage medium; and
an asymmetric failure detector resident on the computer-readable storage medium and operable to:
receive keepalive inquiries at an expected rate at a first processing element from a second processing element;
send responses in response to the transmitted inquiries;
take remedial action by the first processing element after not receiving any inquiries within a first predetermined time period; and
wherein taking remedial action includes causing a message operable to disrupt the operation of the other processing element to be transmitted; and
wherein the first predetermined time period is larger than a second predetermined time period after which operation of the first processing element is disrupted if no responses to any inquiries are sent within the second predetermined time period; and
wherein the first predetermined time period is determined based on an amount of time needed to disrupt the operation of the first processing element, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (43, 44, 45, 46)
-
-
47. An asymmetric failure detection system, comprising:
-
a first processing element operable to perform at least one function of a second processing element in the event the second processing element fails; and
logic residing within the first processing element operable to transmit keepalive inquiries to and to receive responses to the inquiries from the second processing element, the logic further operable to cause remedial action to be taken after not receiving any responses within a first predetermined time period; and
wherein the first predetermined time period is smaller than a second predetermined time period after which operation of the first processing element is disrupted if no inquiries are sent within the second predetermined time period, and wherein the second predetermined time period is based on an amount of time needed for the first processing element to cause remedial action to be taken, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link. - View Dependent Claims (48, 49, 50, 51, 52, 53)
-
-
54. An asymmetric failure detection system, comprising:
-
a first processing element;
logic residing within the first processing element operable to transmit keepalive inquiries to and to receive responses to the inquiries from a second processing element, the logic further operable to cause remedial action to be taken after not receiving any responses within a first predetermined time period;
wherein the first predetermined time period is smaller than a second predetermined time period after which operation of the first processing element is disrupted if no inquiries are sent within the second predetermined time period and wherein the second predetermined time period is based on an amount of time needed for the first processing element to cause remedial action to be taken, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link; and
wherein the remedial action includes transmitting a message operable to disrupt the operation of the other processing element. - View Dependent Claims (55, 56, 57, 58, 59)
-
-
60. A method for asymmetric failure detection, comprising:
-
transmitting keepalive inquiries from a first processing element to a second processing element;
receiving responses in response to the transmitted inquiries;
taking remedial action by the first processing element after failing to receive any responses within a first predetermined time period;
wherein the first predetermined time period is smaller than a second predetermined time period after which operation of the first processing element is disrupted if no inquiries are sent within the second predetermined time period and wherein the second predetermined time period is based on an amount of time needed for the first processing element to take remedial action, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link; and
wherein taking remedial action includes transmitting a message operable to disrupt the operation of the other processing element. - View Dependent Claims (61, 62, 63, 64, 65)
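The inquiring side of the method in claim 60 can be sketched in the same style. Note that in this claim the "first predetermined time period" is the inquirer's own, smaller timeout. The class name `Inquirer`, the `interval` parameter, the `send` callback, and the message strings are hypothetical choices made for illustration only.

```python
class Inquirer:
    """First processing element: transmits keepalive inquiries and takes
    remedial action if no response arrives within its (smaller) period."""

    def __init__(self, interval, response_timeout, send):
        self.interval = interval                  # assumed gap between inquiries
        self.response_timeout = response_timeout  # claim 60's "first" period
        self.send = send                          # hypothetical transmit callback
        self.next_send = 0.0
        self.last_response = 0.0
        self.remediated = False

    def on_response(self, now):
        self.last_response = now

    def poll(self, now):
        if self.remediated:
            return True
        if now - self.last_response >= self.response_timeout:
            # No response within the period: take remedial action once.
            self.remediated = True
            self.send("disrupt")
            return True
        if now >= self.next_send:
            self.send("inquiry")
            self.next_send = now + self.interval
        return False


sent = []
inquirer = Inquirer(interval=1.0, response_timeout=3.0, send=sent.append)
inquirer.poll(now=0.0)         # sends the first inquiry
inquirer.on_response(now=0.5)  # last response before the link fails
for t in (1.0, 2.0, 3.0, 3.5):
    inquirer.poll(now=t)
print(sent)  # ['inquiry', 'inquiry', 'inquiry', 'inquiry', 'disrupt']
```

The inquirer keeps transmitting until its timeout expires, so a responder with a larger timeout would still be waiting when the inquirer acts.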
-
-
66. An asymmetric failure detector, comprising:
-
a computer-readable storage medium; and
an asymmetric failure detector resident on the computer-readable storage medium and operable to:
transmit keepalive inquiries from a first processing element to a second processing element;
receive responses in response to the transmitted inquiries;
take remedial action by the first processing element after not receiving any responses within a first predetermined time period;
wherein the first predetermined time period is smaller than a second predetermined time period after which operation of the first processing element is disrupted if no inquiries are sent within the second predetermined time period; and
wherein the second predetermined time period is based on an amount of time needed for the first processing element to take remedial action, the first predetermined time period and the second predetermined time period selected to prevent simultaneous shutdown of both the first processing element and the second processing element in the event of a failure of the communication link; and
wherein taking remedial action includes causing a message operable to disrupt the operation of the other processing element to be transmitted. - View Dependent Claims (67, 68, 69, 70, 71)
-
Specification