Proactive method for ensuring availability in a clustered system
First Claim
1. In a computer system including at least two server nodes, each of which can execute clustered server software, a method for monitoring failure situations to reduce downtime, said method comprising the steps of:
(a) detecting an event causing one of said failure situations;
(b) determining if said event affects one of said server nodes, and if so;
(c) determining if said event exceeds a threshold value, and if so;
(d) executing a proactive failover;
(e) determining if said event does not affect one of said server nodes, and if so;
(f) determining if said event affects the condition of the cluster service, and if so;
(g) identifying and initiating an appropriate action to fix said condition or provide a workaround that will preempt an impending failure of the cluster system, or restart a failed cluster system.
Abstract
The method of the present invention is useful in a computer system including at least two server nodes, each of which can execute clustered server software. The method monitors failure situations to reduce downtime. It first detects an event causing one of the failure situations and determines whether the event affects one of the server nodes. If it does, the method determines whether the event exceeds a threshold value and, if so, executes a proactive failover. If the event is not specific to a cluster node but indicates an impending or actual failure of the cluster software, the method identifies and initiates an appropriate action to fix the condition or provide a workaround (if available) that will preempt an impending failure of the cluster system or enable a restart of the failed cluster software.
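The decision flow described in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the patented implementation: the `Event` fields, the numeric `severity` measure, the `Action` names, and the `decide` function are all hypothetical, and the source does not say how an event is compared against the threshold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical outcomes of the monitoring decision."""
    PROACTIVE_FAILOVER = "proactive_failover"
    REPAIR_CLUSTER_SERVICE = "repair_cluster_service"
    NONE = "none"


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions."""
    severity: int                         # assumed numeric measure compared with the threshold
    node: Optional[str] = None            # affected server node, or None if not node-specific
    affects_cluster_service: bool = False


def decide(event: Event, threshold: int) -> Action:
    """Map a detected event to an action, following steps (b) through (g)."""
    # (b) Does the event affect one of the server nodes?
    if event.node is not None:
        # (c) Does it exceed the threshold? (d) If so, fail over proactively.
        if event.severity > threshold:
            return Action.PROACTIVE_FAILOVER
        return Action.NONE
    # (e)/(f) Not node-specific: does it affect the cluster service?
    if event.affects_cluster_service:
        # (g) Fix the condition, apply a workaround, or restart.
        return Action.REPAIR_CLUSTER_SERVICE
    return Action.NONE
```

For example, under these assumptions `decide(Event(severity=5, node="node-a"), threshold=3)` selects a proactive failover, while an event with no node and `affects_cluster_service=True` selects the repair path.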
9 Claims
1. In a computer system including at least two server nodes, each of which can execute clustered server software, a method for monitoring failure situations to reduce downtime, said method comprising the steps of:
(a) detecting an event causing one of said failure situations;
(b) determining if said event affects one of said server nodes, and if so;
(c) determining if said event exceeds a threshold value, and if so;
(d) executing a proactive failover;
(e) determining if said event does not affect one of said server nodes, and if so;
(f) determining if said event affects the condition of the cluster service, and if so;
(g) identifying and initiating an appropriate action to fix said condition or provide a workaround that will preempt an impending failure of the cluster system, or restart a failed cluster system.
Dependent claims: 2, 3, 4, 5, 6
7. In a computer system including at least two server nodes, each of which can execute clustered server software, a method for monitoring failure situations to reduce downtime, said method comprising the steps of:
(a) detecting an event causing one of said failure situations, said detecting including the steps of:
(a1) listening for a Simple Network Management Protocol (SNMP) event;
(a2) listening for an event log event;
wherein said step (a2) for listening for said event log event further includes the steps of:
(a2a) opening a connection to a Windows Management Instrumentation (WMI) service;
(a2b) subscribing to receive event log messages from said WMI service;
(b) determining if said event affects one of said server nodes, and if so;
(c) determining if said event exceeds a threshold value, and if so;
(d) executing a proactive failover.
Dependent claims: 8, 9
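Claim 7 describes two event sources, an SNMP listener and an event log listener subscribed through WMI, feeding the same detection step. The sketch below shows one way such listeners could be merged into a single stream. It is an assumption-laden illustration: it does not open a real SNMP socket or WMI connection, the listener and function names are hypothetical, and the WQL string is only an example of the kind of notification query a WMI event log subscription might use, not one taken from the patent.

```python
import queue
import threading
from typing import Dict, Iterable, List, Tuple

# Example WQL notification query for new event log records (assumed, not from the source).
WQL_EVENT_LOG_QUERY = (
    "SELECT * FROM __InstanceCreationEvent "
    "WHERE TargetInstance ISA 'Win32_NTLogEvent'"
)


def start_listener(name: str, source: Iterable[str],
                   sink: "queue.Queue[Tuple[str, str]]") -> threading.Thread:
    """Run one listener thread that forwards raw events, tagged by origin."""
    def run() -> None:
        for raw in source:
            sink.put((name, raw))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def collect(sources: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Merge all sources into one list of (origin, raw_event) pairs."""
    sink: "queue.Queue[Tuple[str, str]]" = queue.Queue()
    threads = [start_listener(name, src, sink) for name, src in sources.items()]
    for thread in threads:
        thread.join()  # finite stand-in sources end; real listeners would run indefinitely
    events = []
    while not sink.empty():
        events.append(sink.get())
    return events
```

Here plain Python iterables stand in for the SNMP trap receiver and the WMI subscription, so the merging logic can be exercised without either service. In a real monitor, each merged event would then pass to steps (b) through (d).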
Specification