Functional fail-over apparatus and method of operation thereof
Abstract
A system and method for a failover system wherein, in case of a failure of an element within the system, only that element is shut down, rather than shutting down an entire node. The solution is of particular use in network systems, networked storage systems, and location-independent file systems.
67 Citations
72 Claims
-
16. A system comprising:
-
a first plurality of network nodes connected via a first communication link;
a second plurality of network nodes connected via a second communication link;
said first communication link and said second communication link connected through a third communication link;
a process capable of execution on one of the network nodes;
a monitor for said process capable of execution on one of the network nodes, said monitor capable of detecting failure of said process and causing said process to execute on another of the network nodes.
- View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
-
-
35. A method for operating a failover system, wherein failover does not require the termination of all the processes executing on a first network node, the method comprising:
-
executing a process on the first network node;
executing a first monitor on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said process by said first monitor;
if an execution failure of said process is detected, then terminating execution of said process on said first network node;
transferring and initiating execution of said process on said second network node;
initiating execution of a second monitor for said process on said first network node; and
terminating said first monitor.
- View Dependent Claims (36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
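The steps of claim 35 can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Node` class, the `check_and_fail_over` function, and the `is_healthy` callback are hypothetical names introduced here, and real termination, transfer, and restart of a process are reduced to set operations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical network node that tracks what runs on it."""
    name: str
    processes: set = field(default_factory=set)
    monitors: set = field(default_factory=set)


def check_and_fail_over(proc, host, watcher, is_healthy):
    """Perform one periodic check of `proc` and fail over only that process.

    `host` runs the process and `watcher` runs its monitor. Returns the
    (host, watcher) pair after the check; the roles swap on failover.
    """
    if is_healthy(proc, host):
        return host, watcher            # nothing to do until the next check

    host.processes.discard(proc)        # terminate only the failed process
    watcher.processes.add(proc)         # transfer and start it on the monitor's node
    host.monitors.add(proc)             # start a second monitor on the old host
    watcher.monitors.discard(proc)      # terminate the first monitor
    return watcher, host


node_a = Node("A", processes={"nfs", "web"})
node_b = Node("B", monitors={"nfs"})

# Simulate a failure of "nfs" on node A (assumed health check always fails).
host, watcher = check_and_fail_over("nfs", node_a, node_b, lambda p, n: False)
```

In this run the unrelated "web" process stays on node A, which mirrors the preamble's point that failover does not require terminating every process on the first node.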
-
-
46. A computer system adapted to controlling failover so that the termination of all the executing processes is not required, the computer system comprising:
-
a first network node and a second network node;
a memory comprising software instructions adapted to enable the computer system to perform:
executing a process on said first network node;
executing a first monitor on said second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said process by said first monitor;
if an execution failure of said process is detected, then terminating execution of said process on said first network node;
transferring and initiating execution of said process on said second network node;
initiating execution of a second monitor for said process on said first network node; and
terminating said first monitor.
-
-
47. A computer software product for a computer system comprising a first network node and a second network node to control failover so that the termination of all the processes executing on said first network node is not required, the computer program product comprising:
software instructions for enabling the computer system to perform predetermined operations, and a computer readable medium bearing the software instructions, said predetermined operations comprising:
executing a process on said first network node;
executing a first monitor on said second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said process by said first monitor;
if an execution failure of said process is detected, then terminating execution of said process on said first network node;
transferring and initiating execution of said process on said second network node;
initiating execution of a second monitor for said process on said first network node; and
terminating said first monitor.
-
48. A method for monitoring and performing a failover of a network node connected to a communication link, the method comprising:
-
monitoring the operation of said network node by at least two managers;
exchanging heartbeats between said two managers;
if said first manager does not receive a heartbeat from said second manager, then said first manager executes diagnostic tests to determine how to correct the failed receipt of the heartbeat from said second manager.
- View Dependent Claims (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66)
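The heartbeat check in claim 48 might be sketched as follows. The claim does not say how a missed heartbeat is detected, so the timeout value, the `Manager` class, and the injectable `clock` are all assumptions made for illustration; the diagnostics themselves are passed in as a callback.

```python
import time


class Manager:
    """Hypothetical manager that records when it last heard from its peer."""

    def __init__(self, name, timeout_s=3.0, clock=time.monotonic):
        self.name = name
        self.timeout_s = timeout_s      # assumed detection window, not from the claim
        self.clock = clock
        self.last_heartbeat = clock()

    def receive_heartbeat(self):
        """Record a heartbeat received from the peer manager."""
        self.last_heartbeat = self.clock()

    def peer_is_silent(self):
        """True when no heartbeat has arrived within the timeout."""
        return self.clock() - self.last_heartbeat > self.timeout_s

    def poll(self, run_diagnostics):
        """Run the diagnostics only if the peer's heartbeat was missed."""
        if self.peer_is_silent():
            return run_diagnostics()
        return None


# Drive the manager with a fake clock so the example is deterministic.
now = [0.0]
first = Manager("first", timeout_s=3.0, clock=lambda: now[0])
first.receive_heartbeat()

now[0] = 2.0
on_time = first.poll(lambda: "diagnose")   # still within the timeout

now[0] = 6.0
late = first.poll(lambda: "diagnose")      # heartbeat missed, diagnostics run
```

Injecting the clock keeps the sketch testable without sleeping; a deployed manager would use the default monotonic clock.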
-
-
62-1. The computer system of claim 61, wherein the software instructions adapted to execute diagnostic tests are further adapted to:
-
attempt to access said second manager by said first manager;
attempt to access the operating system of said second manager by said first manager;
attempt to access a first network interface device of said second manager by said first manager; and
attempt to access a first switch of said second manager by said first manager.
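One way to picture the four access attempts listed above is as a table of named probes whose outcomes are collected. This is a sketch under assumptions: the probe callables are stand-ins for real access attempts, the names `run_diagnostics` and `unreachable_switch` are hypothetical, and treating an `OSError` as "unreachable" is a choice made here, not something the claim specifies.

```python
def run_diagnostics(probes):
    """Run each named probe and return a {name: reachable} mapping.

    A probe that raises OSError is recorded as unreachable.
    """
    results = {}
    for name, probe in probes:
        try:
            results[name] = bool(probe())
        except OSError:
            results[name] = False
    return results


def unreachable_switch():
    """Stand-in probe that fails the way a network call might."""
    raise OSError("no route to switch")


# The four targets follow the claim; the outcomes are invented for the demo.
probes = [
    ("second manager", lambda: False),
    ("operating system", lambda: True),
    ("network interface device", lambda: True),
    ("switch", unreachable_switch),
]

results = run_diagnostics(probes)
failed = [name for name, ok in results.items() if not ok]
```

The list of failed targets gives the first manager a basis for deciding how to correct the missed heartbeat; how that decision is made is outside this sketch.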
-
-
67. A computer software product for monitoring and performing a failover of a network node connected to a communication link, the computer program product comprising:
software instructions for enabling the network node to perform predetermined operations, and a computer readable medium bearing the software instructions, said predetermined operations comprising:
monitoring the operation of a node in the plurality of network nodes by at least two managers;
exchanging heartbeats between said two managers;
if said first manager does not receive a heartbeat from said second manager, then said first manager executes diagnostic tests to determine how to correct the failed receipt of the heartbeat from said second manager.
- View Dependent Claims (68, 69, 70, 71, 72)
Specification