Functional fail-over apparatus and method of operation thereof
Abstract
A system and method for failover in which, when an element within the system fails, only that element is shut down, rather than the entire node. The solution is of particular use in network systems, networked storage systems, and location-independent file systems.
88 Citations
70 Claims
1. A system comprising:
a first network node and a second network node connected via a communication link;
a plurality of processes resident on said first network node and presently executing;
a plurality of monitors for said plurality of processes, each of said plurality of processes having one corresponding monitor from said plurality of monitors, said corresponding monitors resident on said second network node and presently executing, each monitor being configured for detecting failure of its corresponding process on said first network node and, if its corresponding process has failed, causing said failed process to execute on said second network node as a new process having a corresponding monitor executing on said first network node, while the remaining processes continue to execute on said first network node.
16. A system comprising:
a first plurality of network nodes connected via a first communication link;
a second plurality of network nodes connected via a second communication link;
said first communication link and said second communication link connected through a third communication link;
a plurality of processes resident on one of the network nodes and presently executing;
a plurality of monitors for said plurality of processes, each of said plurality of processes having one corresponding monitor from said plurality of monitors, said corresponding monitors resident on another one of the network nodes and presently executing, each corresponding monitor being configured for detecting failure of its corresponding process and, if its corresponding process has failed, causing said failed process to execute on another of the network nodes as a new process having a corresponding monitor executing on said one network node while the remaining processes continue to execute on said one network node.
35. A method for operating a failover system, wherein failover does not require the termination of all the processes executing on a first network node, the method comprising:
executing a plurality of processes resident on the first network node;
executing a plurality of monitors for said plurality of processes, each of said plurality of processes having one corresponding monitor from said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said plurality of processes, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of a process of said plurality of processes is detected by its corresponding monitor, then terminating execution of said failed process from executing on said first network node while the remaining processes continue to execute on said first network node;
transferring and initiating execution of said failed process as a new process on said second network node;
initiating execution of a new corresponding monitor for said new process on said first network node; and
terminating said corresponding monitor of said failed monitor from executing on said first network node.
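The steps of claim 35 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the Node class, the start and check_once helpers, the process names, and the is_healthy callback are all hypothetical, and a real system would check health over the communications link rather than inside one program. The sketch retires the old monitor on the node where it was running, which is only one reading of the final step.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a network node."""
    name: str
    processes: set = field(default_factory=set)   # processes executing on this node
    monitors: dict = field(default_factory=dict)  # process name -> node that hosts it


def start(process, host, watcher):
    """Execute `process` on `host` and its corresponding monitor on `watcher`."""
    host.processes.add(process)
    watcher.monitors[process] = host


def check_once(watcher, is_healthy):
    """One round of the periodic check by every monitor resident on `watcher`."""
    for process, host in list(watcher.monitors.items()):
        if is_healthy(process):
            continue
        host.processes.discard(process)   # terminate only the failed process
        del watcher.monitors[process]     # retire the old monitor (assumed location)
        # New process runs where the monitor was; new monitor runs on the old host.
        start(process, host=watcher, watcher=host)


first, second = Node("first"), Node("second")
for name in ("p1", "p2", "p3"):           # hypothetical process names
    start(name, host=first, watcher=second)

check_once(second, is_healthy=lambda p: p != "p2")  # pretend p2 has failed
print(sorted(first.processes))   # ['p1', 'p3']
print(sorted(second.processes))  # ['p2']
```

After the check, p1 and p3 are untouched on the first node, while p2 runs on the second node and is watched from the first, so the monitoring relationship is reversed for that one process only.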
46. A computer system for controlling failover so that the termination of all the executing processes is not required, the computer system comprising:
a first network node and a second network node;
a memory comprising software instructions that enable the computer system to perform:
executing a plurality of processes resident on the first network node;
executing a plurality of monitors for said plurality of processes, each of said plurality of processes having one corresponding monitor from said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said plurality of processes, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of one process of said plurality of processes is detected by its corresponding monitor, then terminating execution of said failed process from executing on said first network node while the remaining processes continue to execute on said first network node;
transferring and initiating execution of said failed process as a new process on said second network node;
initiating execution of a new corresponding monitor for said new process on said first network node; and
terminating said corresponding monitor of said failed process from executing on said first network node.
47. A computer software product for a computer system comprising a first network node and a second network node to control failover so that the termination of all the processes executing on said first network node is not required, the computer program product comprising:
software instructions for enabling the computer system to perform predetermined operations, and a computer readable medium bearing the software instructions, said predetermined operations comprising:
executing a plurality of processes resident on the first network node;
executing a plurality of monitors for said plurality of processes, each of said plurality of processes having one corresponding monitor from said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said plurality of processes, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of a process of said plurality of processes is detected by its corresponding monitor, then terminating execution of said failed process from executing on said first network node while the remaining processes continue to execute on said first network node;
transferring and initiating execution of said failed process as a new process on said second network node;
initiating execution of a new corresponding monitor for said new process on said first network node; and
terminating said corresponding monitor of said failed process from executing on said first network node.
48. A system comprising:
a first network node and a second network node connected via a communication link;
at least one application comprising a plurality of sub-processes resident on said first network node and presently executing;
a plurality of monitors for said plurality of sub-processes, each of said plurality of sub-processes having one corresponding monitor from said plurality of monitors, said plurality of monitors resident on said second network node and presently executing, each corresponding monitor being configured for detecting failure of its corresponding sub-process of said application on said first network node and, if a corresponding sub-process has failed, causing said failed sub-process to execute on said second network node as a new sub-process having a corresponding monitor executing on said first network node while the remaining sub-processes continue to execute on said first network node.
55. A system comprising:
a first plurality of network nodes connected via a first communication link;
a second plurality of network nodes connected via a second communication link;
said first communication link and said second communication link connected through a third communication link;
at least one application comprising a plurality of sub-processes resident on one of the network nodes and presently executing;
a plurality of monitors for said plurality of sub-processes, each of said plurality of sub-processes having one corresponding monitor from said plurality of monitors, said plurality of monitors resident on one of the network nodes and presently executing, each corresponding monitor being configured for detecting failure of its corresponding sub-process and, if a corresponding sub-process has failed, causing said failed sub-process to execute on another of the network nodes as a new sub-process having a corresponding monitor while the remaining sub-processes continue to execute on said one network node where resident.
66. A method for operating a failover system, wherein failover does not require the termination of all the sub-processes executing on a first network node, the method comprising:
executing at least one application comprising a plurality of sub-processes, said at least one application resident on the first network node;
executing a plurality of monitors for said plurality of sub-processes, each of said plurality of sub-processes having one corresponding monitor from said plurality of monitors, said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said application, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of a sub-process of said application is detected by its corresponding monitor, then terminating execution of said failed sub-process from executing on said first network node while the remaining sub-processes continue to execute on said first network node;
transferring and initiating execution of said failed sub-process as a new sub-process on said second network node;
initiating execution of a new corresponding monitor for said new sub-process on said first network node; and
terminating said corresponding monitor of said failed sub-process from executing on said first network node.
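One way to picture the effect of claim 66 on a single application is as a placement table that maps each sub-process to the node where it currently executes. The following Python sketch is illustrative only: the fail_over function, the peer mapping, the node labels, and the sub-process names are assumptions made for the example and do not come from the patent.

```python
def fail_over(placement, monitors, failed, peer):
    """Return updated (placement, monitors) tables after one sub-process fails.

    placement: sub-process name -> node where it executes
    monitors:  sub-process name -> node where its monitor executes
    peer:      node -> the other node of the pair (assumed two-node setup)
    """
    old_host = placement[failed]
    new_host = peer[old_host]
    # Only the failed sub-process changes node; every other entry is copied as is.
    new_placement = {**placement, failed: new_host}
    # The new monitor runs on the node the sub-process just left.
    new_monitors = {**monitors, failed: old_host}
    return new_placement, new_monitors


peer = {"node1": "node2", "node2": "node1"}
placement = {"parser": "node1", "cache": "node1", "writer": "node1"}  # hypothetical names
monitors = {name: "node2" for name in placement}

placement, monitors = fail_over(placement, monitors, "cache", peer)
print(placement)  # {'parser': 'node1', 'cache': 'node2', 'writer': 'node1'}
print(monitors)   # {'parser': 'node2', 'cache': 'node1', 'writer': 'node2'}
```

The application as a whole keeps running: after the failover its sub-processes are simply spread across both nodes, with each one still watched from the opposite node.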
69. A computer system for controlling failover so that the termination of all the executing processes is not required, the computer system comprising:
a first network node and a second network node;
a memory comprising software instructions that enable the computer system to perform:
executing at least one application comprising a plurality of sub-processes, said at least one application resident on the first network node;
executing a plurality of monitors for said plurality of sub-processes, each of said plurality of sub-processes having one corresponding monitor from said plurality of monitors, said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said application, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of a sub-process of said application is detected by its corresponding monitor, then terminating execution of said failed sub-process from executing on said first network node while the remaining sub-processes continue to execute on said first network node;
transferring and initiating execution of said failed sub-process as a new sub-process on said second network node;
initiating execution of a new corresponding monitor for said new sub-process on said first network node; and
terminating said corresponding monitor of said failed sub-process from executing on said first network node.
70. A computer software product for a computer system comprising a first network node and a second network node to control failover so that the termination of all the sub-processes executing on said first network node is not required, the computer program product comprising:
software instructions for enabling the computer system to perform predetermined operations, and a computer readable medium bearing the software instructions, said predetermined operations comprising:
executing at least one application comprising a plurality of sub-processes, said at least one application resident on the first network node;
executing a plurality of monitors for said plurality of sub-processes, each of said plurality of sub-processes having one corresponding monitor from said plurality of monitors, said plurality of monitors resident on a second network node, said second network node connected to said first network node via a communications link;
periodically checking the operation of said application, said periodic checking performed by said plurality of corresponding monitors;
if an execution failure of a sub-process of said application is detected by its corresponding monitor, then terminating execution of said failed sub-process from executing on said first network node while the remaining sub-processes continue to execute on said first network node;
transferring and initiating execution of said failed sub-process as a new sub-process on said second network node;
initiating execution of a new corresponding monitor for said new sub-process on said first network node; and
terminating said corresponding monitor of said failed sub-process from executing on said first network node.
Specification