Method and system for providing cluster replicated checkpoint services
Abstract
The present invention describes a method and system for providing cluster replicated checkpoint services. In particular, the method provides cluster replicated checkpoint services for replicas of a checkpoint in a cluster. The cluster includes a first node and a second node, which are connected to one another via a network. The replicas include a primary replica and a secondary replica. The method includes managing the checkpoint that contains checkpoint information, and creating the primary replica in a memory of the first node. The primary replica contains first checkpoint information. The method also includes updating the primary replica so that the first checkpoint information corresponds to the checkpoint information, creating the secondary replica that contains second checkpoint information in a memory of the second node, and updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information.
41 Claims
-
1. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
-
managing the checkpoint, the checkpoint containing checkpoint information;
creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
maintaining the state of the primary replica; and
maintaining the state of the secondary replica;
wherein the state of the primary replica and the state of the secondary replica are EMPTY, CHECKPOINTING, MISSED, COMPLETED, or CORRUPTED.
executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
-
-
6. The method of claim 1, further comprising:
executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
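The five replica states of claim 1 and the recovery condition of claim 6 can be illustrated with a short sketch. This is only an illustration, not the patented implementation: the names ReplicaState, RECOVERY_STATES, and needs_error_recovery are assumptions introduced here, and the claims do not prescribe any particular data type for the state.

```python
from enum import Enum


class ReplicaState(Enum):
    """The five replica states named in the claims."""
    EMPTY = "EMPTY"
    CHECKPOINTING = "CHECKPOINTING"
    MISSED = "MISSED"
    COMPLETED = "COMPLETED"
    CORRUPTED = "CORRUPTED"


# States that call for error recovery under claim 6.
RECOVERY_STATES = frozenset({ReplicaState.MISSED, ReplicaState.CORRUPTED})


def needs_error_recovery(primary: ReplicaState, secondary: ReplicaState) -> bool:
    """Return True if either replica is in the MISSED or CORRUPTED state."""
    return primary in RECOVERY_STATES or secondary in RECOVERY_STATES
```

A caller would evaluate needs_error_recovery after each state change and, when it returns True, start whatever recovery procedure the system defines.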
-
7. The method of claim 1, further comprising:
synchronizing the first checkpoint information in the primary replica and the second checkpoint information in the secondary replica.
-
8. The method of claim 1, further comprising:
-
retaining the primary replica in the memory of the first node until a retention time of the primary replica expires; and
retaining the secondary replica in the memory of the second node until a retention time of the secondary replica expires.
-
-
9. The method of claim 1, further comprising:
conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
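Retention-based garbage collection, as recited in claims 8 and 9, might look like the sketch below. It assumes that retention is measured from the time a replica was last used, which the claims do not state; the Replica fields and the collect_garbage function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    last_used: float       # assumption: time (seconds) the replica was last used
    retention_time: float  # seconds to retain the replica after last use


def collect_garbage(replicas: dict, now: float) -> list:
    """Delete replicas whose retention time has expired and return their names."""
    expired = [name for name, replica in replicas.items()
               if now - replica.last_used >= replica.retention_time]
    for name in expired:
        del replicas[name]
    return expired
```

Because each replica carries its own retention time, the primary and the secondary replica can expire independently of one another.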
-
10. The method of claim 1, wherein the checkpoint has a plurality of checkpoint attributes.
-
11. The method of claim 1, wherein there is a control block associated with the primary replica and there is a control block associated with the secondary replica.
-
12. The method of claim 11, further comprising:
-
maintaining first control block information in the control block of the primary replica; and
maintaining second control block information in the control block of the secondary replica.
-
-
13. The method of claim 12, further comprising:
-
formatting a checkpoint message using the first control block information, the second control block information, or both, wherein the checkpoint message is used in the step of updating the secondary replica.
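One way to picture claims 11 to 13 is a message whose header is filled in from a control block. The sketch below is a guess at a workable layout, not the format used by the invention: the claims do not list the contents of a control block, so every field of ControlBlock, the JSON header, and both function names are assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ControlBlock:
    # All fields are hypothetical; the claims do not enumerate them.
    checkpoint_name: str
    node_id: str
    sequence: int


def format_checkpoint_message(control_block: ControlBlock, data: bytes) -> bytes:
    """Build a message: one JSON header line from the control block, then the data."""
    header = asdict(control_block)
    header["length"] = len(data)
    return json.dumps(header).encode("utf-8") + b"\n" + data


def parse_checkpoint_message(message: bytes):
    """Split a message into its header dictionary and its data bytes."""
    header_line, data = message.split(b"\n", 1)
    return json.loads(header_line), data
```

The node holding the secondary replica would call parse_checkpoint_message and use the header to decide how to apply the data to its replica.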
-
-
14. The method of claim 1, further comprising:
executing a failure recovery procedure.
-
15. The method of claim 14, wherein the executing step further comprises:
when a primary component on the first node fails, restarting the primary component using the primary replica.
-
16. The method of claim 14, wherein the executing step further comprises:
when a primary component on the first node fails, starting a secondary component on the second node as a new primary component using the secondary replica.
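Claims 15 and 16 describe two alternative failure recovery paths: restarting the primary component from the primary replica, or promoting the secondary component using the secondary replica. The sketch below chooses between them based on whether the first node is still available. That selection rule is an assumption, since the claims do not say how the path is chosen, and the Component type and recover function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    node: str    # "first" or "second"
    role: str    # role after recovery
    state: dict  # application state rebuilt from a replica


def recover(first_node_up: bool, primary_replica: dict, secondary_replica: dict) -> Component:
    """Return the component that acts as primary after the primary component fails."""
    if first_node_up:
        # Claim 15: restart the primary component on the first node
        # using the primary replica.
        return Component("first", "primary", dict(primary_replica))
    # Claim 16: start the secondary component on the second node
    # as the new primary, using the secondary replica.
    return Component("second", "primary", dict(secondary_replica))
```

In both paths the recovered component starts from checkpoint information held in memory rather than from scratch.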
-
17. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a primary replica and a secondary replica, the method comprising:
-
creating the checkpoint;
opening the checkpoint from the first node in a write mode;
creating the primary replica in a memory of the first node;
updating the checkpoint;
updating the primary replica;
propagating a checkpoint message, the checkpoint message including information regarding the checkpoint;
opening the checkpoint from the second node in a read mode;
creating the secondary replica in a memory of the second node, wherein the primary replica has a state and the secondary replica has a state;
updating the secondary replica based on the checkpoint message; and
executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is invalid.
executing a failure recovery procedure.
-
-
20. The method of claim 19, wherein the executing step further comprises:
making a secondary component in the second node a new primary component using the secondary replica.
-
21. The method of claim 19, wherein the executing step further comprises:
restarting a primary component in the first node using the primary replica.
-
22. The method of claim 17, further comprising:
formatting the checkpoint message using version information.
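Claim 22 does not say how the version information is used. One plausible use, sketched below purely as an assumption, is a monotonically increasing version number that lets the receiving node ignore stale or duplicated messages. CheckpointMessage, SecondaryReplica, and apply are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointMessage:
    version: int  # assumption: increases by one or more with every update
    data: bytes


@dataclass
class SecondaryReplica:
    version: int = 0
    data: bytes = b""

    def apply(self, message: CheckpointMessage) -> bool:
        """Apply the message only if it is newer than the replica's contents."""
        if message.version <= self.version:
            return False  # stale or duplicate message, ignored
        self.version = message.version
        self.data = message.data
        return True
```

With this rule, messages that arrive late or more than once leave the secondary replica unchanged.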
-
23. The method of claim 17, further comprising:
-
deleting the primary replica based on a first retention time of the primary replica; and
deleting the secondary replica based on a second retention time of the secondary replica.
-
-
24. The method of claim 17, further comprising:
conducting a garbage collection using a first retention time of the primary replica and a second retention time of the secondary replica.
-
25. The method of claim 17, wherein the memory of the first node has a first control block for the primary replica and the memory of the second node has a second control block for the secondary replica.
-
26. The method of claim 25, further comprising:
-
maintaining the first control block; and
maintaining the second control block.
-
-
27. The method of claim 17, wherein the state of the primary replica and the state of the secondary replica are selected from the group consisting of EMPTY, CHECKPOINTING, MISSED, COMPLETED, and CORRUPTED.
-
28. The method of claim 27, further comprising:
executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
-
29. The method of claim 17, wherein the checkpoint has checkpoint attributes.
-
30. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
-
computer readable program code configured to manage the checkpoint, the checkpoint containing checkpoint information;
computer readable program code configured to create the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
computer readable program code configured to update the primary replica so that the first checkpoint information corresponds to the checkpoint information;
computer readable program code configured to create the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
computer readable program code configured to update the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
computer readable program code configured to maintain the state of the primary replica and the state of the secondary replica, wherein the state of the primary replica and the state of the secondary replica are EMPTY, CHECKPOINTING, MISSED, COMPLETED, or CORRUPTED; and
computer readable medium having the computer readable program codes embodied therein.
-
-
31. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
-
computer readable program code configured to create the checkpoint;
computer readable program code configured to open the checkpoint from the first node in a write mode;
computer readable program code configured to create the primary replica in a memory of the first node;
computer readable program code configured to update the checkpoint;
computer readable program code configured to update the primary replica;
computer readable program code configured to propagate a checkpoint message, the checkpoint message including information regarding the checkpoint;
computer readable program code configured to open the checkpoint from the second node in a read mode;
computer readable program code configured to create the secondary replica in a memory of the second node;
computer readable program code configured to update the secondary replica based on the checkpoint message;
computer readable program code configured to execute an error recovery procedure if the state of the primary replica or the state of the secondary replica is invalid; and
computer readable medium having the computer readable program codes embodied therein.
-
-
32. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the system comprising:
-
means for managing the checkpoint, the checkpoint containing checkpoint information;
means for creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
means for updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
means for creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
means for updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
means for maintaining the state of the primary replica;
means for maintaining the state of the secondary replica; and
means for executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
means for executing a failure recovery procedure.
-
-
36. The system of claim 32, further comprising:
-
means for maintaining a control block of the primary replica; and
means for maintaining a control block of the secondary replica.
-
-
37. The system of claim 32, further comprising:
means for conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
-
38. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a primary replica and a secondary replica, the system comprising:
-
means for creating the checkpoint;
means for opening the checkpoint from the first node in a write mode;
means for creating the primary replica in a memory of the first node;
means for updating the checkpoint;
means for updating the primary replica;
means for propagating a checkpoint message, the checkpoint message including information regarding the checkpoint;
means for opening the checkpoint from the second node in a read mode;
means for creating the secondary replica in a memory of the second node, wherein the primary replica and the secondary replica each have a state;
means for updating the secondary replica based on the checkpoint message; and
means for maintaining the state of the primary and secondary replicas;
wherein the state of the primary replica and the state of the secondary replica are selected from the group of states consisting of EMPTY, CHECKPOINTING, MISSED, COMPLETED, and CORRUPTED.
-
-
40. A system for managing a checkpoint, the system comprising:
-
a first node running a primary component and having an opened checkpoint, the first node further including a primary replica having first checkpoint information based on the opened checkpoint in its memory, having a first checkpoint service, and connected to a network; and
a second node running a secondary component, including a secondary replica in its memory, having a second checkpoint service, and connected to the network, wherein the first checkpoint service and the second checkpoint service are capable of accessing the opened checkpoint, wherein the first checkpoint service works with the primary component to update the opened checkpoint, issue a checkpoint message containing information regarding the opened checkpoint, asynchronously propagate the checkpoint message, and update the primary replica, and wherein the second checkpoint service is capable of asynchronously updating the secondary replica based on the checkpoint message.
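The asynchronous propagation recited in claim 40 can be sketched with two in-process services, where a queue stands in for the network. Everything here is an assumption made for illustration: the CheckpointService class, its inbox queue, the dictionary-shaped messages, and the None shutdown marker are not taken from the patent.

```python
import queue
import threading


class CheckpointService:
    """Hypothetical per-node checkpoint service holding one in-memory replica."""

    def __init__(self):
        self.replica = {}
        self.inbox = queue.Queue()  # stands in for the network link

    def update_from_messages(self):
        """Apply checkpoint messages as they arrive, until None is received."""
        while True:
            message = self.inbox.get()
            if message is None:
                break
            self.replica.update(message)


def run_demo():
    first, second = CheckpointService(), CheckpointService()
    worker = threading.Thread(target=second.update_from_messages)
    worker.start()
    for update in ({"counter": 1}, {"counter": 2, "phase": "done"}):
        first.replica.update(update)  # update the primary replica
        second.inbox.put(update)      # propagate without waiting for the receiver
    second.inbox.put(None)            # hypothetical shutdown marker
    worker.join()
    return first.replica, second.replica
```

The first service never blocks on the second one, so the secondary replica may briefly lag behind the primary replica and catches up once the queued messages are applied.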
-
-
41. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
-
managing the checkpoint, the checkpoint containing checkpoint information;
creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information;
updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
maintaining the state of the primary replica;
maintaining the state of the secondary replica; and
executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
-
Specification