Method and system for providing cluster replicated checkpoint services

US 6,823,474 B2
Filed: 05/02/2001
Issued: 11/23/2004
Est. Priority Date: 05/02/2000
Status: Active Grant

Abstract

The present invention describes a method and system for providing cluster replicated checkpoint services. In particular, the method provides cluster replicated checkpoint services for replicas of a checkpoint in a cluster. The cluster includes a first node and a second node, which are connected to one another via a network. The replicas include a primary replica and a secondary replica. The method includes managing the checkpoint that contains checkpoint information, and creating the primary replica in a memory of the first node. The primary replica contains first checkpoint information. The method also includes updating the primary replica so that the first checkpoint information corresponds to the checkpoint information, creating the secondary replica that contains second checkpoint information in a memory of the second node, and updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information.
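
The arrangement described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Replica` class, its `update` method, the node names, and the sample checkpoint contents are all hypothetical, and both replicas live in one process here purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica held by one node."""
    node: str
    data: dict = field(default_factory=dict)

    def update(self, checkpoint_info: dict) -> None:
        # Copy the information so each replica owns its own data,
        # as it would if it lived in a separate node's memory.
        self.data = dict(checkpoint_info)


# The checkpoint information being managed (sample values are made up).
checkpoint_info = {"seq": 1, "payload": "session-table"}

primary = Replica(node="node-1")     # created in the first node's memory
secondary = Replica(node="node-2")   # created in the second node's memory

# Update both replicas so that their contents correspond to the checkpoint.
primary.update(checkpoint_info)
secondary.update(checkpoint_info)
```

After both updates, each replica holds its own copy of the same checkpoint information.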

41 Claims

1. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
- managing the checkpoint, the checkpoint containing checkpoint information;
  
  creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
  
  updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
  
  creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
  
  updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
  
  maintaining the state of the primary replica; and
  
  maintaining the state of the secondary replica;
  
  wherein the state of the primary replica and the state of the secondary replica are EMPTY, CHECKPOINTING, MISSED, COMPLETED, or CORRUPTED.
2. The method of claim 1, wherein the updating the secondary replica step uses a checkpoint message.
3. The method of claim 2, further comprising:
- formatting the checkpoint message based on version information.
4. The method of claim 1, wherein the two updating steps are asynchronous.
5. The method of claim 1, further comprising:
6. The method of claim 1, further comprising:
- executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
7. The method of claim 1, further comprising:
- synchronizing the first checkpoint information in the primary replica and the second checkpoint information in the secondary replica.
8. The method of claim 1, further comprising:
- retaining the primary replica in the memory of the first node until a retention time of the primary replica expires; and
  
  retaining the secondary replica in the memory of the second node until a retention time of the secondary replica expires.
9. The method of claim 1, further comprising:
- conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
10. The method of claim 1, wherein the checkpoint has a plurality of checkpoint attributes.
11. The method of claim 1, wherein there is a control block associated with the primary replica and there is a control block associated with the secondary replica.
12. The method of claim 11, further comprising:
- maintaining first control block information in the control block of the primary replica; and
  
  maintaining second control block information in the control block of the secondary replica.
13. The method of claim 12, further comprising:
- formatting a checkpoint message using first control block information, second control block information, or both, wherein the checkpoint message is used in the updating the secondary replica step.
14. The method of claim 1, further comprising:
- executing a failure recovery procedure.
15. The method of claim 14, wherein the executing step further comprises:
- when a primary component on the first node fails, restarting the primary component using the primary replica.
16. The method of claim 14, wherein the executing step further comprises:
- when a primary component on the first node fails, starting a secondary component on the second node as a new primary component using the secondary replica.
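
The five replica states named in claim 1, the recovery trigger of claim 6, and the two failure-recovery options of claims 15 and 16 can be sketched as follows. This is an illustrative sketch only: the claims name the states but do not define transitions between them, so none are modeled. The function names and the `primary_node_up` criterion for choosing between restart and failover are assumptions, not taken from the claims.

```python
from enum import Enum


class ReplicaState(Enum):
    """The five replica states named in claim 1."""
    EMPTY = "EMPTY"
    CHECKPOINTING = "CHECKPOINTING"
    MISSED = "MISSED"
    COMPLETED = "COMPLETED"
    CORRUPTED = "CORRUPTED"


# Claim 6: error recovery runs if either replica is MISSED or CORRUPTED.
RECOVERY_STATES = {ReplicaState.MISSED, ReplicaState.CORRUPTED}


def needs_error_recovery(primary: ReplicaState, secondary: ReplicaState) -> bool:
    return primary in RECOVERY_STATES or secondary in RECOVERY_STATES


def recover_from_primary_failure(primary_node_up: bool) -> str:
    """Hypothetical policy for choosing between claims 15 and 16.

    The claims do not say how the choice is made; node availability
    is an assumption used here for illustration.
    """
    if primary_node_up:
        # Claim 15: restart the primary component using the primary replica.
        return "restart primary component from primary replica"
    # Claim 16: start the secondary component as the new primary component.
    return "start secondary component as new primary from secondary replica"
```

For example, `needs_error_recovery(ReplicaState.COMPLETED, ReplicaState.MISSED)` evaluates to `True`, because one of the two replicas is in a state that calls for recovery.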

17. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a primary replica and a secondary replica, the method comprising:
- creating the checkpoint;
  
  opening the checkpoint from the first node in a write mode;
  
  creating the primary replica in a memory of the first node;
  
  updating the checkpoint;
  
  updating the primary replica;
  
  propagating a checkpoint message, the checkpoint message including information regarding the checkpoint;
  
  opening the checkpoint from the second node in a read mode;
  
  creating the secondary replica in a memory of the second node, wherein the primary replica has a state and the secondary replica has a state;
  
  updating the secondary replica based on the checkpoint message;
  
  executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is invalid.
18. The method of claim 17, wherein the propagating and the updating steps are asynchronous.
19. The method of claim 17, further comprising:
20. The method of claim 19, wherein the executing step further comprises:
- making a secondary component in the second node a new primary component using the secondary replica.
21. The method of claim 19, wherein the executing step further comprises:
- restarting a primary component in the first node using the primary replica.
22. The method of claim 17, further comprising:
- formatting the checkpoint message using version information.
23. The method of claim 17, further comprising:
- deleting the primary replica based on a first retention time of the primary replica; and
  
  deleting the secondary replica based on a second retention time of the secondary replica.
24. The method of claim 17, further comprising:
- conducting a garbage collection using a first retention time of the primary replica and a second retention time of the secondary replica.
25. The method of claim 17, wherein the memory of the first node has a first control block for the primary replica and the memory of the second node has a second control block for the secondary replica.
26. The method of claim 25, further comprising:
- maintaining the first control block; and
  
  maintaining the second control block.
27. The method of claim 17, wherein the state of the primary replica and the state of the secondary replica is selected from the group consisting of EMPTY, CHECKPOINTING, MISSED, COMPLETED and CORRUPTED.
28. The method of claim 27, further comprising:
- executing an error recovery procedure if the state of the primary replica or the state of the secondary replica is MISSED or CORRUPTED.
29. The method of claim 17, wherein the checkpoint has checkpoint attributes.
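
The flow of claim 17 (open for writing on the first node, update, propagate a checkpoint message, open for reading on the second node, apply the message) together with the version-based formatting of claim 22 might look like the sketch below. The JSON wire format, the `version` and `updates` field names, and the helper names `format_message` and `apply_message` are hypothetical. The claims do not specify a message layout.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Replica:
    mode: str                                  # "w" on the first node, "r" on the second
    data: dict = field(default_factory=dict)


def format_message(version: int, updates: dict) -> bytes:
    # Hypothetical wire format: a JSON object tagged with version information.
    body = {"version": version, "updates": updates}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def apply_message(replica: Replica, raw: bytes, supported_version: int = 1) -> None:
    # Decode the message and refuse versions this reader does not understand.
    msg = json.loads(raw.decode("utf-8"))
    if msg["version"] != supported_version:
        raise ValueError(f"unsupported checkpoint message version {msg['version']}")
    replica.data.update(msg["updates"])


primary = Replica(mode="w")      # checkpoint opened in write mode on the first node
secondary = Replica(mode="r")    # checkpoint opened in read mode on the second node

updates = {"counter": 7}
primary.data.update(updates)             # update the primary replica
message = format_message(1, updates)     # build the checkpoint message
apply_message(secondary, message)        # update the secondary from the message
```

Tagging each message with a version lets a reader reject a format it does not understand instead of silently misapplying it.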

30. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
- computer readable program code configured to manage the checkpoint, the checkpoint containing checkpoint information;
  
  computer readable program code configured to create the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
  
  computer readable program code configured to update the primary replica so that the first checkpoint information corresponds to the checkpoint information;
  
  computer readable program code configured to create the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
  
  computer readable program code configured to update the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
  
  computer readable program code configured to maintain the state of the primary and the secondary replica, wherein the state of the primary replica and the state of the secondary replica are EMPTY, CHECKPOINTING, MISSED, COMPLETED, or CORRUPTED; and
  
  computer readable medium having the computer readable program codes embodied therein.

31. A computer program product configured to provide cluster replicated checkpoint services for a plurality of replicas for a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the computer program product comprising:
- computer readable program code configured to create the checkpoint;
  
  computer readable program code configured to open the checkpoint from the first node in a write mode;
  
  computer readable program code configured to create the primary replica in a memory of the first node;
  
  computer readable program code configured to update the checkpoint;
  
  computer readable program code configured to update the primary replica;
  
  computer readable program code configured to propagate a checkpoint message, the checkpoint message including information regarding the checkpoint;
  
  computer readable program code configured to open the checkpoint from the second node in a read mode;
  
  computer readable program code configured to create the secondary replica in a memory of the second node;
  
  computer readable program code configured to update the secondary replica based on the checkpoint message;
  
  computer readable program code configured to execute an error recovery procedure if the state of the primary replica or the state of the secondary replica is invalid;
  
  computer readable medium having the computer readable program codes embodied therein.

32. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
- means for managing the checkpoint, the checkpoint containing checkpoint information;
  
  means for creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
  
  means for updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
  
  means for creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information, wherein the primary replica and the secondary replica each have a state;
  
  means for updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
  
  means for maintaining the state of the primary replica;
  
  means for maintaining the state of the secondary replica; and
  
  means for executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.
33. The system of claim 32, wherein the means for updating the secondary replica uses a checkpoint message.
34. The system of claim 33, further comprising:
- means for formatting the checkpoint message based on version information.
35. The system of claim 32, further comprising:
36. The system of claim 32, further comprising:
- means for maintaining a control block of the primary replica; and
  
  means for maintaining a control block of the secondary replica.
37. The system of claim 32, further comprising:
- means for conducting a garbage collection based on a retention time of the primary replica and a retention time of the secondary replica.
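
Garbage collection driven by retention times, as recited in claim 37 (and in claims 9 and 24), could be sketched as below. The claims do not state when a retention period starts, so measuring it from a `last_used` timestamp is an assumption, as are the `ReplicaEntry` type and the `collect_garbage` function.

```python
from dataclasses import dataclass


@dataclass
class ReplicaEntry:
    name: str
    last_used: float   # seconds, from a caller-supplied clock (assumed reference point)
    retention: float   # seconds the replica is retained after last use


def collect_garbage(replicas: dict, now: float) -> list:
    """Delete replicas whose retention time has expired and return their names."""
    expired = [
        name for name, entry in replicas.items()
        if now - entry.last_used >= entry.retention
    ]
    for name in expired:
        del replicas[name]
    return expired


replicas = {
    "a": ReplicaEntry("a", last_used=0.0, retention=10.0),
    "b": ReplicaEntry("b", last_used=0.0, retention=60.0),
}
removed = collect_garbage(replicas, now=30.0)
```

With the sample values above, replica `a` has outlived its 10-second retention at time 30 and is removed, while replica `b` is kept.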

38. A system for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, the plurality of replicas including a first replica and a second replica, the system comprising:
- means for creating the checkpoint;
  
  means for opening the checkpoint from the first node in a write mode;
  
  means for creating the primary replica in a memory of the first node;
  
  means for updating the checkpoint;
  
  means for updating the primary replica;
  
  means for propagating a checkpoint message, the checkpoint message including information regarding the checkpoint;
  
  means for opening the checkpoint from the second node in a read mode;
  
  means for creating the secondary replica in a memory of the second node, wherein the primary replica and secondary each have a state;
  
  means for updating the secondary replica based on the checkpoint message; and
  
  means for maintaining the state of the primary and secondary replicas;
  
  wherein the state of the primary replica and state of the secondary replica are selected from the group of states consisting of EMPTY, CHECKPOINTING, MISSED, COMPLETED, and CORRUPTED.
39. The system of claim 38, wherein the propagating means and the updating means operate asynchronously.

40. A system for managing a checkpoint, the system comprising;
- a first node running a primary component and having an opened checkpoint, the first node further including a primary replica having first checkpoint information based on the opened checkpoint in its memory, having a first checkpoint service, and connected to a network; and
  
  a second node running a secondary component, including a secondary replica in its memory, having a second checkpoint service, and connected to the network, wherein the first checkpoint service and the second checkpoint service are capable of accessing the opened checkpoint, wherein the first checkpoint service works with the primary component to update the opened checkpoint, issue a checkpoint message containing information regarding the opened checkpoint, asynchronously propagate the checkpoint message, and update the first replica, and wherein the second checkpoint service is capable of asynchronously updating the secondary replica based on the checkpoint message.
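
The asynchronous propagation described in claim 40 can be approximated in a single process, with a queue standing in for the network and a thread standing in for the second checkpoint service. All names here are hypothetical, and a real cluster would use an actual transport between nodes rather than an in-process queue.

```python
import queue
import threading

primary_replica = {}
secondary_replica = {}
channel = queue.Queue()   # stands in for the network between the two nodes


def second_checkpoint_service():
    # Applies checkpoint messages as they arrive, independently of the sender.
    while True:
        message = channel.get()
        if message is None:   # sentinel used only to end this demo
            break
        secondary_replica.update(message)


worker = threading.Thread(target=second_checkpoint_service)
worker.start()


def checkpoint(updates):
    # First checkpoint service: update the primary replica, then enqueue
    # the message and return without waiting for the secondary to apply it.
    primary_replica.update(updates)
    channel.put(dict(updates))


checkpoint({"step": 1})
checkpoint({"step": 2, "owner": "node-1"})

channel.put(None)   # stop the demo service
worker.join()       # after this, every queued message has been applied
```

Because `checkpoint` returns as soon as the message is queued, the primary component is never blocked on the second node. The secondary replica may briefly lag the primary until the queue drains.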

41. A method for providing cluster replicated checkpoint services for a plurality of replicas of a checkpoint in a cluster, the cluster comprising a first node and a second node, which are connected to one another via a network, and the plurality of replicas comprising a primary replica and a secondary replica, the method comprising:
- managing the checkpoint, the checkpoint containing checkpoint information;
  
  creating the primary replica in a memory of the first node, the primary replica containing first checkpoint information;
  
  updating the primary replica so that the first checkpoint information corresponds to the checkpoint information;
  
  creating the secondary replica in a memory of the second node, the secondary replica containing second checkpoint information;
  
  updating the secondary replica so that the second checkpoint information corresponds to the checkpoint information;
  
  maintaining the state of the primary replica;
  
  maintaining the state of the secondary replica; and
  
  executing an error recovery procedure if either the state of the primary replica or the state of the secondary replica is invalid.

Current Assignee: Oracle America, Inc. (Oracle Corporation)
Original Assignee: Sun Microsystems Incorporated (Oracle Corporation)
Inventors: Brossier, Stephane; Herrmann, Frederic; Kampe, Mark A.
Primary Examiner(s): Beausoliel, Robert
Assistant Examiner(s): Duncan, Marc M
Application Number: US09/846,665
Publication Number: US 20020032883A1
Time in Patent Office: 1,301 Days
Field of Search: 714/20, 714/13, 714/15, 714/16, 714/6
US Class Current: 714/13
CPC Class Codes: G06F 11/1464 (for networked environments); G06F 11/203 (using migration)
