Managing checkpoint queues in a multiple node system

US 7,065,540 B2
Filed: 03/04/2002
Issued: 06/20/2006
Est. Priority Date: 11/24/1998
Status: Expired due to Term

First Claim

Patent Images

1. A computer executable method of managing information about where to begin recovery after a failure of one or more nodes of a multiple-node system, the method comprising the steps of:

in a particular node of the multiple-node system, maintaining botha single-failure queue indicating where within a recovery log to begin recovery after a failure of said node; and

a multiple-failure queue indicating where within said recovery log to begin recovery after a failure of said node and one or more other nodes in said multiple-node system;

in response to a dirty data item being written to persistent storage, removing an entry for said data item from both said single-failure queue and said multiple-failure queue; and

in response to a dirty data item being sent to another node of said multiple-node system without first being written to persistent storage, removing an entry for said data item from said single-failure queue without removing the entry for said data item from said multiple-failure queue.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Techniques are provided for managing caches in a system with multiple caches that may contain different copies of the same data item. Specifically, techniques are provided for coordinating the write-to-disk operations performed on such data items to ensure that older versions of the data item are not written over newer versions, and to reduce the amount of processing required to recover after a failure. Various approaches are provided in which a master is used to coordinate with the multiple caches to cause a data item to be written to persistent storage. Techniques are also provided for managing checkpoints associated with the caches, where the checkpoints are used to determine the position at which to begin processing recovery logs in the event of a failure.

Citations

26 Claims

1. A computer executable method of managing information about where to begin recovery after a failure of one or more nodes of a multiple-node system, the method comprising the steps of:
- in a particular node of the multiple-node system, maintaining botha single-failure queue indicating where within a recovery log to begin recovery after a failure of said node; and
  
  a multiple-failure queue indicating where within said recovery log to begin recovery after a failure of said node and one or more other nodes in said multiple-node system;
  
  in response to a dirty data item being written to persistent storage, removing an entry for said data item from both said single-failure queue and said multiple-failure queue; and
  
  in response to a dirty data item being sent to another node of said multiple-node system without first being written to persistent storage, removing an entry for said data item from said single-failure queue without removing the entry for said data item from said multiple-failure queue.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The method of claim 1 further comprising the step of sending the dirty data item to another node to allow removal of the entry from said single-failure queue without the other node requesting the dirty data item.
  - 3. The method of claim 1 further comprising the steps of:
    - after a single node failure, applying said recovery log beginning at a position in said recovery log associated with the single-failure queue; and
      
      after a multiple node failure, applying said recovery log beginning at a position in said recovery log associated with the multiple-failure queue.
  - 4. The method of claim 1 wherein:
    - said single-failure queue and said multiple-failure queue are implemented by a single combined queue; and
      
      the step of removing an entry for said data item from said single-failure queue without removing the entry for said data item from said multiple-failure queue includes marking an entry for said data item in said combined queue without removing the entry for said data item from said combined queue.
  - 5. The method of claim 1 wherein said single-failure queue and said multiple-failure queue are implemented as two separate queues.

6. A computer executable method for recovering after a failure of one or more nodes of a multiple-node system, the method comprising the steps of:
- determining whether the failure involves only one node; and
  
  if the failure involves only said one node, then performing recovery by applying a recovery log of said node beginning at a first point in the recovery log; and
  
  if the failure involves one or more nodes in addition to said one node, then performing recovery by applying said recovery log of said node beginning at a second point in the recovery log;
  
  wherein said first point is different from said second point.
- View Dependent Claims (7)
- - 7. The method of claim 6 wherein:
    - the first point is determined, at least in part, by which data items that were dirtied by said node reside in caches in other nodes; and
      
      the second point is determined, at least in part, by which data items that were dirtied by said node have been persistently stored.

8. A computer executable method for recovering after a failure of one or more nodes of a multiple-node system, the method comprising the steps of:
- if it is unclear whether a particular version of a data item has been written to disk, then performing the steps of without attempting to recover said data item, marking dirtied cached versions of said data item that would have been covered if said particular version was written to disk;
  
  when a request is made to write one of said dirtied cached versions to disk, determining which version of said data item is already on disk; and
  
  if said particular version of said data item is already on disk, then not writing said one of said dirtied cached versions to disk.
- View Dependent Claims (9, 10)
- - 9. The method of claim 8 further comprising the step of, if said particular version of said data item is not already on disk, then recovering said data item.
  - 10. The method of claim 8 further comprising the step of, if said particular version of said data item is already on disk, then informing nodes that contain said dirtied cached versions of the data item that said dirtied cached versions are covered by a write-to-disk operation.

11. A computer executable method for recovering a current version of a data item after a failure of one or more nodes in a system that includes multiple caches, the method comprising the steps of:
- modifying the data item in a first node of said multiple caches to create a modified data item;
  
  sending the modified data item from said first node to a second node of said multiple caches without durably storing the modified data item from said first node to persistent storage;
  
  after said modified data item has been sent from said first node to said second node and before said data item in said first node has been covered by a write-to-disk operation, discarding said data item in said first node;
  
  after said failure, reconstructing the current version of said data item by applying changes to the data item on persistent storage based on merged redo logs associated with all of said multiple caches;
  
  maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a locally-dirty checkpoint queue;
  
  wherein the globally-dirty data items associated with entries in the globally-dirty checkpoint queue are not retained until covered by write-to-disk operations;
  
  determining, for each cache, a checkpoint based on a lower of a first-dirtied time of the entry at the head of the locally-dirty checkpoint queue and the first-dirtied time of the entry at the head of the globally-dirty checkpoint queue; and
  
  after said failure, determining where to begin processing the redo log associated with each cache based on the checkpoint determined for said cache.

12. A computer executable method for recovering a current version of a data item after a failure of one or more nodes in a system that includes multiple caches, the method comprising the steps of:
- modifying the data item in a first node of said multiple caches to create a modified data item;
  
  sending the modified data item from said first node to a second node of said multiple caches without durably storing the modified data item from said first node to persistent storage;
  
  after said modified data item has been sent from said first node to said second node and before said data item in said first node has been covered by a write-to-disk operation, discarding said data item in said first node; and
  
  after said failure, reconstructing the current version of said data item by applying changes to the data item on persistent storage based on merged redo logs associated with all of said multiple caches;
  
  maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a locally-dirty checkpoint queue;
  
  wherein the globally-dirty data items associated with entries in the globally-dirty checkpoint queue are not retained until covered by write-to-disk operations;
  
  maintaining, for each cache, a first checkpoint record for the locally-dirty checkpoint queue that indicates a first time, where all changes made to data items that are presently dirty in the cache prior to the first time have been recorded on a version of the data item that is on persistent storage;
  
  maintaining, for each cache, a second checkpoint record for the globally-dirty checkpoint queue, wherein the second checkpoint record includes a list of data items that were once dirtied in the cache but have since been transferred out and not written to persistent storage; and
  
  after said failure, determining where to begin processing the redo log associated with each cache based on the first checkpoint record and said second checkpoint record for said cache.
- View Dependent Claims (13)
- - 13. The method of claim 12 wherein the step of processing the redo log comprises the steps of:
    - determining a starting position for scanning the redo log based on a lesser of a position in the redo log as determined by the first checkpoint record and the positions in the log as determined by the earliest change made to the list of the data items in the second checkpoint record; and
      
      during recovery, for the portion of the redo log between the position indicated by the global checkpoint record to the position indicated by the local checkpoint record, considering for potential redo only those log records that correspond to the data items identified in the global checkpoint record.

14. A computer-readable medium carrying instructions for managing information about where to begin recovery after a failure of one or more nodes of a multiple-node system, the instructions comprising instructions for performing the steps of:
- in a particular node of a multiple-node system, maintaining botha single-failure queue that indicates where within a recovery log to begin recovery after a failure of said node, anda multiple-failure queue that indicates where within said recovery log to begin recovery after a failure of said node and one or more other nodes in said multiple-node system;
  
  in response to a dirty data item being written to persistent storage, removing an entry for said data item from both said single-failure queue and said multiple-failure queue; and
  
  in response to a dirty data item being sent to another node of said multiple-node system without first being written to persistent storage, removing an entry for said data item from said single-failure queue without removing the entry for said data item from said multiple-failure queue.
- View Dependent Claims (15, 16, 17, 18)
- - 15. The computer-readable medium of claim 14 further comprising instructions for performing the step of sending the dirty data item to another node to allow removal of the entry from said single-failure queue without the other node requesting the dirty data item.
  - 16. The computer-readable medium of claim 14 further comprising instructions for performing the steps of:
    - after a single node failure, applying said recoverylog beginning at a position in said recovery log associated with the single-failure queue; and
      
      after a multiple node failure, applying said recovery log beginning at a position in said recovery log associated with the multiple-failure queue.
  - 17. The computer-readable medium of claim 14 wherein:
    - said single-failure queue and said multiple-failure queue are implemented by a single combined queue; and
      
      the step of removing an entry for said data item from said single-failure queue without removing the entry for said data item from said multiple-failure queue includes marking an entry for said data item in said combined queue without removing the entry for said data item from said combined queue.
  - 18. The computer-readable medium of claim 14 wherein said single-failure queue and said multiple-failure queue are implemented as two separate queues.

19. A computer-readable medium carrying instructions for recovering after a failure of one or more nodes of a multiple-node system, the instructions comprising instructions for performing the steps of:
- determining whether the failure involves only one node; and
  
  if the failure involves only said one node, then performing recovery by applying a recovery log of said node beginning at a first point in the recovery log; and
  
  if the failure involves one or more nodes in addition to said one node, then performing recovery by applying said recovery log of said node beginning at a second point in the recovery log;
  
  wherein said first point is different from said second point.
- View Dependent Claims (20)
- - 20. The computer-readable medium of claim 19 wherein:
    - the first point is determined, at least in part, by which data items that were dirtied by said node reside in caches in other nodes; and
      
      the second point is determined, at least in part, by which data items that were dirtied by said node have been persistently stored.

21. A computer-readable medium carrying instructions for recovering after a failure of one or more nodes of a multiple-node system, the instructions comprising instructions for performing the steps of:
- if it is unclear whether a particular version of a data item has been written to disk, then performing the steps of without attempting to recover said data item, marking dirtied cached versions of said data item that would have been covered if said particular version was written to disk;
  
  when a request is made to write one of said dirtied cached versions to disk, determining which version of said data item is already on disk; and
  
  if said particular version of said data item is already on disk, then not writing said one of said dirtied cached versions to disk.
- View Dependent Claims (22, 23)
- - 22. The computer-readable medium of claim 21 further comprising instructions for performing the step of, if said particular version of said data item is not already on disk, then recovering said data item.
  - 23. The computer-readable medium of claim 21 further comprising instructions for performing the step of, if said particular version of said data item is already on disk, then informing nodes containing said dirtied cached versions of the data item that said dirtied cached versions are covered by a write-to-disk operation.

24. A computer-readable medium carrying instructions for recovering a current version of a data item after a failure of one or more nodes in a system that includes multiple caches, the instructions comprising instructions for performing the steps of:
- modifying the data item in a first node of said multiple caches to create a modified data item;
  
  sending the modified data item from said first node to a second node of said multiple caches without durably storing the modified data item from said first node to persistent storage;
  
  after said modified data item has been sent from said first node to said second node and before said data item in said first node has been covered by a write-to-disk operation, discarding said data item in said first node;
  
  after said failure, reconstructing the current version of said data item by applying changes to the data item on persistent storage based on merged redo logs associated with all of said multiple caches;
  
  maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a locally-dirty checkpoint queue;
  
  wherein the globally-dirty data items associated with entries in the globally-dirty checkpoint queue are not retained until covered by write-to-disk operations;
  
  determining, for each cache, a checkpoint based on a lower of a first-dirtied time of the entry at the head of the locally-dirty checkpoint queue and the first-dirtied time of the entry at the head of the globally-dirty checkpoint queue; and
  
  after said failure, determining where to begin processing the redo log associated with each cache based on the checkpoint determined for said cache.

25. A computer-readable medium carrying instructions for recovering a current version of a data item after a failure of one or more nodes in a system that includes multiple caches, the instructions comprising instructions for performing the steps of:
- modifying the data item in a first node of said multiple caches to create a modified data item;
  
  sending the modified data item from said first node to a second node of said multiple caches without durably storing the modified data item from said first node to persistent storage;
  
  after said modified data item has been sent from said first node to said second node and before said data item in said first node has been covered by a write-to-disk operation, discarding said data item in said first node;
  
  after said failure, reconstructing the current version of said data item by applying changes to the data item on persistent storage based on merged redo logs associated with all of said multiple caches;
  
  maintaining, for each of said multiple caches, a globally-dirty checkpoint queue and a locally-dirty checkpoint queue;
  
  wherein the globally-dirty data items associated with entries in the globally-dirty checkpoint queue are not retained until covered by write-to-disk operations;
  
  maintaining, for each cache, a first checkpoint recbrd for the locally-dirty checkpoint queue indicating a first time, where all changes made to data items that are presently dirty in the cache prior to the first time have been recorded on a version of the data item that is on persistent storage;
  
  maintaining, for each cache, a second checkpoint record for the globally-dirty checkpoint queue, wherein the second checkpoint record includes a list of data items that were once dirtied in the cache but have since been transferred out and not written to persistent storage; and
  
  after said failure, determining where to begin processing the redo log associated with each cache based on the first checkpoint record and said second checkpoint record for said cache.
- View Dependent Claims (26)
- - 26. The computer-readable medium of claim 25 wherein the step of processing the redo log comprises the steps of:
    - determining a starting position for scanning the redo log based on a lesser of a position in the redo log as determined by the first checkpoint record and the positions in the log as determined by the earliest change made to the list of the data items in the second checkpoint record; and
      
      during recovery, for the portion of the redo log between the position indicated by the global checkpoint record to the position indicated by the local checkpoint record, considering for potential redo only those log records that correspond to the data items-identified in the global checkpoint record.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
Oracle International Corporation (Oracle Corporation)
Inventors
Chan, Wilson Wai Shun, Srihari, Vinay, Brower, David, Bridge, William H., Bamford, Roger J., Chandrasekaran, Sashikanth, MacNaughton, Neil
Primary Examiner(s)
Corrielus, Jean M.

Application Number

US10/092,047
Publication Number

US 20020099729A1
Time in Patent Office

1,569 Days
Field of Search

707/100, 707/202, 707/8, 707/201, 707/200, 709/201, 717/108, 717/124
US Class Current

707/674
CPC Class Codes

G06F 11/1469   Backup restoration techniques

G06F 11/1471   involving logging of persis...

G06F 12/0804   with main memory updating G...

G06F 12/0815   Cache consistency protocols

G06F 16/27   Replication, distribution o...

Y10S 707/99952   Coherency, e.g. same view t...

Y10S 707/99953   Recoverability

Y10S 707/99954   Version management

Y10S 707/99955   Archiving or backup

Managing checkpoint queues in a multiple node system

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

Citations

26 Claims

Specification

Solutions

Use Cases

Quick Links

Managing checkpoint queues in a multiple node system

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

26 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links