Systems and/or methods for rapid recovery from write-ahead logs
First Claim
1. A recovery method for a computer system including a processor and a memory that has encountered a fault, the method comprising:
loading to the memory actions taken by the computer system from a write-ahead log maintained on a non-transitory computer readable storage medium, the write-ahead log storing the actions in chronological order;
running the actions stored in the memory through at least one filter in order to identify irrelevant actions that do not need to be replayed in order to recover from the fault;
replaying, using the processor, the actions from the memory until the entire log is replayed in reverse-chronological order, except for identified irrelevant actions that do not need to be replayed, wherein the replaying comprises causing the computer system to take a subset of the same actions stored in the memory again; and
transitioning the computer system from a recovery state to a normal operation state, following the replaying.
Abstract
Certain example embodiments provide a single-pass, reverse-chronological approach to write-ahead log recovery, enabling space- and time-efficient recovery of stored data from large write-ahead logs to a transient storage medium. The techniques described herein can in certain instances enable fast and efficient recovery, even in scenarios where, at the time of a failure requiring such a recovery, the live log is potentially multiple terabytes or larger in size. Certain example embodiments make use of a filtering mechanism (e.g., involving potentially stateful delete, skip, and/or transaction filters), a key/value property (allowing a live set of data, once identified, to be applied in any arbitrary order), etc. A simplified environment with a small closed set of mutative operations allows recovery to be performed backwards, by scanning the log from the most recently written record back in time (in other words, finishing with the oldest record).
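The reverse-chronological scan summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Action` record, the two operations `put` and `delete`, and the `recover` function are hypothetical names chosen for illustration, and only a delete-style stateful filter is modeled (skip and transaction filters are omitted). The idea shown is that, scanning newest-first, the first record seen for a key settles that key, so every older record for it is irrelevant and never replayed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical log record for a small closed set of mutative operations."""
    op: str          # "put" or "delete" (assumed operation names)
    key: str
    value: Any = None


def recover(log: Iterable[Action]) -> Dict[str, Any]:
    """Rebuild the live key/value set by scanning the log newest-first."""
    live: Dict[str, Any] = {}
    settled: Set[str] = set()  # filter state: keys whose final outcome is known
    for action in reversed(list(log)):
        if action.key in settled:
            continue           # a newer record already decided this key
        settled.add(action.key)
        if action.op == "put":
            live[action.key] = action.value
        # a "delete" settles the key without restoring any value
    return live


# The log is stored in chronological order (oldest first).
log = [
    Action("put", "a", 1),
    Action("put", "b", 2),
    Action("put", "a", 3),
    Action("delete", "b"),
    Action("put", "c", 4),
]
state = recover(log)
```

In this sketch only `put a=3` and `put c=4` are replayed; the two oldest records are skipped because newer records for the same keys were seen first. Because each surviving record touches a distinct key, the live set could be applied in any order, which is the key/value property the abstract refers to.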
20 Claims
1. A recovery method for a computer system including a processor and a memory that has encountered a fault, the method comprising:
loading to the memory actions taken by the computer system from a write-ahead log maintained on a non-transitory computer readable storage medium, the write-ahead log storing the actions in chronological order;
running the actions stored in the memory through at least one filter in order to identify irrelevant actions that do not need to be replayed in order to recover from the fault;
replaying, using the processor, the actions from the memory until the entire log is replayed in reverse-chronological order, except for identified irrelevant actions that do not need to be replayed, wherein the replaying comprises causing the computer system to take a subset of the same actions stored in the memory again; and
transitioning the computer system from a recovery state to a normal operation state, following the replaying.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer readable storage medium tangibly storing instructions that, when performed by a processor of a computer system that needs to be recovered as a result of a fault taking place, at least:
load actions taken by the computer system from a disk-backed log that stores the actions in chronological order, to memory of the computer system, wherein the actions loaded from the log are mutative actions that occurred within a time period of interest defined as being between a predetermined time before the fault and the fault;
run the actions stored in the memory through at least one filter in order to identify irrelevant actions that do not need to be replayed in order to recover from the fault;
replay, using the processor, the actions from the memory until the entire log for the time period of interest is replayed in reverse-chronological order, except for the identified irrelevant actions that do not need to be replayed, wherein the replay comprises causing the computer system to take a subset of the same actions stored in the memory again; and
transition the computer system from a recovery state to a normal operation state, following the replay,
wherein there is no data dependency between actions recorded in the log and the log is maintained such that older actions cannot invalidate newer actions.
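The "time period of interest" in claim 12 can be sketched as a window applied while loading, followed by a newest-first replay. This is an illustrative reading only: the `Record` shape, the numeric timestamps, and the `load_window` and `replay_reverse` helpers are assumptions, not terms from the claim. The sketch relies on the two stated properties: records carry no data dependency on one another, and an older record can never invalidate a newer one, so the first value restored for a key is final.

```python
from typing import Any, Dict, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical mutative log record with a timestamp."""
    timestamp: float
    key: str
    value: Any


def load_window(log: List[Record], fault_time: float, lookback: float) -> List[Record]:
    """Load only records between (fault_time - lookback) and the fault."""
    start = fault_time - lookback
    return [r for r in log if start <= r.timestamp <= fault_time]


def replay_reverse(records: List[Record]) -> Dict[str, Any]:
    """Replay newest-first; older records for an already restored key are skipped."""
    restored: Dict[str, Any] = {}
    for record in reversed(records):
        if record.key not in restored:
            restored[record.key] = record.value
    return restored


# Chronological log; the fault is assumed to occur at t=10 with a lookback of 6.
log = [
    Record(1.0, "x", "old"),
    Record(5.0, "x", "mid"),
    Record(8.0, "y", "y1"),
    Record(9.0, "x", "new"),
]
window = load_window(log, fault_time=10.0, lookback=6.0)
restored = replay_reverse(window)
```

Here the record at `t=1.0` falls outside the window and is never loaded, while the record at `t=5.0` is loaded but filtered out during replay because a newer write to `x` was already applied.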
13. A computer system operable in normal and recovery modes, comprising:
a processor and a memory;
a non-transitory computer readable storage medium tangibly storing a log that stores actions of preselected types taken by the computer system in chronological order;
recovery program logic configured to operate in connection with the processor when the computer system is in recovery mode to load actions from the log into the memory and filter out irrelevant actions that do not need to be replayed;
an object manager configured to cooperate with the processor when the computer system is in recovery mode to restore objects in memory in reverse-chronological order by replaying the actions from the memory in reverse-chronological order, wherein the replaying comprises causing the computer system to take a subset of the same actions stored in the memory again,
wherein the processor is further configured to (a) place the computer system in recovery mode when a fault is detected and (b) transition the computer system from recovery mode to normal mode once the object manager has finished replaying all of the actions that occurred within a time period of interest leading up to the fault, except for the filtered out irrelevant actions.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
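The division of labor in claim 13 (recovery program logic that loads and filters, an object manager that replays, and a processor that switches modes) can be sketched as follows. All class and function names here (`Mode`, `filter_irrelevant`, `ObjectManager`, `System`, `on_fault`) are hypothetical, the log is modeled as a plain in-memory list of `(key, value)` writes, and "irrelevant" is assumed to mean "superseded by a newer write to the same key".

```python
from enum import Enum
from typing import Any, Dict, List, Set, Tuple

Action = Tuple[str, Any]  # (key, value); assumed record shape


class Mode(Enum):
    NORMAL = "normal"
    RECOVERY = "recovery"


def filter_irrelevant(actions: List[Action]) -> List[Action]:
    """Recovery logic: return actions newest-first, dropping superseded writes."""
    seen: Set[str] = set()
    kept: List[Action] = []
    for key, value in reversed(actions):
        if key not in seen:
            seen.add(key)
            kept.append((key, value))
    return kept


class ObjectManager:
    """Restores in-memory objects by replaying the surviving actions."""

    def __init__(self) -> None:
        self.objects: Dict[str, Any] = {}

    def replay(self, newest_first: List[Action]) -> None:
        for key, value in newest_first:
            self.objects[key] = value


class System:
    def __init__(self, log: List[Action]) -> None:
        self.log = log                      # stands in for the durable log
        self.mode = Mode.NORMAL
        self.mode_history = [Mode.NORMAL]
        self.manager = ObjectManager()

    def _set_mode(self, mode: Mode) -> None:
        self.mode = mode
        self.mode_history.append(mode)

    def on_fault(self) -> None:
        self._set_mode(Mode.RECOVERY)       # (a) enter recovery mode on a fault
        self.manager = ObjectManager()      # in-memory objects are assumed lost
        in_memory = list(self.log)          # load actions from the log
        self.manager.replay(filter_irrelevant(in_memory))
        self._set_mode(Mode.NORMAL)         # (b) resume once replay has finished


system = System([("a", 1), ("b", 2), ("a", 3)])
system.on_fault()
```

After `on_fault()` returns, the object manager holds only the newest value per key, and the system has passed through recovery mode and back to normal mode.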
Specification