Recovering from storage transaction failures using checkpoints
Abstract
The disclosed technology facilitates recovery from storage-related failures by checkpointing copy-on-write operation sequences. An operation sequence incorporating such checkpoints into a copy-on-write can include the following: receive a write request that identifies payload data to be written to a first data store, read original data associated with the first data store, copy the original data to a second data store, record transactional information associated with the write request, generate a first checkpoint to confirm the successful recordation of the transactional information and the successful copying of the original data to the second data store, write the payload data to the first data store, acknowledge a successful completion of the copy-on-write operation sequence, and generate a second checkpoint that confirms the successful completion of such operation sequence. The first and second checkpoints are used to form a pre-failure representation of one or more storage units (or parts thereof). The checkpoints can be stored with other transactional information, to facilitate recovery in the event of a failure, and can be used to facilitate the use of optimizations to process I/O operations.
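The sequence summarized above can be sketched in a few lines of Python. This is a minimal in-memory sketch under stated assumptions, not the patented implementation. Both data stores are modeled as `bytearray` objects, the transactional information is an append-only list, and the second store is filled sequentially. The names `CowVolume`, `CKPT_COPIED`, and `CKPT_COMPLETE` are hypothetical. Acknowledging completion to the requester is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical checkpoint labels; the source does not define a log format.
CKPT_COPIED = "checkpoint-1"    # transactional info recorded, original data copied
CKPT_COMPLETE = "checkpoint-2"  # payload written, sequence complete


@dataclass
class CowVolume:
    """Toy model: two byte-addressed stores plus an append-only transaction log."""
    first_store: bytearray
    second_store: bytearray
    log: list = field(default_factory=list)
    next_free: int = 0  # next unused offset in the second store (assumed sequential)

    def write(self, txn_id: int, first_addr: int, payload: bytes) -> int:
        n = len(payload)
        if first_addr < 0 or first_addr + n > len(self.first_store):
            raise ValueError("write falls outside the first data store")
        if self.next_free + n > len(self.second_store):
            raise ValueError("second data store is full")
        # Read the original data the payload is about to overwrite.
        original = bytes(self.first_store[first_addr:first_addr + n])
        # Copy it to the second store beginning at a second address.
        second_addr = self.next_free
        self.second_store[second_addr:second_addr + n] = original
        self.next_free += n
        # Record transactional information, including the second address.
        self.log.append({"txn": txn_id, "kind": "record", "first_addr": first_addr,
                         "second_addr": second_addr, "length": n})
        # First checkpoint: the record and the copy both succeeded.
        self.log.append({"txn": txn_id, "kind": CKPT_COPIED})
        # Only now overwrite the original location with the payload.
        self.first_store[first_addr:first_addr + n] = payload
        # Second checkpoint: the whole copy-on-write sequence completed.
        self.log.append({"txn": txn_id, "kind": CKPT_COMPLETE})
        return second_addr
```

A failure between the two checkpoint appends leaves the log holding `CKPT_COPIED` but not `CKPT_COMPLETE` for that transaction. That is exactly the intermediate state the recovery claims distinguish.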
18 Claims
1. A method of checkpointing a copy-on-write operation sequence, the method comprising:
(a) receiving a first write request identifying payload data to be written beginning at a first address of a first data store;
(b) reading original data associated with the first address of the first data store;
(c) copying the original data to a second data store beginning at a second address;
(d) recording transactional information associated with the first write request, the transactional information including indicia associated with the second address;
(e) generating a first checkpoint confirming the successful recordation of the transactional information and the successful copying of the original data to the second data store;
(f) writing the payload data to the first data store beginning at the first address;
(g) in response to successfully writing the payload data to the first data store, generating a second checkpoint confirming the successful completion of the copy-on-write operation sequence; and
(h) removing information associated with a second write request from at least one queue of a processor module based on finding the second checkpoint in the transactional information.
(Dependent claims: 2, 3, 4, 5, 6)
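Step (h) can be illustrated by scanning the transactional information for completion checkpoints and dropping matching entries from a queue. The sketch below reflects one possible reading of the step and is not the claimed implementation. It assumes each queued entry carries the transaction id of the copy-on-write sequence it belongs to, that the log is a list of dicts, and that the label `"checkpoint-2"` marks the second checkpoint. `prune_queue` and `completed_txns` are hypothetical helpers.

```python
from collections import deque

# Hypothetical label for the second (completion) checkpoint.
CKPT_COMPLETE = "checkpoint-2"


def completed_txns(log):
    """Return the ids of transactions whose log shows a completion checkpoint."""
    return {entry["txn"] for entry in log if entry["kind"] == CKPT_COMPLETE}


def prune_queue(queue, log):
    """Return a new queue without the entries whose sequence already completed."""
    done = completed_txns(log)
    return deque(req for req in queue if req["txn"] not in done)
```

Building the set of completed ids once keeps the pass linear in the combined size of the log and the queue. Returning a new deque leaves the caller free to swap it in atomically.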
7. A method of recovering from a failure associated with a copy-on-write operation sequence, the method comprising:
identifying at least one write request queued prior to a failure, the write request corresponding to a plurality of operations in a copy-on-write operation sequence;
determining whether a first checkpoint was formed in response to completing a first portion of the copy-on-write operation sequence;
determining whether a second checkpoint was formed in response to completing at least a second portion of the copy-on-write operation sequence; and
based on at least one of the first and second checkpoints, processing the queued write request to at least partially recover from the failure, the write-request processing including:
  upon failing to locate the first checkpoint, queuing the plurality of operations in the copy-on-write operation sequence for execution,
  upon locating the first checkpoint and failing to locate the second checkpoint, queuing a subset of the plurality of operations for execution, the queued subset of operations including operations associated with the second portion, but not the first portion, of the copy-on-write operation sequence, and
  upon locating the first and second checkpoints, removing the plurality of operations in the copy-on-write operation sequence from at least one operation queue.
(Dependent claims: 8, 9, 10)
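The three-way decision in claim 7 maps naturally onto a small dispatch function. In this illustrative sketch, the copy-on-write sequence is split into a hypothetical first portion (read, copy, record) and second portion (payload write). The operation queue is a list of `(txn_id, operation)` pairs, and the checkpoints found for a request are passed in as a set of labels. Entries already queued for the request are cleared first so that re-queuing does not duplicate work. That choice is an assumption, not something the claim specifies.

```python
# Hypothetical split of one copy-on-write sequence into its two portions.
FIRST_PORTION = ["read_original", "copy_original", "record_txn_info"]
SECOND_PORTION = ["write_payload"]


def recover_request(txn_id, checkpoints, op_queue):
    """Re-queue only what a queued write request still needs after a failure.

    checkpoints: set of checkpoint labels found in the log for txn_id.
    op_queue: list of (txn_id, operation) pairs, modified in place.
    Returns the operations queued for txn_id.
    """
    # Clear anything queued for this request before the failure; it is rebuilt below.
    op_queue[:] = [item for item in op_queue if item[0] != txn_id]
    if "checkpoint-1" not in checkpoints:
        ops = FIRST_PORTION + SECOND_PORTION  # nothing confirmed: run the whole sequence
    elif "checkpoint-2" not in checkpoints:
        ops = list(SECOND_PORTION)  # copy confirmed: only the second portion remains
    else:
        ops = []  # both checkpoints found: the request is finished, leave it dequeued
    op_queue.extend((txn_id, op) for op in ops)
    return ops
```

The middle branch is where the first checkpoint pays off. The original data is already safe in the second store, so recovery skips straight to the payload write instead of re-copying.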
11. A method of recovering from a storage transaction failure, wherein the storage transaction failure corresponds to at least one of a hardware failure and a power failure of a primary processor module, the method comprising:
receiving a write request identifying payload data to be written beginning at a first address of a first data store;
copying original data associated with the first address of the first data store to a second data store beginning at a second address;
recording transactional information associated with the write request, the transactional information including indicia associated with the first and second addresses and a time that the write request was received;
generating a first indicator confirming the recordation of the transactional information;
continuing, upon generation of the first indicator, writing the payload data to the first data store beginning at the first address; and
in response to a storage transaction failure that prevents the successful writing of the payload data to the first data store, using the first indicator and at least some of the transactional information to, at least partially, finish writing the payload data to the first data store;
generating a second indicator confirming that the payload data was written to the first data store; and
removing information associated with a second write request from at least one queue of a standby processor module based on finding the second indicator in the transactional information, wherein the standby processor module assumes the tasks of the primary processor module in response to the storage transaction failure.
(Dependent claims: 12, 13, 14, 15)
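Claim 11 has the standby processor module finish an interrupted write from the recorded transactional information. The following is a minimal sketch under several assumptions. The standby is assumed to hold a mirrored copy of each request's payload, since the claim does not say where the payload comes from. The transactional information is kept as one hypothetical `TxnRecord` per request, and the standby's queue is a plain list of transaction ids. `standby_takeover` is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class TxnRecord:
    """Hypothetical transactional information kept for one write request."""
    txn: int
    first_addr: int
    second_addr: int
    received_at: float
    first_indicator: bool = False   # transactional information recorded
    second_indicator: bool = False  # payload confirmed written


def standby_takeover(records, mirrored_payloads, first_store, standby_queue):
    """Finish interrupted writes, then drop requests confirmed as written.

    records: dict mapping txn id -> TxnRecord
    mirrored_payloads: dict mapping txn id -> payload bytes (assumed mirror)
    standby_queue: list of txn ids, modified in place
    """
    # Replay in order of receipt so overlapping writes keep their original order.
    for rec in sorted(records.values(), key=lambda r: r.received_at):
        if rec.first_indicator and not rec.second_indicator:
            payload = mirrored_payloads[rec.txn]
            end = rec.first_addr + len(payload)
            if end > len(first_store):
                raise ValueError("payload extends past the first data store")
            # Rewriting the whole payload is idempotent: the original bytes
            # were already preserved at rec.second_addr before the write began.
            first_store[rec.first_addr:end] = payload
            rec.second_indicator = True
    # Requests without a first indicator stay queued for a full replay.
    done = {txn for txn, rec in records.items() if rec.second_indicator}
    standby_queue[:] = [txn for txn in standby_queue if txn not in done]
```

Sorting by the recorded receipt time is one plausible use of the time field that claim 11 requires the transactional information to include. The claim itself does not state how that time is used.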
16. A method of recovering from a storage transaction failure, the method comprising:
detecting a storage transaction failure occurring at time T1;
identifying at least one indicator associated with a first write request that was received prior to time T1, the at least one indicator confirming that a portion of a copy-on-write sequence associated with the first write request has been completed, the portion being indicated by the at least one indicator;
forming a representation of at least one storage unit as it existed prior to the storage transaction failure based at least in part on the at least one indicator and data recorded in the partially completed copy-on-write sequence; and
loading information associated with the first write request into at least one queue of a standby processor module based on the at least one indicator.
(Dependent claim: 17)
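One way to read "forming a representation of at least one storage unit as it existed prior to the storage transaction failure" is as a rollback. For every request received before T1 whose first indicator confirms the original data was preserved, but whose write was never confirmed, the preserved bytes are copied back over the possibly half-written range. The sketch below follows that reading, which is an assumption. `CowEntry` and `pre_failure_image` are hypothetical names. Requests without a first indicator are left alone, because under the ordering in claim 1 their payload write had not yet begun.

```python
from dataclasses import dataclass


@dataclass
class CowEntry:
    """Hypothetical per-request record of a copy-on-write sequence."""
    txn: int
    received_at: float
    first_addr: int
    second_addr: int
    length: int
    copied: bool   # first indicator: original data preserved in the second store
    written: bool  # later indicator: payload fully written to the first store


def pre_failure_image(first_store, second_store, entries, t1, standby_queue):
    """Rebuild the first store as it stood before interrupted writes began."""
    image = bytearray(first_store)  # work on a copy; the damaged store is untouched
    pending = sorted(
        (e for e in entries if e.received_at < t1 and e.copied and not e.written),
        key=lambda e: e.received_at,
    )
    # Undo newest-first so overlapping ranges end with the oldest preserved bytes.
    for e in reversed(pending):
        original = second_store[e.second_addr:e.second_addr + e.length]
        image[e.first_addr:e.first_addr + e.length] = original
    # Load the interrupted requests into the standby's queue, oldest first.
    standby_queue.extend(e.txn for e in pending)
    return image
```

Working on a copy keeps the damaged store available for inspection. A real system would more likely apply the rollback in place once the image is verified.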
18. A method of recovering from a storage transaction failure, the method comprising:
detecting a storage transaction failure occurring at time T1;
identifying at least one indicator associated with a first write request that was received prior to time T1, the at least one indicator confirming that a portion of a copy-on-write sequence associated with the first write request has been completed, the portion being indicated by the at least one indicator;
forming a representation of at least one storage unit as it existed prior to the storage transaction failure based at least in part on the at least one indicator and data recorded in the partially completed copy-on-write sequence;
identifying a release indicator confirming that payload data specified by the first write request was written to a data store beginning at a first address; and
removing information associated with a second write request from at least one queue of a standby processor module based on the release indicator, wherein the standby processor module assumes the tasks of a primary processor module in response to the storage transaction failure.
Specification