Log structured content addressable deduplicating storage
First Claim
1. A method of modifying data in a data storage system, the method comprising:
identifying a storage label and storage data associated with a storage operation;
generating a first transaction record indicating an initiation of the storage operation, wherein the first transaction record includes a transaction identifier;
storing the first transaction record including the transaction identifier in a transaction log data structure;
modifying a portion of the data storage system associated with the storage label and the storage data based on the storage operation;
identifying a file system file associated with the portion of the data storage system;
performing an atomic file system operation on the file system file to store the transaction identifier in file system metadata, wherein the file system metadata is separate from the transaction log data structure;
determining that the storage operation was successfully committed if (1) the first transaction record exists in the transaction log data structure, but a second transaction record indicating that the storage operation was completed does not exist in the transaction log data structure, (2) the first transaction record includes the transaction identifier, and (3) the file system metadata also includes the transaction identifier; and
determining that the storage operation was unsuccessfully committed if (1) the first transaction record exists in a transaction log data structure, but the second transaction record indicating that the storage system operation was completed does not exist in the transaction log data structure, (2) the first transaction record includes the transaction identifier, and (3) the file system metadata does not include the transaction identifier.
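The commit test recited in the claim can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the names `Store`, `Status`, `begin`, `stamp`, and `end` are hypothetical, a plain dictionary stands in for file system metadata, and a dictionary assignment stands in for the atomic file system operation. None of these details come from the patent text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"      # both the first and second records are in the log
    COMMITTED = "committed"      # first record only, and the metadata holds the same id
    UNCOMMITTED = "uncommitted"  # first record only, and the metadata lacks the id
    UNKNOWN = "unknown"          # no first record for this id


@dataclass
class Store:
    # Hypothetical in-memory stand-ins for the structures named in the claim.
    log: list = field(default_factory=list)       # transaction log: (kind, txn_id, label)
    data: dict = field(default_factory=dict)      # storage label -> storage data
    metadata: dict = field(default_factory=dict)  # file name -> txn_id, kept apart from the log

    def begin(self, txn_id, label):
        # First transaction record: marks the initiation of the operation.
        self.log.append(("begin", txn_id, label))

    def modify(self, label, payload):
        self.data[label] = payload

    def stamp(self, file_name, txn_id):
        # Assumption: a real system would use an atomic file system operation here.
        self.metadata[file_name] = txn_id

    def end(self, txn_id, label):
        # Second transaction record: marks completion of the operation.
        self.log.append(("end", txn_id, label))

    def status(self, txn_id, file_name):
        kinds = {kind for kind, tid, _ in self.log if tid == txn_id}
        if "begin" not in kinds:
            return Status.UNKNOWN
        if "end" in kinds:
            return Status.COMPLETED
        if self.metadata.get(file_name) == txn_id:
            return Status.COMMITTED
        return Status.UNCOMMITTED


store = Store()
store.begin(1, "label-a")
store.modify("label-a", b"payload")
print(store.status(1, "segments.dat"))
store.stamp("segments.dat", 1)
print(store.status(1, "segments.dat"))
```

Keeping the identifier in metadata that is separate from the log is what lets the two conditions be checked independently: the log shows that the operation started, and the metadata shows whether it reached its commit point.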
Abstract
A log structured content addressable deduplicated data storage system may be used to store deduplicated data. Data to be stored is partitioned into data segments. Each unique data segment is associated with a label. The storage system maintains a transaction log. Mutating storage operations are initiated by storing transaction records in the transaction log. Additional transaction records are stored in the log when storage operations are completed. Upon restarting an embodiment of the data storage system, the transaction records from the transaction logs are replayed to recreate the state of the data storage system. The data storage system updates file system metadata with transaction information while a storage operation associated with the file is being processed. This transaction information serves as atomically updated transaction commit points, allowing fully internally consistent snapshots of deduplicated volumes to be taken at any time.
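The partition-and-label idea in the abstract can be sketched as follows. The abstract does not say how segments are delimited or how labels are computed, so this example assumes fixed-size segments and uses a SHA-256 digest as the label; `partition` and `DedupStore` are hypothetical names chosen for illustration.

```python
import hashlib


def partition(data: bytes, size: int = 4):
    """Split data into fixed-size segments (an assumed partitioning scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Keeps one copy of each unique segment, keyed by its label."""

    def __init__(self):
        self.segments = {}  # label -> segment bytes

    def put(self, data: bytes, size: int = 4):
        labels = []
        for segment in partition(data, size):
            # Assumption: the label is a content hash of the segment.
            label = hashlib.sha256(segment).hexdigest()
            self.segments.setdefault(label, segment)  # store only if unseen
            labels.append(label)
        return labels

    def get(self, labels):
        return b"".join(self.segments[label] for label in labels)


store = DedupStore()
labels = store.put(b"abcdabcdxyz")
print(len(labels), len(store.segments))
print(store.get(labels))
```

Because the repeated segment `abcd` maps to the same label, it is stored once even though it appears twice in the label list that describes the original data.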
16 Claims
8. A method of restoring a data storage system, the method comprising:
accessing a transaction log data structure including transaction records associated with storage system operations;
identifying incomplete storage system operations from the transaction records, wherein the transaction records associated with the incomplete storage system operations include transaction identifiers;
analyzing file system metadata of the file system files associated with the incomplete storage system operations to identify at least a portion of the file system files having file system metadata including transaction identifiers matching the transaction identifiers of at least a portion of the incomplete storage system records, wherein the presence of a matching transaction identifier in one of the portion of the file system files indicates that the corresponding one of the incomplete storage system operations has been previously committed, wherein the file system metadata is separate from the transaction log data structure;
adding a portion of the transaction records associated with the portion of the incomplete storage system records to a list of storage operations to be reprocessed; and
reprocessing a portion of the storage system operations corresponding with the transaction records included in the list of storage operations to reconstruct a prior state of the data storage system;
wherein an incomplete storage system operation is considered to be successfully committed if (1) a first transaction record exists in a transaction log data structure, but a second transaction record indicating that the storage system operation was completed does not exist in the transaction log data structure, (2) the first transaction record includes a transaction identifier, and (3) the file system metadata also includes the transaction identifier; and
wherein the incomplete storage system operation is considered to be unsuccessfully committed if (1) the first transaction record exists in a transaction log data structure, but the second transaction record indicating that the storage system operation was completed does not exist in the transaction log data structure, (2) the first transaction record includes the transaction identifier, and (3) the file system metadata does not include the transaction identifier.
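The restore method of claim 8 can be sketched as a scan over the log that separates incomplete operations by whether their identifier appears in the file metadata. This is an illustrative sketch only: the record layout (dictionaries with `kind`, `txn_id`, and `file` keys), the `recover` function, and the one-identifier-per-file metadata mapping are assumptions. The claim does not say what happens to operations whose identifier is missing from the metadata, so the sketch simply returns them separately.

```python
def recover(log, metadata):
    """Classify incomplete operations found in a transaction log.

    log:      list of dicts with 'kind' ('begin' or 'end'), 'txn_id', 'file'
    metadata: dict mapping a file name to the txn_id stored in its metadata
    Returns (reprocess, unmatched): begin records whose id matches the file
    metadata, and begin records whose id does not.
    """
    completed = {rec["txn_id"] for rec in log if rec["kind"] == "end"}
    reprocess, unmatched = [], []
    for rec in log:
        if rec["kind"] != "begin" or rec["txn_id"] in completed:
            continue  # not an incomplete operation
        if metadata.get(rec["file"]) == rec["txn_id"]:
            reprocess.append(rec)   # committed: add to the reprocess list
        else:
            unmatched.append(rec)   # never reached its commit point
    return reprocess, unmatched


log = [
    {"kind": "begin", "txn_id": 1, "file": "a.dat"},
    {"kind": "end", "txn_id": 1, "file": "a.dat"},
    {"kind": "begin", "txn_id": 2, "file": "b.dat"},
    {"kind": "begin", "txn_id": 3, "file": "c.dat"},
]
metadata = {"a.dat": 1, "b.dat": 2}

reprocess, unmatched = recover(log, metadata)
print([rec["txn_id"] for rec in reprocess])
print([rec["txn_id"] for rec in unmatched])
```

In this example, transaction 1 has both records and is skipped, transaction 2 lands on the reprocess list because `b.dat` carries its identifier, and transaction 3 is reported as unmatched.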
Specification