Log structured content addressable deduplicating storage
First Claim
1. A method of storing data in a data storage system, the method comprising:
identifying a first storage label associated with a reference operation, wherein the first storage label is associated with a first previously stored data segment;
generating a first transaction record including an identifier associated with the storage operation;
storing the first transaction record in a transaction log data structure;
searching a label cache to locate first label metadata matching the first storage label;
in response to locating the first label metadata matching the first storage label in the label cache, changing a reference count included in the first label metadata, wherein the reference operation includes a reference count that specifies the number of occurrences of the first storage label referencing the first data segment;
in response to not locating the first label metadata matching the first storage label in the label cache:
generating a second transaction record indicating that the label metadata matching the storage label is not located in the label cache;
storing the second transaction record in the transaction log data structure;
searching the transaction log data structure to identify a plurality of transaction records, including the second transaction record, that indicate that a set of label metadata, including the first label metadata, corresponding with a plurality of storage labels, including the first storage label, are not located in the label cache;
searching at least one label metadata archive to locate each of the set of the label metadata matching the each of the plurality of storage labels, including the first label metadata matching the first storage label;
in response to locating the first label metadata matching the first storage label in the at least one label metadata archive, changing the reference count included in the first label metadata; and
generating a third transaction record including the identifier, wherein the third transaction record is adapted to indicate that the reference operation is complete; and
storing the third transaction record in the transaction log data structure.
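The steps of the claim can be illustrated with a minimal sketch. This is not the patented implementation: the names Store, LabelMetadata, reference and resolve_misses, the tuple layout of the log records, and the use of in-memory dictionaries for the label cache and the label metadata archive are all assumptions made for illustration. The claim does not say what happens when a label is missing from the archive as well, so the sketch leaves such an operation unresolved.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LabelMetadata:
    label: str
    ref_count: int = 0


@dataclass
class Store:
    # All three containers are hypothetical in-memory stand-ins.
    label_cache: dict = field(default_factory=dict)    # label -> LabelMetadata
    label_archive: dict = field(default_factory=dict)  # stands in for the archive(s)
    log: list = field(default_factory=list)            # append-only transaction log
    _ids: count = field(default_factory=count)         # source of transaction identifiers

    def reference(self, label, delta=1):
        """Start a reference operation and return its transaction identifier."""
        txn = next(self._ids)
        # First transaction record: the operation has begun.
        self.log.append(("begin", txn, label, delta))
        meta = self.label_cache.get(label)
        if meta is not None:
            # Cache hit: change the reference count and log completion.
            meta.ref_count += delta
            self.log.append(("complete", txn))
        else:
            # Cache miss: second transaction record, archive lookup is deferred.
            self.log.append(("miss", txn, label, delta))
        return txn

    def resolve_misses(self):
        """Search the log for unresolved misses and look them up as one batch."""
        done = {rec[1] for rec in self.log if rec[0] == "complete"}
        pending = [rec for rec in self.log
                   if rec[0] == "miss" and rec[1] not in done]
        for _, txn, label, delta in pending:
            meta = self.label_archive.get(label)
            if meta is None:
                continue  # not covered by the claim; left unresolved here
            meta.ref_count += delta
            # Third transaction record: the operation is complete.
            self.log.append(("complete", txn))
        return len(pending)
```

Collecting the logged misses and searching the archive for all of them together, as resolve_misses does, is one way to read the claim's "plurality of transaction records" step: several deferred lookups are served in a single pass over the archive rather than one search per miss.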
Abstract
A log structured content addressable deduplicated data storage system may be used to store deduplicated data. Data to be stored is partitioned into data segments. Each unique data segment is associated with a label. The storage system maintains a transaction log. Mutating storage operations are initiated by storing transaction records in the transaction log. Additional transaction records are stored in the log when storage operations are completed. Upon restarting an embodiment of the data storage system, the transaction records from the transaction logs are replayed to recreate the state of the data storage system. The data storage system updates file system metadata with transaction information while a storage operation associated with a file is being processed. This transaction information serves as atomically updated transaction commit points, allowing fully internally consistent snapshots of deduplicated volumes to be taken at any time.
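The replay-on-restart behavior described in the abstract can be sketched as follows. The record layout is hypothetical: ("begin", txn, label, delta) marks the start of a mutating operation and ("complete", txn) marks its completion. The abstract does not say how operations that never completed are handled, so this sketch simply returns them to the caller instead of applying them.

```python
def replay(log):
    """Rebuild per-label reference counts from a list of transaction records."""
    started = {}  # txn id -> (label, delta) for operations that have begun
    counts = {}   # label -> reconstructed reference count
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "begin":
            started[txn] = (record[2], record[3])
        elif kind == "complete" and txn in started:
            # Only operations with a completion record change the state.
            label, delta = started.pop(txn)
            counts[label] = counts.get(label, 0) + delta
    # Whatever is left in `started` began but never completed.
    return counts, started


log = [
    ("begin", 0, "a", 1),
    ("complete", 0),
    ("begin", 1, "b", 2),   # no completion record follows
    ("begin", 2, "a", 3),
    ("complete", 2),
]
counts, incomplete = replay(log)
```

Because state is applied only when a completion record is seen, the reconstructed counts reflect a consistent point in the log even if the system stopped partway through an operation.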
81 Citations
17 Claims
Specification