Data aware deduplication object storage (DADOS)
Abstract
Embodiments include a data aware deduplicating object store. The data aware deduplicating object store includes a consistent hashing logic that manages a consistent hashing architecture for the object store. The consistent hashing architecture includes a metadata ring and a bulk ring, and may be a multiple ring architecture comprising a metadata ring and two or more bulk rings. A bulk ring may include a key/value (k/v) data store, where a k/v data store stores a shard of an index and a reference count that facilitates garbage collection or data reclamation at the level of the individual k/v data store. The data aware deduplicating object store also includes a deduplication logic that provides data deduplication for data to be stored in the object store. The deduplication logic performs variable length deduplication and provides a shared-nothing approach.
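As an illustration of the dual ring architecture described in the abstract, the following is a minimal sketch of two independent consistent hashing rings, one for metadata and one for bulk data. The class name `HashRing`, the node names, the use of SHA-256, and the virtual-node count are all assumptions made for this example and are not taken from the patent.

```python
import bisect
import hashlib


def _h(key: bytes) -> int:
    """Map bytes to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (_h(f"{node}#{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def locate(self, key: bytes) -> str:
        """Return the node owning the first ring point at or after hash(key)."""
        idx = bisect.bisect_left(self._keys, _h(key)) % len(self._keys)
        return self._points[idx][1]


# Two separate rings: item identifiers map onto one, chunk identifiers onto the other.
metadata_ring = HashRing(["meta-0", "meta-1", "meta-2"])
bulk_ring = HashRing(["bulk-0", "bulk-1", "bulk-2", "bulk-3"])

recipe_node = metadata_ring.locate(b"object-42")
chunk_node = bulk_ring.locate(b"chunk-id-of-some-chunk")
```

With this kind of ring, adding a node only moves the keys that fall on the new node's points; every other key keeps its previous owner.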
23 Claims
1. A data aware deduplicating object store, comprising:
a collection of storage devices, wherein each device includes a processor and non-transitory storage medium storing instructions that cause the processor to perform corresponding functions;
a consistent hashing logic configured to:
receive a first item having a first item identifier;
break data of the item into at least two chunks;
perform a first hash function on each chunk to generate a recipe comprising respective chunk identifiers of the at least two chunks;
perform a second hash function on the item identifier to determine a metadata location on a metadata ring of a dual ring architecture;
store the recipe in the metadata location;
provide respective chunk identifiers and chunks for storing in a bulk ring of the dual ring architecture; and
a deduplication logic configured to:
receive the chunk identifiers and chunks from the consistent hashing logic;
access a plurality of Bloom filter shards, wherein each Bloom filter shard stores information about chunks of data stored in an associated key/value data store of the bulk ring, to perform deduplication on each chunk based on the chunk identifier for the chunk to determine whether the chunk is a duplicate chunk already present in the bulk ring;
increment a respective reference count for each respective chunk identifier based on the recipe including the respective chunk, wherein the reference count facilitates garbage collection or data reclamation; and
when the chunk of data is a duplicate chunk, refrain from providing the duplicate chunk to the bulk ring for storing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
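The steps recited in claim 1 can be sketched end to end as follows. This is an in-memory illustration only, under several assumptions that do not come from the patent: fixed-size chunks of four bytes, SHA-256 for both hash functions, modulo placement standing in for a ring lookup, and the hypothetical names `BloomShard` and `TinyStore`.

```python
import hashlib
from collections import defaultdict


def _point(key: bytes, salt: bytes = b"") -> int:
    """Hash bytes to a 64-bit integer; `salt` derives independent hashes."""
    return int.from_bytes(hashlib.sha256(salt + key).digest()[:8], "big")


class BloomShard:
    """Small Bloom filter summarizing one bulk k/v store (sizes are assumptions)."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, key: bytes):
        return [_point(key, bytes([i])) % self.bits for i in range(self.hashes)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.field |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.field >> pos) & 1 for pos in self._positions(key))


class TinyStore:
    """In-memory stand-in for the metadata ring and the bulk ring."""

    def __init__(self, meta_nodes=3, bulk_nodes=4, chunk_size=4):
        self.chunk_size = chunk_size
        self.meta = [dict() for _ in range(meta_nodes)]    # item id -> recipe
        self.bulk = [dict() for _ in range(bulk_nodes)]    # chunk id -> chunk
        self.blooms = [BloomShard() for _ in range(bulk_nodes)]
        self.refcount = defaultdict(int)                   # chunk id -> references

    def put(self, item_id: str, data: bytes) -> list:
        step = self.chunk_size
        chunks = [data[i:i + step] for i in range(0, len(data), step)]
        # First hash: one identifier per chunk; the ordered list is the recipe.
        recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
        # Second hash: the item identifier selects the metadata location.
        self.meta[_point(item_id.encode()) % len(self.meta)][item_id] = recipe
        for cid, chunk in zip(recipe, chunks):
            n = _point(cid.encode()) % len(self.bulk)
            self.refcount[cid] += 1
            # Bloom filters can give false positives, so a hit is confirmed
            # against the k/v store before the chunk is treated as a duplicate.
            duplicate = self.blooms[n].might_contain(cid.encode()) and cid in self.bulk[n]
            if not duplicate:
                self.bulk[n][cid] = chunk
                self.blooms[n].add(cid.encode())
        return recipe


store = TinyStore()
recipe = store.put("item-1", b"AAAABBBBAAAA")   # three chunks, two of them identical
stored = sum(len(node) for node in store.bulk)  # only the two unique chunks are kept
```

The Bloom filter is checked before the k/v store because a negative answer is always correct, so a chunk reported as absent can be written without a lookup.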
13. A non-transitory computer-readable storage device storing computer-executable instructions that when executed by a computer control the computer to perform a method, the method comprising:
accessing a first set of electronic data to be stored in a data aware deduplicating object store comprising a collection of storage devices, where the data aware deduplicating data store comprises a consistent hashing architecture, where the consistent hashing architecture is a dual-ring architecture comprising a metadata ring and a bulk ring, where the metadata ring comprises a first set of key/value (k/v) nodes and the bulk ring comprises a second set of k/v nodes;
chunking the first set of electronic data into a set of data chunks using a variable length deduplication approach;
generating a set of hashed chunk and chunk identifier pairs by hashing each member of the set of data chunks to generate a chunk identifier for the chunk, wherein the chunk identifiers comprise a recipe for the first set of electronic data;
hashing an identifier for the first set of electronic data to generate a metadata location on the metadata ring;
storing the recipe in the metadata location;
providing the set of hashed chunks for storing in the bulk ring;
access a plurality of Bloom filter shards, wherein each Bloom filter shard stores information about chunks of data stored in an associated key/value data store of the bulk ring, to determine whether the hashed chunk has already been stored in the bulk ring;
incrementing a respective reference count for each respective chunk identifier based on the recipe including the respective chunk, wherein the reference count facilitates garbage collection or data reclamation; and
refraining from storing a hashed chunk of the set in the bulk ring upon determining that the hashed chunk has already been stored in the bulk ring.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
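Claim 13 recites chunking with a variable length deduplication approach without fixing a particular algorithm. One common way to obtain variable length chunks is content-defined chunking with a rolling hash, sketched below. The gear-style hash, the `GEAR` table, the function name `chunk_variable`, and the size parameters are assumptions chosen for illustration, not details from the patent.

```python
import hashlib

# Deterministic per-byte values for the rolling hash (an assumption of this sketch).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]


def chunk_variable(data: bytes, min_size=16, avg_bits=6, max_size=256):
    """Split `data` at content-defined boundaries.

    A boundary is declared when the low `avg_bits` bits of the rolling hash
    are zero, so boundaries follow the content rather than fixed byte offsets.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left ages out older bytes; only recent bytes affect the low bits.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder, possibly shorter than min_size
    return chunks


# Example input: 2048 deterministic pseudo-random bytes.
sample = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
pieces = chunk_variable(sample)
```

Because boundaries depend on nearby content, inserting bytes near the start of an item typically shifts only the first few chunks, and later chunks tend to keep the same identifiers and deduplicate against earlier copies.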
23. A system for data aware deduplication object storage, the system comprising:
an object store;
means for managing a consistent hashing architecture associated with the object store, where the consistent hashing architecture is a multiple ring architecture comprising at least one metadata ring and at least one bulk ring, where the at least one metadata ring includes one or more key/value (k/v) data stores, and where the at least one bulk ring includes one or more k/v data stores;
means for receiving a first item having a first item identifier;
means for breaking the item into at least two chunks;
means for hashing each chunk to generate respective chunk identifiers, wherein the chunk identifiers comprise a recipe for the first item;
means for hashing the item identifier to determine a metadata location on the metadata ring;
means for storing the recipe in the metadata location;
means for providing the chunk identifiers and the chunks for storing in the bulk ring at a bulk ring location;
means for deduplicating data to be stored in the object store based on the chunk identifiers using a variable length deduplication approach;
means for storing deduplicated data in the bulk ring;
wherein the means for deduplicating is configured to refrain from providing duplicate data to the means for storing deduplicated data in the bulk ring; and
means for power management of a k/v data store or garbage collection for the k/v data store at an individual level.
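Garbage collection "at an individual level" can be pictured as each k/v data store reclaiming space using only its own reference counts, with no coordination across stores. The sketch below shows one way this might look; the class `KVStore` and its method names are hypothetical and are not taken from the patent.

```python
class KVStore:
    """One bulk-ring k/v data store holding its own chunks and reference counts."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}   # chunk id -> chunk bytes
        self.refs = {}     # chunk id -> reference count

    def add_ref(self, chunk_id, chunk):
        """Store the chunk if it is new, then count one more reference to it."""
        if chunk_id not in self.chunks:
            self.chunks[chunk_id] = chunk
        self.refs[chunk_id] = self.refs.get(chunk_id, 0) + 1

    def drop_ref(self, chunk_id):
        """Called when a recipe that included this chunk is deleted."""
        self.refs[chunk_id] -= 1

    def collect_garbage(self):
        """Reclaim unreferenced chunks using only this store's local state."""
        dead = [cid for cid, count in self.refs.items() if count <= 0]
        for cid in dead:
            del self.chunks[cid]
            del self.refs[cid]
        return len(dead)


kv = KVStore("bulk-0")
kv.add_ref("c1", b"AAAA")
kv.add_ref("c1", b"AAAA")   # second recipe referencing the same chunk
kv.add_ref("c2", b"BBBB")
kv.drop_ref("c2")           # the only recipe using c2 was deleted
reclaimed = kv.collect_garbage()
```

Because the counts live next to the chunks they describe, one store can run collection on its own schedule without touching any other store.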
Specification