De-duplicating distributed file system using cloud-based object store
Abstract
Techniques to provide a de-duplicating distributed file system using a cloud-based object store are disclosed. In various embodiments, a request to store a file comprising a plurality of chunks of file data is received. A determination to store at least a subset of the plurality of chunks is made. The request is responded to at least in part by providing an indication to store two or more chunks comprising the at least a subset of the plurality of chunks comprising the file as a single stored object that includes the combined chunk data of said two or more chunks.
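The decision described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the use of SHA-256 as the chunk identifier, the function names `chunk_id` and `plan_store`, and the shape of the response dictionary are all assumptions made for illustration.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Assumption: a SHA-256 digest of the chunk data serves as the chunk identifier.
    return hashlib.sha256(data).hexdigest()


def plan_store(file_chunks: list[bytes], known_ids: set[str]) -> dict:
    """Decide which chunks of a file still need to be stored.

    `known_ids` stands in for the set of chunk identifiers already held
    by the remote storage (hypothetical bookkeeping, not from the source).
    """
    to_store = []
    seen = set(known_ids)
    for index, data in enumerate(file_chunks):
        cid = chunk_id(data)
        if cid not in seen:
            # This chunk is not stored yet, so it belongs to the subset to store.
            to_store.append({"index": index, "chunk_id": cid})
            seen.add(cid)  # a repeat of the same chunk later in the file is skipped
    return {
        "store": to_store,
        # Indication to combine two or more chunks into a single stored object.
        "combine_as_single_object": len(to_store) >= 2,
    }
```

In this sketch the response only names the chunks to store and flags that they should be combined; how the combined object is actually built and uploaded is left to the caller.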
20 Claims
1. A method of storing file system data, comprising:
receiving, at a processor, a request to store a file comprising a plurality of chunks of file data;
determining, by the processor, to store at least a first subset of the plurality of chunks of file data based at least in part on a chunk identifier, wherein a second subset of the plurality of chunks of file data are already stored at a remote storage;
determining, by the processor, a deduplication chunk size for the plurality of chunks of file data of the first subset, wherein the deduplication chunk size facilitates achieving a desired deduplication performance when storing the first subset of the plurality of chunks of file data, and wherein the deduplication chunk size is larger than a chunk size of a chunk included in the first subset of the plurality of chunks;
selecting, by the processor, which chunks of the first subset of the plurality of chunks of file data to combine into a single stored object that satisfies the deduplication chunk size associated with the desired deduplication performance;
combining, by the processor, the selected chunks of the first subset of the plurality of chunks of file data into the single stored object satisfying the deduplication chunk size; and
providing, by the processor, the single stored object that includes the combined selected chunks of the first subset of the plurality of chunks of file data to the remote storage, wherein the remote storage is configured to store the provided single stored object, and wherein storing the single stored object achieves a better deduplication performance than would be achieved if the combined selected chunks were stored individually.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20
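The selecting, combining, and providing steps of claim 1 can be sketched as follows. This is one possible reading under stated assumptions, not the claimed method itself: the greedy grouping rule, the interpretation of "satisfies the deduplication chunk size" as "reaches at least the target size", the `StoredObject` type, the key format, and the dictionary standing in for the remote storage are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    # Identifiers of the chunks combined into this object, in order.
    chunk_ids: list[str] = field(default_factory=list)
    # Concatenated chunk data.
    payload: bytes = b""


def pack_chunks(chunks: list[tuple[str, bytes]], target_size: int) -> list[StoredObject]:
    """Greedily combine (chunk_id, data) pairs into objects of at least target_size bytes.

    Assumption: target_size plays the role of the deduplication chunk size
    and is larger than any individual chunk.
    """
    objects = []
    current = StoredObject()
    for cid, data in chunks:
        current.chunk_ids.append(cid)
        current.payload += data
        if len(current.payload) >= target_size:
            objects.append(current)
            current = StoredObject()
    if current.chunk_ids:
        # The trailing object may fall short of the target size.
        objects.append(current)
    return objects


def upload(objects: list[StoredObject], remote: dict[str, bytes]) -> list[str]:
    """Store each combined object in `remote`, a dict standing in for the object store."""
    keys = []
    for obj in objects:
        key = "+".join(obj.chunk_ids)  # hypothetical key scheme
        remote[key] = obj.payload
        keys.append(key)
    return keys
```

With three 4-byte chunks and a target size of 8 bytes, this sketch produces one object holding the first two chunks and a second, undersized object holding the third.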
14. A system to store file system data, comprising:
a communication interface; and
a processor coupled to the communication interface and configured to:
receive via the communication interface a request to store a file comprising a plurality of chunks of file data;
determine to store at least a first subset of the plurality of chunks of file data based at least in part on a chunk identifier, wherein a second subset of the plurality of chunks of file data are already stored at a remote storage;
determine a deduplication chunk size for the plurality of chunks of file data of the first subset, wherein the deduplication chunk size facilitates achieving a desired deduplication performance when storing the first subset of the plurality of chunks of file data, and wherein the deduplication chunk size is larger than a chunk size of a chunk included in the first subset of the plurality of chunks;
select which chunks of the first subset of the plurality of chunks of file data to combine into a single stored object that satisfies the deduplication chunk size associated with the desired deduplication performance;
combine the selected chunks of the first subset of the plurality of chunks of file data into the single stored object satisfying the deduplication chunk size; and
provide the single stored object that includes the combined selected chunks of the first subset of the plurality of chunks of file data to the remote storage, wherein the remote storage is configured to store the provided single stored object, and wherein storing the single stored object achieves a better deduplication performance than would be achieved if the combined selected chunks were stored individually.
Dependent claims: 15, 16, 17, 18
19. A computer program product to store file system data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving a request to store a file comprising a plurality of chunks of file data;
determining to store at least a first subset of the plurality of chunks of file data based at least in part on a chunk identifier, wherein a second subset of the plurality of chunks of file data are already stored at a remote storage;
determining a deduplication chunk size for the plurality of chunks of file data of the first subset, wherein the deduplication chunk size facilitates achieving a desired deduplication performance when storing the first subset of the plurality of chunks of file data, and wherein the deduplication chunk size is larger than a chunk size of a chunk included in the first subset of the plurality of chunks;
selecting which chunks of the first subset of the plurality of chunks of file data to combine into a single stored object that satisfies the deduplication chunk size associated with the desired deduplication performance;
combining the selected chunks of the first subset of the plurality of chunks of file data into the single stored object satisfying the deduplication chunk size; and
providing the single stored object that includes the combined selected chunks of the first subset of the plurality of chunks of file data to the remote storage, wherein the remote storage is configured to store the provided single stored object, and wherein storing the single stored object achieves a better deduplication performance than would be achieved if the combined selected chunks were stored individually.
Specification