Storage of Data in a Distributed Storage System
First Claim
1. A method of storing data for files, implemented on one or more servers, having memory and one or more processors storing one or more programs for execution by the one or more processors, the method comprising:
receiving a first blob of data;
splitting the first blob of data into one or more first chunks of data;
computing a content fingerprint for respective first chunks of data;
storing the first chunks of data in a chunk store;
storing the content fingerprints of the first chunks of data in a store distinct from the chunk store;
receiving a second blob of data;
splitting the second blob of data into one or more second chunks of data;
computing a content fingerprint for respective second chunks of data;
for a respective second chunk of data whose content fingerprint matches a content fingerprint of a first chunk of data:
storing a second reference to the corresponding first chunk of data that has a matching content fingerprint; and
not storing the second chunk of data; and
for each second chunk of data whose content fingerprint does not match a content fingerprint of a first chunk of data:
storing the second chunk of data in a chunk store.
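The steps of the first claim can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration and is not taken from the patent: the class name `DedupStore`, the fixed `CHUNK_SIZE`, the use of SHA-256 as the content fingerprint, and the plain dictionaries standing in for the chunk store and the separate fingerprint store.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size, kept tiny for illustration


class DedupStore:
    """Illustrative sketch only; names and layout are assumptions."""

    def __init__(self):
        self.chunk_store = {}        # chunk id -> chunk bytes
        self.fingerprint_index = {}  # content fingerprint -> chunk id (separate from chunk_store)
        self.blobs = {}              # blob name -> ordered list of chunk ids (references)

    def put_blob(self, name, data):
        refs = []
        # Split the blob into fixed-size chunks (the last one may be shorter).
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # SHA-256 is an assumed choice of content fingerprint.
            fingerprint = hashlib.sha256(chunk).hexdigest()
            chunk_id = self.fingerprint_index.get(fingerprint)
            if chunk_id is None:
                # No matching fingerprint: store the chunk and record its fingerprint.
                chunk_id = len(self.chunk_store)
                self.chunk_store[chunk_id] = chunk
                self.fingerprint_index[fingerprint] = chunk_id
            # Matching fingerprint: only a reference to the existing chunk is kept.
            refs.append(chunk_id)
        self.blobs[name] = refs

    def get_blob(self, name):
        # Reassemble the blob by following its chunk references in order.
        return b"".join(self.chunk_store[cid] for cid in self.blobs[name])


store = DedupStore()
store.put_blob("first", b"AAAABBBBCCCC")
store.put_blob("second", b"AAAAXXXXCCCC")
print(len(store.chunk_store))  # number of distinct chunks actually stored
```

In this sketch the second blob shares two of its three chunks with the first, so only one new chunk is written for it; the other two entries in its reference list point at chunks stored for the first blob.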
Abstract
A distributed storage system stores data for files. A first blob (binary large object) of data is received. The first blob is split into one or more first chunks of data. Content fingerprints for the first chunks of data are computed. The first chunks of data are stored in a chunk store, and their content fingerprints are stored in a store distinct from the chunk store. A second blob of data is received. The second blob is split into one or more second chunks of data. Content fingerprints for the second chunks of data are computed. Then, for a second chunk of data whose content fingerprint matches a content fingerprint of a first chunk of data, a second reference to the corresponding first chunk of data that has a matching content fingerprint is stored, but the second chunk of data is not stored.
2 Claims
2. A method of storing data for files, implemented on one or more servers, having memory and one or more processors storing one or more programs for execution by the one or more processors, the method comprising:
receiving a first representation of a blob of data having a specified first representation type;
storing the first representation of the blob of data;
storing metadata for the blob of data, including a name of the blob, the representation type, and a storage location for the first representation of the blob;
receiving a request to create a second representation of the blob with a second representation type;
creating a second representation of the blob having the second representation type;
storing the second representation of the blob of data;
updating the metadata for the blob of data to indicate the presence of the second representation of the blob with the second representation type;
receiving a request from a client for a copy of the blob, wherein the request includes a specified representation type;
retrieving either the first representation of the blob or the second representation of the blob, the retrieved representation of the blob corresponding to the representation type requested by the client; and
sending the retrieved representation of the blob to the client.
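The representation handling in the second claim can be sketched in the same spirit. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `RepresentationStore`, the `converters` table, the `name/type` storage-location scheme, and the example representation types `"text"` and `"upper"` are all hypothetical.

```python
class RepresentationStore:
    """Illustrative sketch only; names and layout are assumptions."""

    def __init__(self, converters):
        # (from_type, to_type) -> function taking bytes and returning bytes
        self.converters = converters
        self.storage = {}   # storage location -> representation bytes
        self.metadata = {}  # blob name -> {representation type: storage location}

    def put(self, name, rep_type, data):
        # Store the representation and record its type and location in the metadata.
        location = f"{name}/{rep_type}"  # hypothetical location scheme
        self.storage[location] = data
        self.metadata.setdefault(name, {})[rep_type] = location

    def create_representation(self, name, from_type, to_type):
        # Derive a second representation from an existing one, then store it,
        # which also updates the metadata to show that it is now present.
        source = self.storage[self.metadata[name][from_type]]
        converted = self.converters[(from_type, to_type)](source)
        self.put(name, to_type, converted)

    def get(self, name, rep_type):
        # Return the representation matching the type named in the client request.
        return self.storage[self.metadata[name][rep_type]]


reps = RepresentationStore({("text", "upper"): bytes.upper})
reps.put("doc", "text", b"hello")
reps.create_representation("doc", "text", "upper")
print(reps.get("doc", "upper"))
```

Keeping the type-to-location mapping in the metadata means a client request only has to name the blob and the representation type; the lookup decides which stored copy is returned.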
Specification