Deduplicated data distribution techniques
Abstract
In connection with a data distribution architecture, client-side “deduplication” techniques may be utilized for data transfers occurring among various file system nodes. In some examples, these deduplication techniques involve fingerprinting file system elements that are being shared and transferred, and dividing each file into separate units referred to as “blocks” or “chunks.” These separate units may be used for independently rebuilding a file from local and remote collections, storage locations, or sources. The deduplication techniques may be applied to data transfers to prevent unnecessary data transfers, and to reduce the amount of bandwidth, processing power, and memory used to synchronize and transfer data among the file system nodes. The described deduplication concepts may also be applied for purposes of efficient file replication, data transfers, and file system events occurring within and among networks and file system nodes.
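The fingerprinting and block division described in the abstract can be illustrated with a minimal sketch. It assumes fixed-size blocks and SHA-256 fingerprints, neither of which is specified by the abstract; the names `BLOCK_SIZE`, `split_into_blocks`, `fingerprint`, `file_metadata`, and `shared_blocks` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks, chosen only to keep the demo readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Fingerprint a block; SHA-256 is an assumption, not named by the source."""
    return hashlib.sha256(block).hexdigest()


def file_metadata(data: bytes) -> list[str]:
    """Ordered block identifiers that describe how to rebuild the file."""
    return [fingerprint(block) for block in split_into_blocks(data)]


def shared_blocks(meta_a: list[str], meta_b: list[str]) -> set[str]:
    """Identifiers present in both files; these blocks need no second transfer."""
    return set(meta_a) & set(meta_b)
```

In this sketch, two files that contain the same block produce the same identifier for it, which is what lets a node skip transferring a block it already holds.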
20 Claims
1. A method for a deduplication-based reconstruction of file system data, the method comprising operations performed by at least one processor of a first computing system, and the operations including:
transmitting, to a second computing system, a request for metadata of a desired file;
receiving, from the second computing system, the metadata of the desired file, the metadata of the desired file indicating respective identifiers of each block of the desired file;
determining whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
in response to determining that at least one of the respective blocks are not in the data store, determining whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
identifying, with use of the metadata of the desired file, at least one block of the desired file on the data store associated with the first computing system, the at least one block identified as being in the partial block index and in the full block index; and
reconstructing the desired file with use of the at least one block of the desired file on the data store, and with use of the metadata of the desired file received from the second computing system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
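One possible reading of the two-index lookup recited in claim 1 is sketched below: a short prefix of each block hash is checked against a partial block index to rule out blocks that are definitely absent, and the remaining candidates are confirmed with the full hash against a full block index. This is an illustration under stated assumptions, not the patented implementation; SHA-256, the 8-character prefix, and the names `LocalBlockStore`, `lookup`, and `reconstruct` are all hypothetical.

```python
import hashlib
from typing import Optional

PARTIAL_KEY_LEN = 8  # assumption: the partial key is the first 8 hex digits of the hash


def block_id(block: bytes) -> str:
    """Full hash of a block; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(block).hexdigest()


class LocalBlockStore:
    """Hypothetical local data store with a partial and a full block index."""

    def __init__(self) -> None:
        self.full_index: dict[str, bytes] = {}  # full hash -> block bytes
        self.partial_index: set[str] = set()    # hash prefixes only

    def add(self, block: bytes) -> str:
        bid = block_id(block)
        self.full_index[bid] = block
        self.partial_index.add(bid[:PARTIAL_KEY_LEN])
        return bid

    def lookup(self, bid: str) -> Optional[bytes]:
        # Step 1: a miss in the partial index proves the block is absent.
        if bid[:PARTIAL_KEY_LEN] not in self.partial_index:
            return None
        # Step 2: a partial hit may be a prefix collision, so confirm it
        # with the full hash against the full index.
        return self.full_index.get(bid)


def reconstruct(metadata: list[str], store: LocalBlockStore,
                fetched: dict[str, bytes]) -> bytes:
    """Rebuild a file in metadata order from local blocks plus fetched ones."""
    parts = []
    for bid in metadata:
        block = store.lookup(bid)
        if block is None:
            block = fetched[bid]  # supplied by the caller, e.g. from a remote node
        parts.append(block)
    return b"".join(parts)
```

The design point the sketch tries to capture is that the partial index is smaller than the full index, so a cheap prefix check can reject missing blocks before the full-key comparison is attempted.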
9. At least one machine-readable medium that is not a transitory propagating signal, the medium comprising instructions that, when executed by hardware of a computing device, cause the computing device to perform operations including:
receiving, from a source remote to the computing device, a metadata of a desired file, the metadata indicating respective identifiers of each block of the desired file;
determining whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
in response to determining that at least one of the respective blocks are not in the data store, determining whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
identifying, with use of the respective identifiers, at least one block of the desired file on the data store, the at least one block identified as being in the partial block index and in the full block index; and
reconstructing the desired file with use of the at least one block of the desired file on the data store, and with use of the metadata of the desired file received from the source remote to the computing device.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A computing device, comprising:
a local data store, the local data store to store a plurality of file system elements including a plurality of files; and
a processor and a memory, wherein the processor executes instructions to:
process metadata of a particular file, the metadata indicating respective identifiers of each block of the particular file;
determine whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
in response to determining that at least one of the respective blocks are not in the data store, determine whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
identify, with use of the metadata, at least one block of the particular file on the local data store from at least one file of the plurality of files, the at least one block identified as being in the partial block index and in the full block index; and
retrieve, with use of the metadata, at least one other block of the particular file from a remote data store.
Dependent claims: 17, 18, 19, 20
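The final element of claim 16, retrieving the remaining blocks from a remote data store, might look roughly like the sketch below. It assumes SHA-256 block identifiers and models the remote store as an in-memory object; `RemoteStore`, `retrieve_missing`, and the verification step are hypothetical additions for illustration and are not recited in the claim.

```python
import hashlib


def block_id(block: bytes) -> str:
    """Full hash of a block; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(block).hexdigest()


class RemoteStore:
    """Hypothetical stand-in for a remote data store reachable over a network."""

    def __init__(self, blocks: list[bytes]) -> None:
        self._blocks = {block_id(b): b for b in blocks}
        self.requests = 0  # counts simulated network round trips

    def get(self, bid: str) -> bytes:
        self.requests += 1
        return self._blocks[bid]


def retrieve_missing(metadata: list[str], local: dict[str, bytes],
                     remote: RemoteStore) -> dict[str, bytes]:
    """Fetch each block absent from `local` once and verify it before use."""
    fetched: dict[str, bytes] = {}
    for bid in metadata:
        if bid in local or bid in fetched:
            continue  # already held locally or already fetched in this pass
        block = remote.get(bid)
        if block_id(block) != bid:
            # Assumed integrity check: the identifier doubles as a checksum.
            raise ValueError(f"block {bid} failed verification")
        fetched[bid] = block
    return fetched
```

A file whose metadata lists one locally held block followed by the same missing block twice would trigger a single remote request in this sketch, since repeated identifiers are fetched only once.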
Specification