Deduplicated data distribution techniques

US 9,916,206 B2
Filed: 12/31/2014
Issued: 03/13/2018
Est. Priority Date: 09/30/2014
Status: Active Grant

Abstract

In connection with a data distribution architecture, client-side “deduplication” techniques may be utilized for data transfers occurring among various file system nodes. In some examples, these deduplication techniques involve fingerprinting file system elements that are being shared and transferred, and dividing each file into separate units referred to as “blocks” or “chunks.” These separate units may be used for independently rebuilding a file from local and remote collections, storage locations, or sources. The deduplication techniques may be applied to data transfers to prevent unnecessary data transfers, and to reduce the amount of bandwidth, processing power, and memory used to synchronize and transfer data among the file system nodes. The described deduplication concepts may also be applied for purposes of efficient file replication, data transfers, and file system events occurring within and among networks and file system nodes.
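
The fingerprinting and block division described in the abstract can be illustrated with a minimal sketch. It assumes fixed-size blocks (the abstract does not state how blocks are delimited), uses MD5 for block identifiers and SHA-256 (a SHA-2 variant) for the file signature as suggested by claim 8, and uses a hypothetical `fingerprint` helper with a deliberately tiny `BLOCK_SIZE`.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical value, kept tiny so the example is easy to follow


def fingerprint(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split data into fixed-size blocks and fingerprint each block and the whole file."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return {
        # Whole-file signature, usable later to validate a reconstructed file.
        "file_digest": hashlib.sha256(data).hexdigest(),
        # Ordered block identifiers; identical blocks share one identifier.
        "block_ids": [hashlib.md5(b).hexdigest() for b in blocks],
    }


meta = fingerprint(b"aaaabbbbaaaacc")
unique_blocks = set(meta["block_ids"])
```

Here the first and third blocks are identical, so the metadata lists four block identifiers but only three distinct ones; that repetition is what a deduplicating transfer can exploit.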

25 Citations

20 Claims

1. A method for a deduplication-based reconstruction of file system data, the method comprising operations performed by at least one processor of a first computing system, and the operations including:
- transmitting, to a second computing system, a request for metadata of a desired file;
  
  receiving, from the second computing system, the metadata of the desired file, the metadata of the desired file indicating respective identifiers of each block of the desired file;
  
  determining whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
  
  in response to determining that at least one of the respective blocks are not in the data store, determining whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
  
  identifying, with use of the metadata of the desired file, at least one block of the desired file on the data store associated with the first computing system, the at least one block identified as being in the partial block index and in the full block index; and
  
  reconstructing the desired file with use of the at least one block of the desired file on the data store, and with use of the metadata of the desired file received from the second computing system.
  - 2. The method of claim 1, wherein each block of the desired file is stored in a matching source file on the data store associated with the first computing system, the operations of reconstructing the desired file including:
    - retrieving each block of the desired file from the matching source file on the data store associated with the first computing system; and
      
      writing each block retrieved from the matching source file to a file system location in a destination data store associated with the first computing system.
  - 3. The method of claim 1, wherein at least one block of the desired file is not stored in the data store associated with the first computing system, the operations of reconstructing the desired file including:
    - transmitting, to the second computing system, a request for the at least one block of the desired file that is not stored in the data store associated with the first computing system, the request indicating respective identifiers of the at least one block of the desired file that is not stored in the data store associated with the first computing system;
      
      receiving, from the second computing system, the at least one block of the desired file that is not stored in the data store associated with the first computing system;
      
      writing the at least one block of the desired file received from the second computing system to a file system location in a destination data store associated with the first computing system;
      
      retrieving at least one remaining block of the desired file from a source file stored in the data store associated with the first computing system; and
      
      writing the at least one remaining block of the desired file to the file system location in a destination data store associated with the first computing system.
  - 4. The method of claim 1, the operations performed by the at least one processor of the first computing system including:
    - obtaining the at least one block of the desired file from the data store associated with the first computing system;
      
      determining, from an index associated with the first computing system, that at least one other block of the desired file is not stored on the data store associated with the first computing system, the determining performed using the metadata of the desired file received from the second computing system; and
      
      obtaining the at least one other block of the desired file from the second computing system;
      
      wherein the operations of reconstructing the desired file include reconstructing the desired file from the at least one block of the desired file obtained from the data store associated with the first computing system and the at least one other block of the desired file obtained from the second computing system.
  - 5. The method of claim 4, wherein the determining that the at least one other block of the desired file is not stored on the data store associated with the first computing system is performed with use of a bloom filter cache, wherein the bloom filter cache operates upon at least a portion of the respective identifiers of each block of the desired file.
  - 6. The method of claim 1, wherein identifying the at least one block of the desired file is performed with use of a block index, the block index providing respective identifiers of a plurality of blocks located within files stored on the data store associated with the first computing system.
  - 7. The method of claim 6, the operations performed by the at least one processor of the first computing system including:
    - validating the desired file in response to the reconstructing, the validating performing a comparison of a digital signature of the desired file that is provided from the reconstructing with a digital signature of the desired file that is provided from the metadata of the desired file.
  - 8. The method of claim 7, wherein the respective identifiers of each block of the desired file are based at least in part on an MD5 hash value determined for a respective block and wherein the digital signature of the desired file is based at least in part on an SHA-2 hash value determined for a respective file.
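
The two-stage lookup recited in claim 1 (a partial-hash key checked against a partial block index, followed by a full-hash key checked against a full block index) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `LocalBlockIndex`, `PARTIAL_LEN`, and `classify` are hypothetical names, the partial key is assumed to be a prefix of the hex hash, and an in-memory dict stands in for the data store.

```python
import hashlib

PARTIAL_LEN = 8  # hypothetical: number of hex characters used as the partial key


def block_id(block: bytes) -> str:
    """Block identifier: MD5 hex digest of the block contents."""
    return hashlib.md5(block).hexdigest()


class LocalBlockIndex:
    """Hypothetical two-level index over blocks already held locally."""

    def __init__(self) -> None:
        self.partial = set()  # prefixes of known block hashes (small, cheap to query)
        self.full = {}        # full hash -> block bytes (stand-in for a store location)

    def add(self, block: bytes) -> None:
        h = block_id(block)
        self.partial.add(h[:PARTIAL_LEN])
        self.full[h] = block

    def classify(self, block_ids):
        """Return (present, missing) lists of block identifiers."""
        present, missing = [], []
        for h in block_ids:
            if h[:PARTIAL_LEN] not in self.partial:
                missing.append(h)   # partial key absent: block is definitely not local
            elif h in self.full:
                present.append(h)   # confirmed by the full key
            else:
                missing.append(h)   # prefix matched another block; full key disagrees
        return present, missing


index = LocalBlockIndex()
index.add(b"aaaa")
index.add(b"bbbb")
wanted = [block_id(b) for b in (b"aaaa", b"cccc", b"bbbb")]
present, missing = index.classify(wanted)
```

The partial index can only prove absence; a hit there still has to be confirmed against the full index, which is why a block counts as local only when it appears in both.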

9. At least one machine-readable medium that is not a transitory propagating signal, the medium comprising instructions that, when executed by hardware of a computing device, cause the computing device to perform operations including:
- receiving, from a source remote to the computing device, a metadata of a desired file, the metadata indicating respective identifiers of each block of the desired file;
  
  determining whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
  
  in response to determining that at least one of the respective blocks are not in the data store, determining whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
  
  identifying, with use of the respective identifiers, at least one block of the desired file on the data store, the at least one block identified as being in the partial block index and in the full block index; and
  
  reconstructing the desired file with use of the at least one block of the desired file on the data store, and with use of the metadata of the desired file received from the source remote to the computing device.
  - 10. The machine-readable medium of claim 9, comprising instructions, that when executed, further cause the computing device to perform operations including:
    - transmitting, to the source remote to the computing device, a request for the metadata of the desired file, wherein the metadata of the desired file is received in response to the request; and
      
      retrieving the at least one block of the desired file from the data store.
  - 11. The machine-readable medium of claim 9, wherein each block of the desired file is stored in a matching source file on the data store, and wherein the instructions that cause the computing device to reconstruct the desired file include operations including:
    - retrieving each block of the desired file from the matching source file on the data store; and
      
      writing each block retrieved from the matching source file to a file system location in a destination data store of the computing device.
  - 12. The machine-readable medium of claim 9, wherein at least one block of the desired file is not stored in the data store, and wherein the operations to reconstruct the desired file include:
    - transmitting a request for the at least one block of the desired file that is not stored in the data store, the request indicating respective identifiers of the at least one block of the desired file that is not stored in the data store;
      
      receiving the at least one block of the desired file that is not stored in the data store;
      
      writing the received at least one block of the desired file to a file system location in a destination data store of the computing device;
      
      retrieving at least one remaining block of the desired file from a source file located in the data store; and
      
      writing the at least one remaining block of the desired file to the file system location in a destination data store of the computing device.
  - 13. The machine-readable medium of claim 9, comprising instructions, that when executed, further cause the computing device to perform operations including:
    - obtaining the at least one block of the desired file from the data store;
      
      determining, from an index, that at least one other block of the desired file is not stored on the data store, the determining performed using the metadata of the desired file received from the source remote to the computing device; and
      
      obtaining the at least one other block of the desired file from a source other than the data store;
      
      wherein the operations of reconstructing the desired file include reconstructing the desired file from the at least one block of the desired file obtained from the data store and the at least one other block of the desired file obtained from the source other than the data store.
  - 14. The machine-readable medium of claim 13, wherein the determining that the at least one other block of the desired file are not stored on the data store is performed with use of a bloom filter cache, wherein the bloom filter cache operates upon at least a portion of the respective identifiers of each block of the desired file.
  - 15. The machine-readable medium of claim 9, wherein identifying the at least one block of the desired file is performed with use of a block index, the block index providing respective identifiers of a plurality of blocks located within files stored on the data store; and the machine-readable medium comprising instructions, that when executed, further cause the computing device to perform operations including:
    - validating the desired file in response to the reconstructing, the validating performing a comparison of a digital signature of the desired file that is provided from the reconstructing with a digital signature of the desired file that is provided from the metadata of the desired file.
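
Claims 5 and 14 refer to a bloom filter cache that operates on the block identifiers. The claims do not specify its parameters, so the following is only a generic Bloom filter sketch: the `BloomFilter` class, its sizes, and the use of seeded SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


local_ids = [hashlib.md5(b).hexdigest() for b in (b"aaaa", b"bbbb")]
cache = BloomFilter()
for block_identifier in local_ids:
    cache.add(block_identifier)
```

A negative answer from `might_contain` lets the client mark a block as not stored locally without consulting any larger index; a positive answer still needs confirmation, since a Bloom filter can report false positives.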

16. A computing device, comprising:
- a local data store, the local data store to store a plurality of file system elements including a plurality of files; and
  
  a processor and a memory, wherein the processor executes instructions to:
  
  process metadata of a particular file, the metadata indicating respective identifiers of each block of the particular file;
  
  determine whether respective blocks of the desired file are not in a data store associated with the first computing system by using a portion of a hash value of the respective blocks as a key and comparing the key to a partial block index, wherein the portion of the hash value is a subset of the hash value and wherein the portion of the hash value is based on the respective identifiers;
  
  in response to determining that at least one of the respective blocks are not in the data store, determine whether remaining blocks of the respective blocks are in the data store by using the hash value of the respective blocks of the remaining blocks as a full key and comparing the full key to a full block index;
  
  identify, with use of the metadata, at least one block of the particular file on the local data store from at least one file of the plurality of files, the at least one block identified as being in the partial block index and in the full block index; and
  
  retrieve, with use of the metadata, at least one other block of the particular file from a remote data store.
  - 17. The computing device of claim 16, wherein the processor further executes instructions to:
    - divide the particular file of the plurality of files into at least one block; and
      
      identify the at least one block of the particular file with use of a block index, the block index providing respective identifiers of a plurality of blocks located among the plurality of files stored on the local data store.
  - 18. The computing device of claim 16, wherein the processor further executes instructions to:
    - transmit, to a source remote to the computing device, a request for the metadata of the particular file, wherein the metadata of the particular file is received in response to the request;
      
      transmit, to the source remote to the computing device, a request for the at least one other block of the particular file, the at least one other block of the particular file not stored in the local data store, and the request indicating respective identifiers of the at least one block of the particular file that is not stored in the local data store; and
      
      receive, from the source remote to the computing device, the at least one block of the particular file that is not stored in the local data store in response to the request for the at least one other block of the particular file.
  - 19. The computing device of claim 16, wherein the processor further executes instructions to:
    - reconstruct the particular file with use of the at least one block of the particular file provided from the local data store, and with use of the metadata of the particular file; and
      
      store the reconstructed particular file in the local data store.
  - 20. The computing device of claim 16, wherein the processor further executes instructions to:
    - request at least one data block of the particular file from a remote version data store, in response to a request for a retrieval or a storage of at least one block from a particular file version of the particular file.
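
The device behavior in claims 16, 18, and 19 (use local blocks where available, request only the missing blocks by identifier, then reconstruct the file from its metadata) could look roughly like the sketch below. The `reconstruct` function, the dict-based local store, and the `fetch_remote` callable are hypothetical stand-ins; no real network or file system API is implied.

```python
import hashlib


def block_id(block: bytes) -> str:
    """Block identifier: MD5 hex digest of the block contents."""
    return hashlib.md5(block).hexdigest()


def reconstruct(block_ids, local_blocks, fetch_remote):
    """Rebuild a file from its ordered block identifiers.

    local_blocks: dict of identifier -> bytes (stand-in for the local data store)
    fetch_remote: callable taking a list of identifiers and returning a dict of
                  identifier -> bytes (stand-in for the remote source)
    Returns the file bytes and the list of identifiers that were requested remotely.
    """
    # Request each missing block once, even if the file uses it several times.
    missing = [h for h in dict.fromkeys(block_ids) if h not in local_blocks]
    fetched = fetch_remote(missing) if missing else {}
    out = bytearray()
    for h in block_ids:
        out += local_blocks[h] if h in local_blocks else fetched[h]
    return bytes(out), missing


remote_store = {block_id(b): b for b in (b"aaaa", b"bbbb", b"cc")}
local_store = {block_id(b"aaaa"): b"aaaa"}
ids = [block_id(b) for b in (b"aaaa", b"bbbb", b"aaaa", b"cc")]
data, requested = reconstruct(
    ids, local_store, lambda want: {h: remote_store[h] for h in want}
)
```

In this example only two of the four block positions require a remote transfer; the repeated `aaaa` block is served from the local store both times.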

Current Assignee: Crashplan Group LLC
Original Assignee: Code 42 Software Incorporated
Inventors: Dornquast, Matthew; Bispala, Brian; Allison, Damon; Armstrong, Brad; Scorcio, Marshall; Lonergan, Rory; Lindquist, Peter; Parker, Christopher
Primary Examiner(s): Le, Debbie
Application Number: US14/587,077
Publication Number: US 20160092312A1
Time in Patent Office: 1,168 Days
Field of Search: None

US Class Current
CPC Class Codes

G06F 11/1435   using file system or storag...

G06F 11/1451   by selection of backup cont...

G06F 11/1453   using de-duplication of the...

G06F 11/1464   for networked environments

G06F 16/1748   De-duplication implemented ...

G06F 16/178   Techniques for file synchro...

G06F 16/182   Distributed file systems

G06F 16/183   Provision of network file s...

G06F 16/1844   Management specifically ada...

G06F 16/1873   Versioning file systems, te...

G06F 16/2329   using versioning

G06F 16/275   Synchronous replication

G06F 21/552   involving long-term monitor...

G06F 21/604   Tools and structures for ma...

G06F 2221/2111   Location-sensitive, e.g. ge...

G06N 20/00   Machine learning

H04L 47/821   Prioritising resource alloc...

H04L 63/0428   wherein the data content is...

H04L 67/1072   Discovery involving ranked ...

H04L 67/535   Tracking the activity of th...
