De-duplication data bank
Abstract
A facility for transferring data over a network between two network endpoints by transferring hash signatures over the network instead of the actual data. The hash signatures are pre-generated from local static data and stored in a hash database before any data is transferred between source and destination. The hash signatures are created on both sides of the network, at the point where the data is local, and the hash database consists of hash signatures of blocks of data that are stored locally. The hash signatures are created using different traversal patterns across the local data, so that the hash database can represent a larger dataset than the actual physical storage of the local data. If no local data is present, arbitrary data is generated and then remains static.
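The pre-generation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the traversal kinds ("rotate", "reverse", "interleave"), the (kind, parameter) specification format, the use of SHA-256, the seeded generator, and every function name are hypothetical choices made for the example. It shows how a few stored chunks can yield a larger number of hash database entries, one per chunk and traversal order.

```python
import hashlib
import random
from typing import Dict, List, Tuple

# Hypothetical traversal specification: (kind, parameter). Not from the source.
Spec = Tuple[str, int]


def pregenerate_chunks(count: int, size: int, seed: int) -> List[bytes]:
    """Generate arbitrary chunks once; a fixed seed keeps them static across runs."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(count)]


def traverse(chunk: bytes, spec: Spec) -> bytes:
    """Return the bytes of a non-empty `chunk` read in the order named by `spec`."""
    kind, n = spec
    if kind == "rotate":      # start at offset n and wrap around to the start
        n %= len(chunk)
        return chunk[n:] + chunk[:n]
    if kind == "reverse":     # read back to front (parameter unused)
        return chunk[::-1]
    if kind == "interleave":  # every n-th byte, once for each starting phase
        return b"".join(chunk[phase::n] for phase in range(n))
    raise ValueError(f"unknown traversal kind: {kind}")


def build_hash_db(chunks: List[bytes], specs: List[Spec]) -> Dict[bytes, Tuple[int, Spec]]:
    """Map each traversal's hash to the chunk index and spec that reproduce it."""
    db: Dict[bytes, Tuple[int, Spec]] = {}
    for index, chunk in enumerate(chunks):
        for spec in specs:
            digest = hashlib.sha256(traverse(chunk, spec)).digest()
            db[digest] = (index, spec)
    return db


SPECS: List[Spec] = [("rotate", 0), ("rotate", 7), ("reverse", 0), ("interleave", 2)]
chunks = pregenerate_chunks(count=4, size=64, seed=2024)
hash_db = build_hash_db(chunks, SPECS)
```

In this sketch four 64-byte chunks are stored, while the database holds one entry for each chunk and traversal pair. Adding traversal specifications grows the set of representable byte sequences without growing the stored data.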
15 Claims
1. A method for minimizing network usage during data transfer over a network between a source node and a destination node, the method comprising:
pregenerating a plurality of data chunks at the destination node, the plurality of data chunks comprising a first data chunk;
storing the plurality of data chunks in a computer readable medium at the destination node;
generating a plurality of hash values of the first data chunk based on different traversal orders of the first data chunk at the destination node;
storing the plurality of hash values and specifications of the traversal orders in a data store at the destination node; and
reconstituting, at the destination node, a second data chunk identical to a source data chunk without receiving the source data chunk from the source node, thereby minimizing network usage, wherein reconstituting comprises:
receiving a source hash value from the source node, the source hash value being a hash of the source data chunk;
determining that the source hash value is present among the plurality of hash values in the data store at the destination node, wherein determining comprises comparing the source hash value to the plurality of hash values at the destination node to determine when one of the plurality of hash values matches the source hash value;
creating the second data chunk based on the source hash value and the specifications of traversal order in the data store.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A computer program product for minimizing network usage during data transfer over a network between a source node and a destination node, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
pregenerate a plurality of data chunks at the destination node, the plurality of data chunks comprising a first data chunk;
store the plurality of data chunks in a computer readable medium at the destination node;
generate a plurality of hash values of the first data chunk based on different traversal orders of the first data chunk at the destination node;
store the plurality of hash values and specifications of the traversal orders in a data store at the destination node; and
reconstitute, at the destination node, a second data chunk identical to a source data chunk without receiving the source data chunk from the source node, thereby minimizing network usage, wherein reconstituting comprises:
receive a source hash value from the source node, the source hash value being a hash of the source data chunk;
determine that the source hash value is present among the plurality of hash values in the data store at the destination node, wherein determining comprises comparing the source hash value to the plurality of hash values at the destination node to determine when one of the plurality of hash values matches the source hash value;
create the second data chunk based on the source hash value and the specifications of traversal order in the data store.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A system for minimizing network usage during data transfer over a network, the system comprising:
a source node; and
a destination node in communication with the source node via the network, the destination node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
pregenerate a plurality of data chunks at the destination node, the plurality of data chunks comprising a first data chunk;
store the plurality of data chunks in a computer readable medium at the destination node;
generate a plurality of hash values of the first data chunk based on different traversal orders of the first data chunk at the destination node;
store the plurality of hash values and specifications of the traversal orders in a data store at the destination node; and
reconstitute, at the destination node, a second data chunk identical to a source data chunk without receiving the source data chunk from the source node, thereby minimizing network usage, wherein reconstituting comprises:
receive a source hash value from the source node, the source hash value being a hash of the source data chunk;
determine that the source hash value is present among the plurality of hash values in the data store at the destination node, wherein determining comprises comparing the source hash value to the plurality of hash values at the destination node to determine when one of the plurality of hash values matches the source hash value;
create the second data chunk based on the source hash value and the specifications of traversal order in the data store.
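The exchange between the source node and the destination node recited in the claims might look like the sketch below. It is an illustration only: the class names, the choice of SHA-256, and the use of a single traversal family (reading the chunk from a starting offset and wrapping around) are assumptions made for the example and are not taken from the claims. Only a hash crosses the "network"; the destination rebuilds the chunk from its own stored data and the recorded traversal specification.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def rotate(chunk: bytes, offset: int) -> bytes:
    """Traverse `chunk` starting at `offset` and wrapping around to the start."""
    return chunk[offset:] + chunk[:offset]


class DestinationNode:
    """Holds static local chunks and a data store of traversal hashes (hypothetical)."""

    def __init__(self, chunks: List[bytes]) -> None:
        self.chunks = chunks
        # hash value -> (chunk index, traversal specification, here a start offset)
        self.data_store: Dict[bytes, Tuple[int, int]] = {}
        for index, chunk in enumerate(chunks):
            for offset in range(len(chunk)):
                digest = hashlib.sha256(rotate(chunk, offset)).digest()
                self.data_store[digest] = (index, offset)

    def reconstitute(self, source_hash: bytes) -> Optional[bytes]:
        """Rebuild the source chunk from local data, or return None if no hash matches."""
        entry = self.data_store.get(source_hash)
        if entry is None:
            return None  # handling of a miss is not covered by this sketch
        index, offset = entry
        return rotate(self.chunks[index], offset)


class SourceNode:
    """Sends only the hash of its chunk, never the chunk itself (hypothetical)."""

    @staticmethod
    def hash_for(source_chunk: bytes) -> bytes:
        return hashlib.sha256(source_chunk).digest()


dest = DestinationNode([b"ABCDEFGH"])
source_chunk = b"DEFGHABC"  # equals the local chunk read from offset 3
rebuilt = dest.reconstitute(SourceNode.hash_for(source_chunk))
```

The sketch treats a matching hash as proof that the rebuilt chunk is identical to the source chunk, which holds only to the extent that the hash function is collision resistant.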
Specification