Optimizing data transmission bandwidth consumption over a wide area network
First Claim
1. A method of reducing bandwidth consumption for data being sent between a client and server over a wide area network (WAN) by optimizing deduplication performance, the method comprising:
- partitioning a data message into a plurality of data chunks, said message to be sent from a client at a first data site to a server at a second data site over a WAN;
- generating a data chunk identifier (ID) for each of said chunks;
- determining whether said chunks are stored at the second data site;
- when at least one of said chunks is not stored at the second data site, adding said ID for each of said chunks not stored at the second data site to a data structure at the first data site;
- when at least two data chunks partitioned from said message are in sequence and are not stored at the second data site, linking in the data structure at the first data site, the data chunk ID for each data chunk in sequence that are not stored at the second data site in an order corresponding to the sequence in said message; and
- sending a transformed data message from the first data site to the second data site, wherein said transformed message includes at least one tuple when at least one of said chunks is stored at the second data site and data chunks among said chunks that are not stored at the second data site, wherein said tuple:
is a paired representation of data chunk sequence, is to be used to reconstruct said message, and includes a first data chunk ID and a sequence count that represents an aggregate number of linked data chunk IDs.
Abstract
An exemplary embodiment includes partitioning a data message to be communicated from a first data site to a second data site into data chunks; generating a data chunk identifier for each data chunk; determining whether the data chunks are stored at the second data site; when at least one data chunk is not stored at the second data site, adding the data chunk identifier for each data chunk not stored at the second data site to a data structure at the first data site; and sending a transformed data message from the first data site to the second data site. When at least one data chunk is already stored at the second data site, the transformed data message includes at least one tuple in place of that data chunk, so that the data message can be reconstructed at the second data site without resending the previously stored data chunk. The transformed data message also includes each data chunk not stored at the second data site.
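The first three steps of the abstract (partitioning, identifier generation, and the stored-or-not determination) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: fixed-size chunks of 4 bytes, SHA-256 digests as chunk identifiers, and a local set `remote_ids` standing in for whatever mechanism the first data site uses to learn what the second data site already stores. None of these choices is stated in the text above.

```python
import hashlib

def partition(message: bytes, size: int = 4) -> list[bytes]:
    # Hypothetical fixed-size chunking; the source does not fix a chunking scheme.
    return [message[i:i + size] for i in range(0, len(message), size)]

def chunk_id(chunk: bytes) -> str:
    # Assumed identifier: hex SHA-256 digest of the chunk contents.
    return hashlib.sha256(chunk).hexdigest()

def classify(message: bytes, remote_ids: set[str], size: int = 4):
    """Return (id, chunk, already_stored) for each chunk, in message order."""
    plan = []
    for chunk in partition(message, size):
        cid = chunk_id(chunk)
        plan.append((cid, chunk, cid in remote_ids))
    return plan

# Assume the second data site already stores the chunks "AAAA" and "CCCC".
remote_ids = {chunk_id(b"AAAA"), chunk_id(b"CCCC")}
plan = classify(b"AAAABBBBCCCC", remote_ids)

# Only chunks not stored remotely need to travel as raw data.
bytes_to_send = sum(len(chunk) for _, chunk, stored in plan if not stored)
print(bytes_to_send)  # 4
```

In this toy case only the 4 bytes of "BBBB" would be sent as chunk data; the other two chunks would be represented by references.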
30 Citations
22 Claims
1. A method of reducing bandwidth consumption for data being sent between a client and server over a wide area network (WAN) by optimizing deduplication performance, the method comprising:
- partitioning a data message into a plurality of data chunks, said message to be sent from a client at a first data site to a server at a second data site over a WAN;
- generating a data chunk identifier (ID) for each of said chunks;
- determining whether said chunks are stored at the second data site;
- when at least one of said chunks is not stored at the second data site, adding said ID for each of said chunks not stored at the second data site to a data structure at the first data site;
- when at least two data chunks partitioned from said message are in sequence and are not stored at the second data site, linking in the data structure at the first data site, the data chunk ID for each data chunk in sequence that are not stored at the second data site in an order corresponding to the sequence in said message; and
- sending a transformed data message from the first data site to the second data site, wherein said transformed message includes at least one tuple when at least one of said chunks is stored at the second data site and data chunks among said chunks that are not stored at the second data site, wherein said tuple:
is a paired representation of data chunk sequence, is to be used to reconstruct said message, and includes a first data chunk ID and a sequence count that represents an aggregate number of linked data chunk IDs.
Dependent claims: 2, 3, 4, 5, 6, 7, 22
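One way to read the linking and tuple limitations of claim 1 is sketched below. It is an interpretation under stated assumptions, not the patented implementation: chunks are fixed-size, identifiers are SHA-256 digests, the hypothetical `SenderIndex.known` set models which chunks the first data site believes are stored at the second data site, and the hypothetical `next_id` dictionary holds the links between consecutive chunks that were sent as new data. A run of already-stored chunks that follows those links collapses into one `("tuple", first_id, count)` entry.

```python
import hashlib

CHUNK_SIZE = 4  # assumed, tiny for demonstration

def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

class SenderIndex:
    """Hypothetical data structure at the first data site."""

    def __init__(self):
        self.known = set()   # IDs believed to be stored at the second site
        self.next_id = {}    # link: chunk ID -> ID of the chunk sent right after it

    def encode(self, message: bytes):
        chunks = [message[i:i + CHUNK_SIZE]
                  for i in range(0, len(message), CHUNK_SIZE)]
        out = []
        prev_new = None  # ID of the preceding chunk, if it was sent as new data
        run = None       # open tuple as [first_id, count, last_id]
        for chunk in chunks:
            cid = chunk_id(chunk)
            if cid in self.known:
                # Extend the open run only if this chunk is linked to its tail.
                if run is not None and self.next_id.get(run[2]) == cid:
                    run[1] += 1
                    run[2] = cid
                else:
                    if run is not None:
                        out.append(("tuple", run[0], run[1]))
                    run = [cid, 1, cid]
                prev_new = None
            else:
                if run is not None:
                    out.append(("tuple", run[0], run[1]))
                    run = None
                self.known.add(cid)
                if prev_new is not None:
                    self.next_id[prev_new] = cid  # link consecutive new chunks
                prev_new = cid
                out.append(("chunk", chunk))
        if run is not None:
            out.append(("tuple", run[0], run[1]))
        return out

sender = SenderIndex()
first = sender.encode(b"AAAABBBBCCCC")       # three raw chunks, linked A -> B -> C
second = sender.encode(b"AAAABBBBCCCCDDDD")  # one tuple of count 3, then raw "DDDD"
```

On the second send, the three previously sent chunks travel as a single identifier plus a count instead of twelve bytes of data, which is where the bandwidth saving in this sketch comes from.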
8. A method of reducing bandwidth consumption for data being sent between a client and server over a wide area network (WAN) by optimizing deduplication performance, the method comprising:
- receiving a transformed data message at a second data site, said transformed message:
being previously sent from a client at a first data site to a server at the second data site over a WAN, created from data message partitioned into a plurality of data chunks, and includes data chunks from the data message not stored at the second data site and at least one tuple when at least one data chunk from the data message is stored at the second data site, wherein said tuple:
is a paired representation of data chunk sequence, is to be used at the second data site to reconstruct said transformed message, and includes a first data chunk ID and a sequence count that represents an aggregate number of linked data chunk IDs; and
- when said transformed message includes at least one data chunk:
generating a data chunk identifier (ID) for each chunk in said transformed message, adding said ID for each chunk in said transformed message to a data structure at the second data site, and storing each chunk in said transformed message at the second data site; and
- reconstructing said transformed message at the second data site by:
when said transformed message includes at least one data chunk, assembling said at least one chunk into a reconstructed data message in an order corresponding to an order in said transformed message,
when said transformed message includes at least one tuple, assembling each chunk corresponding to a chunk ID referenced in the at least one tuple into said reconstructed message in an order the at least one tuple is in said transformed message.
Dependent claims: 9, 10
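The receiving side described in claim 8 can be sketched in the same spirit. The assumptions are again illustrative only: the transformed message is modeled as a list of `("chunk", data)` and `("tuple", first_id, count)` entries, identifiers are SHA-256 digests, and the hypothetical `Receiver` keeps its own `next_id` links between consecutively received raw chunks so that a tuple can be expanded by following them. The claim text does not say how the second data site resolves a sequence count, so that linking step is an assumption.

```python
import hashlib

def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

class Receiver:
    """Hypothetical state kept at the second data site."""

    def __init__(self):
        self.store = {}    # chunk ID -> chunk bytes
        self.next_id = {}  # assumed link: chunk ID -> ID of the chunk received after it

    def reconstruct(self, transformed) -> bytes:
        parts = []
        prev_new = None  # ID of the preceding entry, if it was a raw chunk
        for item in transformed:
            if item[0] == "chunk":
                chunk = item[1]
                cid = chunk_id(chunk)        # generate the ID for the received chunk
                self.store[cid] = chunk      # store the chunk and record its ID
                if prev_new is not None:
                    self.next_id[prev_new] = cid
                prev_new = cid
                parts.append(chunk)
            else:
                _, cid, count = item
                # Expand the tuple by walking `count` linked chunk IDs.
                for _step in range(count):
                    parts.append(self.store[cid])  # KeyError if the link chain is too short
                    cid = self.next_id.get(cid)
                prev_new = None
        return b"".join(parts)

receiver = Receiver()
original = receiver.reconstruct(
    [("chunk", b"AAAA"), ("chunk", b"BBBB"), ("chunk", b"CCCC")])
repeat = receiver.reconstruct(
    [("tuple", chunk_id(b"AAAA"), 3), ("chunk", b"DDDD")])
```

Entries are assembled in the order they appear in the transformed message, so raw chunks and expanded tuples interleave exactly as the sender emitted them.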
11. A system of reducing bandwidth consumption for data being sent between a client and server over a wide area network (WAN) by optimizing deduplication performance, the system comprising:
- a storage repository at a first data site storing data chunks and a data structure includes data chunk identifiers of the stored data chunks; and
- a data deduplication node at the first data site that:
(i) partitions a data message, to be sent from a client at the first data site to a server at a second data site over a WAN, into a plurality of data chunks,
(ii) generates a data chunk identifier (ID) for each of said chunks, determines whether said chunks are stored at the second data site,
(iii) adds an ID for each data chunk not stored at the second data site to the data structure at the first data site, when at least one of said chunks that are not stored at the second data site,
(iv) links in the data structure at the first data site, the data chunk ID for each data chunk in sequence that are not stored at the second data site, in an order corresponding to the sequence in said message when at least two data chunks partitioned from said message are in sequence and are not stored at the second data site, and
(v) sends a transformed data message from the first data site to the second data site, wherein said transformed message includes at least one tuple when at least one of said chunks is stored at the second data site, and data chunks among said chunks that are not stored at the second data site, wherein said tuple:
is a paired representation of data chunk sequence, is to be used to reconstruct said message, and includes a first data chunk ID and a sequence count that represents an aggregate number of linked data chunk IDs.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A computer program product of reducing bandwidth consumption for data being sent between a client and server over a wide area network (WAN) by optimizing deduplication performance, the computer program product comprising a computer readable storage medium having computer readable program code embodied therewith, wherein said program code upon being processed on a computer causes the computer to:
- partition a data message into a plurality of data chunks, said message to be sent from a client at a first data site to a server at a second data site over a WAN;
- generate a data chunk identifier (ID) for each of said chunks;
- determine whether said chunks are stored at the second data site;
- add said ID for each of said chunks not stored at the second data site to a data structure at the first data site, when at least one of said chunks is not stored at the second data site;
- link in the data structure at the first data site, the data chunk ID for each data chunk in sequence that are not stored at the second data site, in an order corresponding to the sequence in said message when at least two data chunks partitioned from said message are in sequence and are not stored at the second data site; and
- send a transformed data message from the first data site to the second data site, wherein said transformed message includes at least one tuple when at least one of said chunks is stored at the second data site and data chunks among said chunks that are not stored at the second data site, wherein said tuple:
is a paired representation of data chunk sequence, is to be used to reconstruct said message, and includes a first data chunk ID and a sequence count that represents an aggregate number of linked data chunk IDs.
Dependent claims: 20, 21
Specification