Higher efficiency storage replication using compression
Abstract
In one embodiment, there is a multi-cluster synchronization system between two or more clusters. The multi-cluster synchronization system uses variable compression to optimize the transfer of information between the clusters. Compression is used not only to minimize the total number of bytes sent between the two clusters, but also to dynamically vary the size of the objects sent across the wire, optimizing for higher throughput after considering packet loss, TCP windows, and block sizes. This includes both the packaging of multiple small files together into one larger compressed file, which saves on TCP and header overhead, and the chunking of large files into multiple smaller files that are less likely to have difficulties due to intermittent network congestion or errors. A further embodiment uses forward error correction to maximize the chances that the remote end will be able to correctly reconstitute the transmission.
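As a rough illustration of the variable sizing described in the abstract, the following Python sketch shows one way a target size could be adjusted from an observed packet-loss rate and then used to decide which files to pack together and which to chunk. The names `NetworkState`, `update_target_size`, and `plan_transfers`, the halve/double policy, the 1% loss threshold, and the size bounds are all assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass

# Hypothetical bounds on the target size; not taken from the patent.
MIN_TARGET = 64 * 1024          # 64 KiB
MAX_TARGET = 64 * 1024 * 1024   # 64 MiB


@dataclass
class NetworkState:
    """Assumed summary of the network evaluator's output."""
    loss_rate: float  # fraction of packets lost, 0.0 to 1.0


def update_target_size(target, state, loss_threshold=0.01):
    """Shrink the target on a lossy network, grow it otherwise, then clamp."""
    if state.loss_rate > loss_threshold:
        target //= 2   # smaller objects are cheaper to resend
    else:
        target *= 2    # larger objects amortize per-transfer overhead
    return max(MIN_TARGET, min(MAX_TARGET, target))


def plan_transfers(sizes, target):
    """Plan transfers for a mapping of object name -> size in bytes.

    Objects larger than the target are split into (offset, length) chunks;
    smaller objects are grouped into packs whose total stays within the target.
    """
    plans = []
    batch, batch_total = [], 0
    for name, size in sizes.items():
        if size > target:
            spans = [(off, min(target, size - off))
                     for off in range(0, size, target)]
            plans.append(("chunk", name, spans))
            continue
        if batch and batch_total + size > target:
            plans.append(("pack", batch))
            batch, batch_total = [], 0
        batch.append(name)
        batch_total += size
    if batch:
        plans.append(("pack", batch))
    return plans
```

In this sketch, planning works on plain byte counts; a real system would more likely plan on compressed sizes, which are only known after compression.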
20 Claims
1. A multi-cluster synchronization system, comprising:
an intercluster network coupling a first cluster and a remote cluster, the first cluster including a first cluster-internal network, a first structured information repository, and a first object storage, wherein the first structured information repository contains metadata corresponding to stored information objects in the first object storage, and wherein the first structured information repository and the first object storage are coupled via the cluster-internal network;
a network evaluator that determines a state of one or more networks coupled to the first cluster and the remote cluster; and
an intercluster repository synchronizer including a compression module that identifies one or more information objects to compress from the first object storage and transmit to the remote cluster in compressed form,
wherein the compression module determines a target size of a single compressed information object, determines, based on the state of the one or more networks, whether to increase or decrease the target size of the single compressed information object, and updates the target size in accordance with the determining whether to increase or decrease the target size,
wherein the compression module compresses a first information object and a second information object stored in the first object storage and combines the first and second compressed information objects, wherein a size of each of the first compressed information object and second compressed information object is smaller than the updated target size, and the single information object includes the first and second compressed information objects, and
wherein the compression module transmits the single information object to the remote cluster for storage, wherein transmitting the single information object results in the duplication of the first and second information objects at the remote cluster.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A method of synchronizing objects in an object storage system, the method comprising:
identifying a set of stored information objects stored in a first cluster to be transferred to a second cluster for storage, the first cluster and second cluster being coupled to an intercluster network;
determining a state of one or more networks coupled to the first and second clusters;
determining a target size of a single information object for transfer to the second cluster;
determining, based on the state of the one or more networks, whether to increase or decrease the target size;
updating the target size in accordance with the determining whether to increase or decrease the target size;
compressing a first information object stored in the first object storage, wherein a size of the first compressed information object is smaller than the updated target size;
compressing a second information object stored in the first object storage, wherein a size of the second compressed information object is smaller than the updated target size;
combining the first and second compressed information objects, wherein the single information object includes the first and second compressed information objects; and
transmitting the single information object to the second cluster for storage, wherein the transmitting the single information object results in the duplication of the first and second information objects at the second cluster.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19)
20. A non-transitory computer-readable medium comprising a plurality of machine-readable instructions that when executed by one or more processors causes the one or more processors to perform a method comprising:
identifying a set of stored information objects stored in a first cluster to be transferred to a second cluster for storage, the first cluster and second cluster being coupled to an intercluster network;
determining a state of one or more networks coupled to the first and second clusters;
determining a target size of a single information object for transfer to the second cluster;
determining, based on the state of the one or more networks, whether to increase or decrease the target size;
updating the target size in accordance with the determining whether to increase or decrease the target size;
compressing a first information object stored in the first object storage, wherein a size of the first compressed information object is smaller than the updated target size;
compressing a second information object stored in the first object storage, wherein a size of the second compressed information object is smaller than the updated target size;
combining the first and second compressed information objects, wherein the single information object includes the first and second compressed information objects; and
transmitting the single information object to the second cluster for storage, wherein the transmitting the single information object results in the duplication of the first and second information objects at the second cluster.
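The compress, combine, and duplicate steps recited in the independent claims can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the length-prefixed container layout, the use of `zlib`, and the function names `pack_objects` and `unpack_objects` are hypothetical and are not taken from the claims, which do not name a compression algorithm or a container format.

```python
import struct
import zlib

# Assumed per-object header: 2-byte name length, 4-byte compressed length,
# both big-endian with no padding.
_HEADER = struct.Struct(">HI")


def pack_objects(objects, target_size):
    """Compress each object and combine the results into one container.

    `objects` maps an object name to its bytes. Each compressed object must be
    smaller than `target_size`, mirroring the size condition in the claims.
    """
    parts = []
    for name, data in objects.items():
        compressed = zlib.compress(data)
        if len(compressed) >= target_size:
            raise ValueError(f"{name!r} does not fit under the target size")
        key = name.encode("utf-8")
        parts.append(_HEADER.pack(len(key), len(compressed)) + key + compressed)
    return b"".join(parts)


def unpack_objects(container):
    """Recover the original objects from a container on the receiving side."""
    objects = {}
    offset = 0
    while offset < len(container):
        key_len, body_len = _HEADER.unpack_from(container, offset)
        offset += _HEADER.size
        name = container[offset:offset + key_len].decode("utf-8")
        offset += key_len
        objects[name] = zlib.decompress(container[offset:offset + body_len])
        offset += body_len
    return objects
```

A possible usage: the sending cluster calls `pack_objects({"a.txt": data_a, "b.txt": data_b}, target_size)` and transmits the returned bytes as a single object, and the receiving cluster calls `unpack_objects` on those bytes to obtain copies of both originals.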
Specification