Scalable grid deduplication
Abstract
A system, a method, and a computer program product for performing deduplication of data using a scalable deduplication grid are disclosed. A listing of a plurality of zone stamps is generated, where each zone stamp represents a zone in the plurality of zones in a data stream. The listing contains a logical arrangement of the plurality of zone stamps obtained from each storage location and being accessible by a plurality of servers. A first zone stamp in the listing is compared to a second zone stamp in the listing. The first and second zones are delta-compressed based on a determination that the first zone stamp is substantially similar to the second zone stamp. A server is selected to perform the comparison and delta-compression.
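The compare-then-delta-compress step in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how zone stamps are encoded or what "substantially similar" means, so the sketch assumes equal-length string stamps and a mismatch-count threshold (`max_mismatches` is a hypothetical parameter). It also substitutes zlib's preset-dictionary feature for the delta compressor, which is a stand-in and not necessarily the method used in the patent.

```python
import zlib


def stamp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length stamps (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("stamps must have equal length")
    return sum(x != y for x, y in zip(a, b))


def substantially_similar(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Assumed similarity rule: at most max_mismatches differing characters."""
    return stamp_distance(a, b) <= max_mismatches


def delta_compress(target: bytes, reference: bytes) -> bytes:
    """Compress target against reference, using reference as a zlib preset dictionary.

    Stand-in for a real delta compressor; zlib only uses the last 32 KiB
    of the dictionary.
    """
    comp = zlib.compressobj(zdict=reference)
    return comp.compress(target) + comp.flush()


def delta_decompress(delta: bytes, reference: bytes) -> bytes:
    """Rebuild the target zone from its delta and the same reference zone."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()


# Hypothetical zones and stamps, for illustration only.
first_zone = b"backup block " * 50
second_zone = first_zone + b"changed tail"
if substantially_similar("ABCD1234", "ABCD1235"):
    delta = delta_compress(second_zone, first_zone)
    restored = delta_decompress(delta, first_zone)
```

The stamp comparison acts as a cheap filter here: zone contents are only read and compressed against each other when the stamps pass the similarity test.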
45 Claims
1. A computer implemented method, comprising:
generating a listing of a plurality of zone stamps, each zone stamp representing a data zone in the plurality of data zones in a data stream, the data stream being received from at least one data source by a network of communicatively coupled plurality of grid servers, each grid server in the plurality of grid servers storing a grid server listing of zone stamps corresponding to zones of data stored on that grid server, the generated listing including a logical arrangement of a combination of grid server listings of zone stamps obtained from each grid server in the plurality of grid servers and being accessible by the plurality of grid servers, and storing the generated listing on a coordinating grid server in the plurality of grid servers;
partitioning, using the coordinating grid server, the generated listing into a plurality of partitions of zone stamps, each partition in the plurality of partitions including a portion of the plurality of zone stamps, the partitioning being performed based on at least one of the following: a processing capability of each grid server in the plurality of grid servers, a size of each zone in the plurality of zones stored by the plurality of grid servers, a time consumed by comparing of zone stamps in the plurality of zone stamps contained in the generated listing, availability to process data zones in the data stream of each grid server in the plurality of grid servers, and any combination thereof; and
distributing, using the coordinating grid server, each partition of zone stamps in the plurality of partitions to one or more grid servers in the plurality of grid servers for storage, the distributing being performed based on at least a processing capability of each grid server in the plurality of grid servers;
selecting, using the coordinating grid server, a grid server in the plurality of grid servers, based on the generated listing and a partition stored on that grid server, to perform
comparing a first zone stamp in the plurality of zone stamps contained in the generated listing to a second zone stamp in the plurality of zone stamps contained in the generated listing, the first zone stamp representing a first zone in the plurality of zones and the second zone stamp representing a second zone in the plurality of zones in the received data stream; and
delta-compressing the first zone and the second zone based on a determination that the first zone stamp is substantially similar to the second zone stamp; and
monitoring, using the coordinating grid server, the comparing and the delta-compressing, and, based on the monitoring, selecting, using the coordinating grid server, at least another grid server in the plurality of grid servers to perform the comparing and the delta-compressing upon determination that the selected grid server exceeded a predetermined amount of time to perform the comparing and the delta-compressing.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
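The partitioning and distributing steps of claim 1 can be sketched as follows, under stated assumptions. The claim permits several partitioning criteria; this sketch uses only processing capability, modeled as a positive number per grid server, and folds partitioning and distribution into one step by giving each server one contiguous partition. The function name `partition_listing` and the server names are hypothetical.

```python
from typing import Dict, List


def partition_listing(stamps: List[str],
                      capabilities: Dict[str, float]) -> Dict[str, List[str]]:
    """Split the combined listing into contiguous partitions, one per server,
    sized in proportion to each server's (assumed) capability score."""
    total = sum(capabilities.values())
    if total <= 0:
        raise ValueError("total capability must be positive")
    servers = list(capabilities)
    partitions: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for index, server in enumerate(servers):
        cumulative += capabilities[server]
        # The last server takes the remainder so that no stamp is dropped.
        if index == len(servers) - 1:
            end = len(stamps)
        else:
            end = round(len(stamps) * cumulative / total)
        partitions[server] = stamps[start:end]
        start = end
    return partitions


# Hypothetical listing of ten stamps and three grid servers.
listing = [f"stamp{i:02d}" for i in range(10)]
plan = partition_listing(listing, {"gs-a": 1, "gs-b": 3, "gs-c": 1})
```

In this example `gs-b` has three times the capability of the other servers and receives six of the ten stamps, while `gs-a` and `gs-c` receive two each.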
16. A system comprising:
at least one programmable processor; and
a non-transitory machine-readable medium storing instructions that, when executed by the at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
generating a listing of a plurality of zone stamps, each zone stamp representing a data zone in the plurality of data zones in a data stream, the data stream being received from at least one data source by a network of communicatively coupled plurality of grid servers, each grid server in the plurality of grid servers storing a grid server listing of zone stamps corresponding to zones of data stored on that grid server, the generated listing including a logical arrangement of a combination of grid server listings of zone stamps obtained from each grid server in the plurality of grid servers and being accessible by the plurality of grid servers;
partitioning, using the coordinating grid server, the generated listing into a plurality of partitions of zone stamps, each partition in the plurality of partitions including a portion of the plurality of zone stamps, the partitioning being performed based on at least one of the following: a processing capability of each grid server in the plurality of grid servers, a size of each zone in the plurality of zones stored by the plurality of grid servers, a time consumed by comparing of zone stamps in the plurality of zone stamps contained in the generated listing, availability to process data zones in the data stream of each grid server in the plurality of grid servers, and any combination thereof; and
distributing, using the coordinating grid server, each partition of zone stamps in the plurality of partitions to one or more grid servers in the plurality of grid servers for storage, the distributing being performed based on at least a processing capability of each grid server in the plurality of grid servers;
selecting, using the coordinating grid server, a grid server in the plurality of grid servers, based on the generated listing and a partition stored on that grid server, to perform
comparing a first zone stamp in the plurality of zone stamps contained in the generated listing to a second zone stamp in the plurality of zone stamps contained in the generated listing, the first zone stamp representing a first zone in the plurality of zones and the second zone stamp representing a second zone in the plurality of zones in the received data stream; and
delta-compressing the first zone and the second zone based on a determination that the first zone stamp is substantially similar to the second zone stamp; and
monitoring, using the coordinating grid server, the comparing and the delta-compressing, and, based on the monitoring, selecting, using the coordinating grid server, at least another grid server in the plurality of grid servers to perform the comparing and the delta-compressing upon determination that the selected grid server exceeded a predetermined amount of time to perform the comparing and the delta-compressing.
View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
31. A computer program product comprising a non-transitory machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
generating a listing of a plurality of zone stamps, each zone stamp representing a data zone in the plurality of data zones in a data stream, the data stream being received from at least one data source by a network of communicatively coupled plurality of grid servers, each grid server in the plurality of grid servers storing a grid server listing of zone stamps corresponding to zones of data stored on that grid server, the generated listing including a logical arrangement of a combination of grid server listings of zone stamps obtained from each grid server in the plurality of grid servers and being accessible by the plurality of grid servers, and storing the generated listing on a coordinating grid server in the plurality of grid servers;
partitioning, using the coordinating grid server, the generated listing into a plurality of partitions of zone stamps, each partition in the plurality of partitions including a portion of the plurality of zone stamps, the partitioning being performed based on at least one of the following: a processing capability of each grid server in the plurality of grid servers, a size of each zone in the plurality of zones stored by the plurality of grid servers, a time consumed by comparing of zone stamps in the plurality of zone stamps contained in the generated listing, availability to process data zones in the data stream of each grid server in the plurality of grid servers, and any combination thereof; and
distributing, using the coordinating grid server, each partition of zone stamps in the plurality of partitions to one or more grid servers in the plurality of grid servers for storage, the distributing being performed based on at least a processing capability of each grid server in the plurality of grid servers;
selecting, using the coordinating grid server, a grid server in the plurality of grid servers, based on the generated listing and a partition stored on that grid server, to perform
comparing a first zone stamp in the plurality of zone stamps contained in the generated listing to a second zone stamp in the plurality of zone stamps contained in the generated listing, the first zone stamp representing a first zone in the plurality of zones and the second zone stamp representing a second zone in the plurality of zones in the received data stream; and
delta-compressing the first zone and the second zone based on a determination that the first zone stamp is substantially similar to the second zone stamp; and
monitoring, using the coordinating grid server, the comparing and the delta-compressing, and, based on the monitoring, selecting, using the coordinating grid server, at least another grid server in the plurality of grid servers to perform the comparing and the delta-compressing upon determination that the selected grid server exceeded a predetermined amount of time to perform the comparing and the delta-compressing.
View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
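The monitoring step shared by the independent claims, in which the coordinating grid server selects another grid server once the first exceeds a predetermined time, might look roughly like the sketch below. It is an illustration under assumptions, not the patented mechanism: grid servers are simulated as local Python callables run in threads, `timeout_s` plays the role of the predetermined amount of time, and all names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List, Tuple

Worker = Callable[[str], str]


def slow_server(task: str) -> str:
    """Simulated overloaded grid server."""
    time.sleep(0.3)
    return "slow:" + task


def fast_server(task: str) -> str:
    """Simulated responsive grid server."""
    return "fast:" + task


def run_with_reassignment(task: str,
                          servers: List[Tuple[str, Worker]],
                          timeout_s: float) -> Tuple[str, str]:
    """Try servers in order; move to the next one when the selected server
    exceeds timeout_s. Returns (server name, result)."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(servers)))
    try:
        for name, worker in servers:
            future = pool.submit(worker, task)
            try:
                return name, future.result(timeout=timeout_s)
            except FutureTimeout:
                continue  # selected server took too long; select another
        raise RuntimeError("no grid server finished in time")
    finally:
        # Do not block on abandoned work from timed-out servers.
        pool.shutdown(wait=False)


chosen, outcome = run_with_reassignment(
    "compare+delta(zone1, zone2)",
    [("gs-1", slow_server), ("gs-2", fast_server)],
    timeout_s=0.05,
)
```

A real coordinator would also need to cancel or discard the late result from the first server so that the same pair of zones is not delta-compressed twice; the sketch simply ignores it.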
Specification