Parallelizing and deduplicating backup data
First Claim
1. A computer-implemented method for deduplicating data, comprising:
selecting a grid server in a plurality of grid servers for deduplicating a segment of data in a plurality of segments of data contained within a data stream, wherein a data deduplication system communicatively coupled to the plurality of servers is configured to split the data stream into the plurality of segments and select the grid server for deduplicating the segment of data;
forwarding the segment of data to the selected grid server for deduplication; and
deduplicating, using the plurality of grid servers, a zone contained within the forwarded segment of data using a listing of a plurality of zone stamps, each zone stamp in the listing of the plurality of zone stamps representing a zone in a plurality of zones previously deduplicated by at least one server in the plurality of grid servers, the deduplicating including
determining, using the listing of the plurality of zone stamps, by a first grid server in the plurality of grid servers that a second grid server in the plurality of grid servers previously deduplicated a first zone in the plurality of zones having a first zone stamp matching to a second zone stamp of a second zone being processed by the first grid server, and
transmitting, by the first grid server, the second zone to the second grid server for deduplication.
Abstract
A method, a system, and a computer program product for performing a backup of data are disclosed. A grid server in a plurality of grid servers is selected for deduplicating a segment of data in a plurality of segments of data contained within a data stream. The segment of data is forwarded to the selected grid server for deduplication. A zone contained within the forwarded segment of data is deduplicated using the selected server. The deduplication is performed based on a listing of a plurality of zone stamps. Each zone stamp in the plurality of zone stamps represents a zone in a plurality of zones deduplicated by at least one server in the plurality of grid servers.
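The flow summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not stated in the text above: the fixed segment and zone sizes, the round-robin server selection, the use of a truncated SHA-256 digest as a stand-in for a zone stamp, and the names `GridServer`, `split_stream`, `zone_stamp`, and `backup`.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class GridServer:
    """Hypothetical grid server that stores the zones it deduplicated."""
    name: str
    stored: dict = field(default_factory=dict)  # zone stamp -> zone bytes


def split_stream(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size pieces (assumed; real boundaries may vary)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def zone_stamp(zone: bytes) -> str:
    """Stand-in stamp: a truncated SHA-256 digest (an assumption)."""
    return hashlib.sha256(zone).hexdigest()[:16]


def backup(stream: bytes, servers: list[GridServer],
           segment_size: int = 8, zone_size: int = 4):
    """Split the stream, pick a server per segment, and deduplicate zones
    against a shared listing of zone stamps."""
    listing = {}  # zone stamp -> name of the server that deduplicated it
    new_zones = 0
    for index, segment in enumerate(split_stream(stream, segment_size)):
        # Assumed selection policy: round-robin over the grid servers.
        server = servers[index % len(servers)]
        for zone in split_stream(segment, zone_size):
            stamp = zone_stamp(zone)
            if stamp in listing:
                continue  # a zone with this stamp was already deduplicated
            server.stored[stamp] = zone
            listing[stamp] = server.name
            new_zones += 1
    return listing, new_zones
```

In this sketch a matching stamp is treated as an exact duplicate and skipped, which is the simplest possible reading; the abstract itself only says that deduplication is performed based on the listing of zone stamps.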
27 Claims
1. A computer-implemented method for deduplicating data, comprising:
selecting a grid server in a plurality of grid servers for deduplicating a segment of data in a plurality of segments of data contained within a data stream, wherein a data deduplication system communicatively coupled to the plurality of servers is configured to split the data stream into the plurality of segments and select the grid server for deduplicating the segment of data;
forwarding the segment of data to the selected grid server for deduplication; and
deduplicating, using the plurality of grid servers, a zone contained within the forwarded segment of data using a listing of a plurality of zone stamps, each zone stamp in the listing of the plurality of zone stamps representing a zone in a plurality of zones previously deduplicated by at least one server in the plurality of grid servers, the deduplicating including
determining, using the listing of the plurality of zone stamps, by a first grid server in the plurality of grid servers that a second grid server in the plurality of grid servers previously deduplicated a first zone in the plurality of zones having a first zone stamp matching to a second zone stamp of a second zone being processed by the first grid server, and
transmitting, by the first grid server, the second zone to the second grid server for deduplication.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
at least one programmable processor; and
a non-transitory machine-readable medium storing instructions that, when executed by the at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
selecting a grid server in a plurality of grid servers for deduplicating a segment of data in a plurality of segments of data contained within a data stream, wherein a data deduplication system communicatively coupled to the plurality of servers is configured to split the data stream into the plurality of segments and select the grid server for deduplicating the segment of data;
forwarding the segment of data to the selected grid server for deduplication; and
deduplicating, using the plurality of grid servers, a zone contained within the forwarded segment of data using a listing of a plurality of zone stamps, each zone stamp in the listing of the plurality of zone stamps representing a zone in a plurality of zones previously deduplicated by at least one server in the plurality of grid servers, the deduplicating including
determining, using the listing of the plurality of zone stamps, by a first grid server in the plurality of grid servers that a second grid server in the plurality of grid servers previously deduplicated a first zone in the plurality of zones having a first zone stamp matching to a second zone stamp of a second zone being processed by the first grid server, and
transmitting, by the first grid server, the second zone to the second grid server for deduplication.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A computer program product comprising a non-transitory machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
selecting a grid server in a plurality of grid servers for deduplicating a segment of data in a plurality of segments of data contained within a data stream, wherein a data deduplication system communicatively coupled to the plurality of servers is configured to split the data stream into the plurality of segments and select the grid server for deduplicating the segment of data;
forwarding the segment of data to the selected grid server for deduplication; and
deduplicating, using the plurality of grid servers, a zone contained within the forwarded segment of data using a listing of a plurality of zone stamps, each zone stamp in the listing of the plurality of zone stamps representing a zone in a plurality of zones previously deduplicated by at least one server in the plurality of grid servers, the deduplicating including
determining, using the listing of the plurality of zone stamps, by a first grid server in the plurality of grid servers that a second grid server in the plurality of grid servers previously deduplicated a first zone in the plurality of zones having a first zone stamp matching to a second zone stamp of a second zone being processed by the first grid server, and
transmitting, by the first grid server, the second zone to the second grid server for deduplication.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
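The determining and transmitting steps recited in the independent claims can be pictured with a short Python sketch. It is only one possible reading under stated assumptions: the listing is modeled as a dictionary from zone stamp to server name, stamps are opaque strings, and the names `GridServer`, `deduplicate`, and `process_zone` are hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class GridServer:
    """Hypothetical grid server holding the zones it has deduplicated."""
    name: str
    zones: dict = field(default_factory=dict)  # zone stamp -> list of zones

    def deduplicate(self, stamp: str, zone: bytes) -> None:
        # Placeholder for real deduplication: group zones by stamp.
        self.zones.setdefault(stamp, []).append(zone)


def process_zone(first: GridServer, stamp: str, zone: bytes,
                 listing: dict, servers: dict) -> str:
    """Return the name of the server that ends up deduplicating the zone.

    The first server consults the listing of zone stamps. If another server
    previously deduplicated a zone with a matching stamp, the zone is
    transmitted to that server; otherwise the first server handles it and
    records itself in the listing.
    """
    owner = listing.get(stamp)
    if owner is not None and owner != first.name:
        target = servers[owner]  # transmit to the second grid server
    else:
        target = first
        listing.setdefault(stamp, first.name)
    target.deduplicate(stamp, zone)
    return target.name
```

Sending the zone to the server that already holds a zone with a matching stamp keeps zones with matching stamps together on one server, which is the reason this sketch routes by the listing rather than deduplicating locally.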
Specification