Replication of deduplicated data
Abstract
Various embodiments for replicating deduplicated data using a processor device are provided. A block of the deduplicated data, created in a source repository, is assigned a global block identification (ID) that is unique in a grid set that includes the source repository. The global block ID is generated using at least one unique identification value of the block, of its containing grid in the grid set, and of the source repository. The global block ID is transmitted from the source repository to a target repository. If the target repository determines that the global block ID is associated with an existing block of the deduplicated data already located within the target repository, the block is not received by the target repository during a subsequent replication process.
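The composition of a global block ID can be sketched as a small value type. This is a minimal sketch, assuming the three components named in claim 40 (grid ID, repository ID, block ID) are non-negative integers. The class name `GlobalBlockID`, the packed single-integer form, and the 16/16/64-bit field widths are all hypothetical choices for illustration; the source does not specify an encoding. Because each component is unique at its own level, the combined value stays unique across the grid set: the same block number issued in two different repositories still yields two different global IDs.

```python
from dataclasses import dataclass

# Hypothetical field widths for the packed form; the source does not specify them.
GRID_BITS, REPO_BITS, BLOCK_BITS = 16, 16, 64


@dataclass(frozen=True)
class GlobalBlockID:
    grid_id: int        # unique within the grid set
    repository_id: int  # unique within the grid set
    block_id: int       # assigned once and never recycled

    def pack(self) -> int:
        """Concatenate the three fields into one integer, grid ID in the high bits."""
        for value, bits in ((self.grid_id, GRID_BITS),
                            (self.repository_id, REPO_BITS),
                            (self.block_id, BLOCK_BITS)):
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{value} does not fit in {bits} bits")
        return ((self.grid_id << (REPO_BITS + BLOCK_BITS))
                | (self.repository_id << BLOCK_BITS)
                | self.block_id)

    @classmethod
    def unpack(cls, packed: int) -> "GlobalBlockID":
        """Split a packed integer back into its grid, repository, and block fields."""
        return cls(
            grid_id=packed >> (REPO_BITS + BLOCK_BITS),
            repository_id=(packed >> BLOCK_BITS) & ((1 << REPO_BITS) - 1),
            block_id=packed & ((1 << BLOCK_BITS) - 1),
        )
```

For example, `GlobalBlockID(1, 7, 42)` and `GlobalBlockID(1, 8, 42)` share a block number but pack to different integers, so a target repository can never confuse them. A fixed-width packing keeps IDs cheap to compare and transmit; a tuple or string form would work equally well for the scheme described here.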
44 Claims
1. A method for replicating deduplicated data using a processor device, comprising:
creating a plurality of blocks of the deduplicated data in a source repository;
assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled;
transmitting the plurality of global block IDs from the source repository to a target repository;
determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository;
partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination;
transmitting, by the target repository, the first portion of global block IDs back to the source repository; and
transmitting, by the source repository, data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
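The ID-first exchange in claim 1 can be sketched with two in-memory repositories. This is a minimal sketch: `SourceRepository`, `TargetRepository`, `partition`, `store`, and `replicate_to` are hypothetical names, and transport, batching, and persistence are omitted. Because the claims require each global block ID to identify block contents without depending on them probabilistically (unlike a hash fingerprint), the target can treat an ID match as proof the block is already present, with no byte comparison and no collision risk. One deliberate choice: the sketch follows the abstract, in which blocks already present at the target are not sent, so the target's reply is the set of IDs it lacks. Claim 1 as worded has the target return the first (previously existing) portion instead.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical global block ID: (grid_id, repository_id, block_id).
GlobalID = Tuple[int, int, int]


class TargetRepository:
    def __init__(self) -> None:
        self.blocks: Dict[GlobalID, bytes] = {}

    def partition(self, ids: Iterable[GlobalID]) -> Tuple[List[GlobalID], List[GlobalID]]:
        """Split incoming IDs into (already present here, missing here)."""
        present: List[GlobalID] = []
        missing: List[GlobalID] = []
        for gid in ids:
            # An ID match alone is trusted: IDs are never recycled and not hash-based.
            (present if gid in self.blocks else missing).append(gid)
        return present, missing

    def store(self, gid: GlobalID, data: bytes) -> None:
        self.blocks[gid] = data


class SourceRepository:
    def __init__(self, blocks: Dict[GlobalID, bytes]) -> None:
        self.blocks = blocks

    def replicate_to(self, target: TargetRepository) -> int:
        """Send all IDs first, then data only for IDs the target lacks. Returns bytes sent."""
        _present, missing = target.partition(self.blocks.keys())
        sent = 0
        for gid in missing:  # the target's reply: IDs it does not yet hold
            data = self.blocks[gid]
            target.store(gid, data)
            sent += len(data)
        return sent
```

Replicating twice shows the effect: the first call ships only the block the target lacks, and a second call ships nothing because every ID now matches.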
13. A system for replicating deduplicated data, comprising:
at least one processor device operable in a computing storage environment, the at least one processor device in communication with a source repository and a target repository, wherein the at least one processor device is adapted for:
creating a plurality of blocks of the deduplicated data in a source repository;
assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled,
transmitting the plurality of global block IDs from the source repository to the target repository,
determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository,
partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination,
transmitting the first portion of global block IDs back to the source repository, and
transmitting data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
(Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
25. A computer program product for replicating deduplicated data using a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for creating a plurality of blocks of the deduplicated data in a source repository;
a second executable portion for assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled;
a third executable portion for transmitting the plurality of global block IDs from the source repository to a target repository;
a fourth executable portion for determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository;
a fifth executable portion for partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination;
a sixth executable portion for transmitting, by the target repository, the first portion of global block IDs back to the source repository; and
a seventh executable portion for transmitting, by the source repository, data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
(Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
37. A device, comprising:
a circuit customized for performing steps of a method for replicating deduplicated data, the steps including:
creating a plurality of blocks of the deduplicated data in a source repository,
assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled,
transmitting the plurality of global block IDs from the source repository to a target repository,
determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository,
partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination,
transmitting the first portion of the global block IDs back to the source repository, and
transmitting data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
(Dependent claims: 38, 39)
40. A method for replicating deduplicated data using a processor device, comprising:
performing each of:
assigning a grid identification (ID) to each of a plurality of grids comprising a grid set, the grid ID unique to each grid within the grid set, wherein the grid ID is computed by performing one of selecting a previously nonexistent grid ID and manually assigning the grid ID,
upon assignment of a source repository to each grid in the grid set, assigning each source repository a repository ID, the repository ID unique in the grid set, wherein the repository ID is computed by performing one of selecting a previously nonexistent repository ID and manually assigning the repository ID, and
assigning a block ID to each block of data, the block ID unique in the grid set, wherein:
each block ID identifies data contents in its respective block and is not dependent on the data contents in a probabilistic manner,
the grid ID, the repository ID, and the block ID form a global block ID, and
each global block ID is never recycled; and
storing each of the grid ID, the repository ID, and the block ID as metadata in an identification file.
(Dependent claims: 41, 42, 43, 44)
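The assignment steps of claim 40 can be sketched as a small registry backed by a file. This is a minimal sketch, assuming a single JSON file plays the role of the identification file; the class `IdentificationFile`, its methods, and the JSON layout are hypothetical. "Selecting a previously nonexistent ID" is modeled as one more than the largest ID ever issued, and manual assignment is rejected if that ID was ever issued before. The registry remembers every issued ID (not just live ones), and block IDs come from a single monotonically increasing counter that is persisted before each ID is handed out, so no global block ID is ever recycled, even across restarts.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple


class IdentificationFile:
    """Hypothetical registry issuing grid, repository, and block IDs; no ID is ever reused."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        if self.path.exists():
            state = json.loads(self.path.read_text())
        else:
            state = {"grid_ids": [], "repository_ids": [], "next_block_id": 0}
        # Every ID ever issued is remembered, even after retirement, so
        # "previously nonexistent" means "never issued", not "not currently in use".
        self.grid_ids = set(state["grid_ids"])
        self.repository_ids = set(state["repository_ids"])
        self.next_block_id = state["next_block_id"]

    def _save(self) -> None:
        state = {
            "grid_ids": sorted(self.grid_ids),
            "repository_ids": sorted(self.repository_ids),
            "next_block_id": self.next_block_id,
        }
        # Write a temporary file, then atomically replace the old one, so a crash
        # cannot leave a truncated file that would cause IDs to be reissued.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent))
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)

    @staticmethod
    def _pick(issued: set, manual: Optional[int]) -> int:
        if manual is not None:
            if manual in issued:
                raise ValueError(f"ID {manual} was already issued and cannot be reused")
            return manual
        return max(issued, default=-1) + 1  # an ID that has never existed

    def assign_grid_id(self, manual: Optional[int] = None) -> int:
        gid = self._pick(self.grid_ids, manual)
        self.grid_ids.add(gid)
        self._save()
        return gid

    def assign_repository_id(self, manual: Optional[int] = None) -> int:
        rid = self._pick(self.repository_ids, manual)
        self.repository_ids.add(rid)
        self._save()
        return rid

    def assign_block_id(self, grid_id: int, repository_id: int) -> Tuple[int, int, int]:
        """Issue the next block ID and return the (grid, repository, block) global ID."""
        if grid_id not in self.grid_ids or repository_id not in self.repository_ids:
            raise KeyError("unknown grid or repository ID")
        bid = self.next_block_id
        self.next_block_id += 1
        self._save()  # persist before handing out the ID so it is never reissued
        return (grid_id, repository_id, bid)
```

Reopening the same file after block IDs 0 and 1 were issued continues at 2, and a manual request for an already-issued repository ID raises `ValueError`. A real system would also need concurrency control around the file; that is outside this sketch.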
Specification