Replication of deduplicated data

US 8,356,017 B2
Filed: 08/11/2009
Issued: 01/15/2013
Est. Priority Date: 08/11/2009
Status: Active Grant

Abstract

Various embodiments for replicating deduplicated data using a processor device are provided. A block of the deduplicated data, created in a source repository, is assigned a global block identification (id) unique in a grid set inclusive of the source repository. The global block id is generated using at least one unique identification value of the block, a containing grid of the grid set, and the source repository. The global block id is transmitted from the source repository to a target repository. If the target repository determines the global block id is associated with an existing block of the deduplicated data located within the target repository, the block is not received by the target repository during a subsequent replication process.
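The exchange described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the patented implementation: the `Repository` class, the `missing_ids` and `replicate` helpers, and the dotted ID strings are all assumptions made for the example. It follows the abstract's behaviour, in which the target does not receive blocks it already holds.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory repository: global block ID -> block bytes."""
    blocks: dict[str, bytes] = field(default_factory=dict)

    def missing_ids(self, offered_ids: list[str]) -> list[str]:
        """Return the offered IDs that have no stored block here."""
        return [gid for gid in offered_ids if gid not in self.blocks]


def replicate(source: Repository, target: Repository, ids: list[str]) -> int:
    """Copy only the blocks the target lacks; return the number of bytes sent."""
    # Only the IDs travel first; the target reports which ones it cannot resolve.
    wanted = target.missing_ids(ids)
    sent = 0
    for gid in wanted:
        # Block data travels only for IDs the target does not already hold.
        data = source.blocks[gid]
        target.blocks[gid] = data
        sent += len(data)
    return sent


source = Repository({"1.1.1": b"alpha", "1.1.2": b"beta", "1.1.3": b"gamma"})
target = Repository({"1.1.2": b"beta"})
bytes_sent = replicate(source, target, ["1.1.1", "1.1.2", "1.1.3"])
```

In this run the block `1.1.2` is already on the target, so only the other two blocks are copied.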

44 Claims

1. A method for replicating deduplicated data using a processor device, comprising:
- creating a plurality of blocks of the deduplicated data in a source repository;
  
  assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled;
  
  transmitting the plurality of global block IDs from the source repository to a target repository;
  
  determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository;
  
  partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination;
  
  transmitting, by the target repository, the first portion of global block IDs back to the source repository; and
  
  transmitting, by the source repository, data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, further including, subsequent to transmitting the first portion of global block IDs back to the source repository, transmitting, for each block, a representation enabling data deduplication from the source repository to the target repository, wherein the representation is inserted into a deduplication index within the target repository.
  - 3. The method of claim 1, wherein the assigning each global ID further includes performing at least one of:
    - assigning a grid ID, unique in the containing grid of the grid set, wherein the grid ID is computed by performing one of selecting a previously nonexistent grid ID and manually assigning the grid ID,
      
      upon assignment of the source repository to the containing grid of the grid set, assigning the source repository a repository ID, unique in the containing grid, wherein the repository ID is computed by performing one of selecting a previously nonexistent repository ID and manually assigning the repository ID, and
      
      assigning a block ID, unique in the containing grid of the grid set, wherein the block ID is computed by selecting a previously nonexistent block ID.
  - 4. The method of claim 3, wherein at least one of:
    - the selecting the previously nonexistent grid ID includes incrementing a grid ID variable corresponding to the grid set,
      
      the selecting the previously nonexistent repository ID includes incrementing a repository ID variable corresponding to the containing grid, and
      
      the selecting the previously nonexistent block ID includes incrementing a block ID variable corresponding to the source repository.
  - 5. The method of claim 4, further including generating each global block ID by combining the grid ID, the repository ID, and the block ID corresponding to each respective block.
  - 6. The method of claim 1, further including:
    - receiving incoming data within the source repository; and
      
      deduplicating the incoming data with existing data in the source repository, partitioning the incoming data into a plurality of existing blocks and a plurality of new blocks of the deduplicated data, wherein the block is one of the plurality of the new blocks, each of the plurality of existing blocks having a reference count incremented to reflect the receipt of the incoming data,
      
      in addition to the assigning the block the global block ID, assigning a plurality of additional global block IDs to each of a remaining plurality of new blocks, wherein each of the block, the global block ID, the plurality of the new blocks, and the plurality of the additional global block IDs are stored within the source repository, and
      
      recording a mapping of sections of the incoming data to the plurality of the existing blocks and the plurality of the new blocks in the source repository.
  - 7. The method of claim 1, further including, pursuant to the transmitting the plurality of global block IDs, recording, by the source repository, each unique identification value within a global block ID index.
  - 8. The method of claim 1, further including by the target repository, pursuant to the transmitting the plurality of global block IDs, searching for an existence of each global block ID using the deduplication index, wherein if a particular global block ID is determined to be nonexistent within the target repository:
    - transmitting the particular global block ID from the target repository to the source repository, and
      
      transmitting data associated with the particular block from the source repository to the target repository.
  - 9. The method of claim 8, further including by the target repository, subsequent to the transmitting the data associated with the particular block, storing the data associated with the particular block within the target repository, and incrementing a reference count associated with the particular block.
  - 10. The method of claim 8, further including, if the particular global block ID is determined to be existent within the target repository, incrementing a reference count associated with the particular block.
  - 11. The method of claim 8, further including by the target repository, subsequent to the transmitting the particular global block ID, performing at least one of:
    - recording the particular global block ID within a global block ID index, and
      
      recording a mapping of sections of replicated data to the particular block and a plurality of additional blocks of the deduplicated data.
  - 12. The method of claim 11, wherein the target repository records the mapping of the sections, the target repository further performing at least one of:
    - retrieving the mapping, and
      
      loading at least one of the particular block and the plurality of the additional blocks into a memory location.
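
Claims 3 to 5 describe building each global block ID from a grid ID, a repository ID, and a block ID, each chosen by incrementing a counter. The sketch below is one possible reading, not the patented implementation: the class names, the dotted string form, and the in-memory counter are assumptions for illustration. A real repository would need to persist the counter so that IDs are never recycled across restarts.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class GlobalBlockId:
    """Hypothetical composite ID: grid, repository, and block components."""
    grid_id: int
    repository_id: int
    block_id: int

    def __str__(self) -> str:
        return f"{self.grid_id}.{self.repository_id}.{self.block_id}"


class SourceRepository:
    """Hands out block IDs from a counter that only moves forward."""

    def __init__(self, grid_id: int, repository_id: int) -> None:
        self.grid_id = grid_id
        self.repository_id = repository_id
        self._next_block = count(1)  # incremented per new block, never reused

    def new_global_block_id(self) -> GlobalBlockId:
        # Combine the three components; no hash of the block contents is involved.
        return GlobalBlockId(self.grid_id, self.repository_id, next(self._next_block))


repo_a = SourceRepository(grid_id=1, repository_id=1)
repo_b = SourceRepository(grid_id=1, repository_id=2)
ids = [
    repo_a.new_global_block_id(),
    repo_a.new_global_block_id(),
    repo_b.new_global_block_id(),
]
```

Because the ID is derived from counters rather than from a digest of the data, two different blocks cannot collide on the same ID, which is consistent with the claim language that the ID "is not dependent on the data contents in a probabilistic manner".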

13. A system for replicating deduplicated data, comprising:
- at least one processor device operable in a computing storage environment, the at least one processor device in communication with a source repository and a target repository, wherein the at least one processor device is adapted for:
  
  creating a plurality of blocks of the deduplicated data in a source repository;
  
  assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled,
  
  transmitting the plurality of global block IDs from the source repository to the target repository,
  
  determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository,
  
  partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination,
  
  transmitting the first portion of global block IDs back to the source repository, and
  
  transmitting data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
  - 14. The system of claim 13, wherein the at least one processor device is further adapted for, subsequent to the transmitting each global block ID, transmitting, for each block, a representation enabling data deduplication from the source repository to the target repository, wherein the representation is inserted into a deduplication index within the target repository.
  - 15. The system of claim 13, wherein the at least one processor device is further adapted for, pursuant to the assigning each global ID, performing at least one of:
    - assigning a grid ID, unique in the containing grid of the grid set, wherein the grid ID is computed by performing one of selecting a previously nonexistent grid ID and manually assigning the grid ID,
      
      upon assignment of the source repository to the containing grid, assigning the source repository a repository ID, unique in the containing grid of the grid set, wherein the repository ID is computed by performing one of selecting a previously nonexistent repository ID and manually assigning the repository ID, and
      
      assigning a block ID, unique in the containing grid of the grid set, wherein the block ID is computed by selecting a previously nonexistent block ID.
  - 16. The system of claim 15, wherein at least one of:
    - the selecting the previously nonexistent grid ID includes incrementing a grid ID variable corresponding to the grid set,
      
      the selecting the previously nonexistent repository ID includes incrementing a repository ID variable corresponding to the containing grid, and
      
      the selecting the previously nonexistent block ID includes incrementing a block ID variable corresponding to the source repository.
  - 17. The system of claim 16, wherein the at least one processor device is further adapted for generating each global block ID by combining the grid ID, the repository ID, and the block ID corresponding to each respective block.
  - 18. The system of claim 14, wherein the at least one processor device is further adapted for, upon receipt of incoming data within the source repository:
    - deduplicating the incoming data with existing data in the source repository, partitioning the incoming data into a plurality of existing blocks and a plurality of new blocks of the deduplicated data, wherein the block is one of the plurality of the new blocks, each of the plurality of existing blocks having a reference count incremented to reflect the receipt of the incoming data,
      
      in addition to the assigning the block the global block ID, assigning a plurality of additional global block IDs to each of a remaining plurality of new blocks, wherein each of the block, the global block ID, the plurality of the new blocks, and the plurality of the additional global block IDs are stored within the source repository, and
      
      recording a mapping of sections of the incoming data to the plurality of the existing blocks and the plurality of the new blocks in the source repository.
  - 19. The system of claim 14, wherein the source repository and the at least one processor device are further adapted for, pursuant to the transmitting of the first portion of the global block IDs, recording each unique identification value within a global block ID index.
  - 20. The system of claim 14, wherein the target repository and the at least one processor device are further adapted for, pursuant to the transmitting the first portion of the global block IDs, searching for an existence of each global block ID using the deduplication index, wherein if a particular global block ID is determined to be nonexistent within the target repository:
    - transmitting the particular global block ID from the target repository to the source repository, and
      
      transmitting data associated with the particular block from the source repository to the target repository.
  - 21. The system of claim 20, wherein the target repository and the at least one processor device are further adapted for, subsequent to the transmitting the data associated with the particular block, storing the data associated with the particular block within the target repository, and incrementing a reference count associated with the particular block.
  - 22. The system of claim 20, wherein the target repository and the at least one processor device are further adapted for, if the particular global block ID is determined to be existent within the target repository, incrementing a reference count associated with the particular block.
  - 23. The system of claim 20, wherein the target repository and the at least one processor device are further adapted for, subsequent to the transmitting the particular global block ID, performing at least one of:
    - recording the particular global block ID within a global block ID index, and
      
      recording a mapping of sections of replicated data to the particular block and a plurality of additional blocks of the deduplicated data.
  - 24. The system of claim 23, wherein the target repository records the mapping of the sections and wherein the target repository and the at least one processor device are further adapted for, performing at least one of:
    - retrieving the mapping, and
      
      loading at least one of the particular block and the plurality of the additional blocks into a memory location.
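
The target-side behaviour in claims 20 to 24 (request only nonexistent blocks, store them, increment reference counts, record a mapping, and later load blocks through that mapping) might look roughly like the sketch below. It is an illustration under stated assumptions: `TargetRepository`, `receive`, `restore`, and the `fetch` callback standing in for the network transfer are hypothetical names that do not come from the patent.

```python
from typing import Callable


class TargetRepository:
    """Hypothetical target-side state; all names here are illustrative."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}        # global block ID -> block data
        self.ref_counts: dict[str, int] = {}      # global block ID -> reference count
        self.mappings: dict[str, list[str]] = {}  # replicated object -> ordered block IDs

    def receive(self, name: str, ids: list[str], fetch: Callable[[str], bytes]) -> list[str]:
        """Replicate one object; return the IDs whose data had to be fetched."""
        # Partition the offered IDs: keep only those with no block stored here.
        missing = [gid for gid in dict.fromkeys(ids) if gid not in self.blocks]
        for gid in missing:
            self.blocks[gid] = fetch(gid)  # data travels only for missing blocks
        for gid in ids:
            # Every reference, to a new or a pre-existing block, bumps its count.
            self.ref_counts[gid] = self.ref_counts.get(gid, 0) + 1
        self.mappings[name] = list(ids)    # remember how to rebuild the object
        return missing

    def restore(self, name: str) -> bytes:
        """Rebuild an object by loading its blocks in mapped order."""
        return b"".join(self.blocks[gid] for gid in self.mappings[name])


source_blocks = {"1.1.1": b"he", "1.1.2": b"llo"}
target = TargetRepository()
first = target.receive("a", ["1.1.1", "1.1.2"], source_blocks.__getitem__)
second = target.receive("b", ["1.1.2"], source_blocks.__getitem__)
```

The second call fetches nothing because block `1.1.2` already exists on the target; only its reference count changes.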

25. A computer program product for replicating deduplicated data using a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
- a first executable portion for creating a plurality of blocks of the deduplicated data in a source repository;
  
  a second executable portion for assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled;
  
  a third executable portion for transmitting the plurality of global block IDs from the source repository to a target repository;
  
  a fourth executable portion for determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository;
  
  a fifth executable portion for partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination;
  
  a sixth executable portion for transmitting, by the target repository, the first portion of global block IDs back to the source repository; and
  
  a seventh executable portion for transmitting, by the source repository, data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
- Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
  - 26. The computer program product of claim 25, further including an eighth executable portion for, subsequent to the transmitting the first portion of the global block IDs, transmitting, for each block, a representation enabling data deduplication from the source repository to the target repository, wherein the representation is inserted into a deduplication index within the target repository.
  - 27. The computer program product of claim 25, further including an eighth executable portion for, pursuant to the assigning each global ID, performing at least one of:
    - assigning a grid ID, unique in the containing grid of the grid set, wherein the grid ID is computed by performing one of selecting a previously nonexistent grid ID and manually assigning the grid ID,
      
      upon assignment of the source repository to the containing grid, assigning the source repository a repository ID, unique in the containing grid, wherein the repository ID is computed by performing one of selecting a previously nonexistent repository ID and manually assigning the repository ID, and
      
      assigning a block ID, unique in the source repository, wherein the block ID is computed by selecting a previously nonexistent block ID.
  - 28. The computer program product of claim 27, wherein at least one of:
    - the selecting the previously nonexistent grid ID includes incrementing a grid ID variable corresponding to the grid set,
      
      the selecting the previously nonexistent repository ID includes incrementing a repository ID variable corresponding to the containing grid, and
      
      the selecting the previously nonexistent block ID includes incrementing a block ID variable corresponding to the source repository.
  - 29. The computer program product of claim 28 further including a ninth executable portion for generating each global block ID by combining the grid ID, the repository ID, and the block ID corresponding to each respective block.
  - 30. The computer program product of claim 26, further including an eighth executable portion for, upon receipt of incoming data within the source repository:
    - deduplicating the incoming data with existing data in the source repository, partitioning the incoming data into a plurality of existing blocks and a plurality of new blocks of the deduplicated data, wherein the block is one of the plurality of the new blocks, each of the plurality of existing blocks having a reference count incremented to reflect the receipt of the incoming data,
      
      in addition to the assigning the block the global block ID, assigning a plurality of additional global block IDs to each of a remaining plurality of new blocks, wherein each of the block, the global block ID, the plurality of the new blocks, and the plurality of the additional global block IDs are stored within the source repository, and
      
      recording a mapping of sections of the incoming data to the plurality of the existing blocks and the plurality of the new blocks in the source repository.
  - 31. The computer program product of claim 26, further including an eighth executable portion for, pursuant to the transmitting the first portion of the global block IDs, recording, by the source repository, the at least one unique identification value within a global block ID index.
  - 32. The computer program product of claim 26, further including an eighth executable portion for, by the target repository, pursuant to the transmitting the first portion of the global block IDs, searching for an existence of a particular global block ID using the deduplication index, wherein if the particular global block ID is determined to be nonexistent within the target repository:
    - transmitting the particular global block ID from the target repository to the source repository, and
      
      transmitting data associated with the particular block from the source repository to the target repository.
  - 33. The computer program product of claim 32, further including a ninth executable portion for, by the target repository, subsequent to the transmitting the data associated with the particular block, storing the data associated with the particular block within the target repository, and incrementing a reference count associated with the particular block.
  - 34. The computer program product of claim 32, further including a ninth executable portion for, if the particular global block ID is determined to be existent within the target repository, incrementing a reference count associated with the particular block.
  - 35. The computer program product of claim 32, further including a ninth executable portion for, by the target repository, subsequent to the transmitting the first portion of the global block IDs, performing at least one of:
    - recording each global block ID within a global block ID index, and
      
      recording a mapping of sections of replicated data to the particular block and a plurality of additional blocks of the deduplicated data.
  - 36. The computer program product of claim 35, wherein the target repository records the mapping of the sections, the computer program product further including a tenth executable portion for, by the target repository, performing at least one of:
    - retrieving the mapping, and
      
      loading at least one of the particular block and the plurality of the additional blocks into a memory location.
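
The source-side ingest step in claim 30 (split incoming data into existing and new blocks, bump reference counts, assign IDs to new blocks, and record a mapping) can be sketched as follows. The claims do not say how duplicates are detected, so this example assumes fixed-size chunks and an exact-content lookup table as a stand-in for the deduplication index. `IngestingRepository`, `ingest`, and `chunk_size` are hypothetical names, and the integer IDs here represent only the block component of a global block ID.

```python
from itertools import count


class IngestingRepository:
    """Hypothetical source repository that deduplicates data on arrival."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.id_by_content: dict[bytes, int] = {}  # stand-in deduplication index
        self.blocks: dict[int, bytes] = {}         # block ID -> block data
        self.ref_counts: dict[int, int] = {}       # block ID -> reference count
        self.mappings: dict[str, list[int]] = {}   # object name -> ordered block IDs
        self._next_id = count(1)                   # forward-only block ID counter

    def ingest(self, name: str, data: bytes) -> list[int]:
        """Store an object; return the IDs assigned to newly created blocks."""
        new_ids: list[int] = []
        layout: list[int] = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            block_id = self.id_by_content.get(chunk)
            if block_id is None:
                # New block: assign the next unused ID and store the data once.
                block_id = next(self._next_id)
                self.id_by_content[chunk] = block_id
                self.blocks[block_id] = chunk
                new_ids.append(block_id)
            # Existing and new blocks alike gain one reference per use.
            self.ref_counts[block_id] = self.ref_counts.get(block_id, 0) + 1
            layout.append(block_id)
        self.mappings[name] = layout  # sections of the object -> blocks
        return new_ids


repo = IngestingRepository(chunk_size=4)
first_new = repo.ingest("f1", b"AAAABBBBAAAA")
second_new = repo.ingest("f2", b"BBBBCCCC")
```

Only the IDs returned by `ingest` would need to be offered to a target during replication, since every other section of the object refers to a block that already had an ID.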

37. A device, comprising:
- a circuit customized for performing steps of a method for replicating deduplicated data, the steps including:
  
  creating a plurality of blocks of the deduplicated data in a source repository,
  
  assigning each block of the deduplicated data a global block identification (ID) to generate a plurality of global block IDs, each global block ID identifies data contents in its respective block, is not dependent on the data contents in a probabilistic manner, and is unique without ever being recycled,
  
  transmitting the plurality of global block IDs from the source repository to a target repository,
  
  determining if each global block ID is associated with an existing block of the deduplicated data located within the target repository,
  
  partitioning the plurality of global block IDs into a first portion of global block IDs previously existing within the target repository and a second portion of global block IDs previously non-existing within the target repository based on the determination,
  
  transmitting the first portion of the global block IDs back to the source repository, and
  
  transmitting data from each block associated with the first portion of global block IDs to the target repository in response to receipt of the first portion of the global block IDs.
- Dependent Claims (38, 39)
  - 38. The device of claim 37, wherein the steps further include, subsequent to the transmitting the first portion of the global block IDs, transmitting, for each block, a representation enabling data deduplication from the source repository to the target repository, further wherein the representation is inserted into a deduplication index within the target repository.
  - 39. The device of claim 37, wherein the circuit includes one of an application-specific integrated circuit (ASIC), system-on-chip (SoC), and a field programmable gate array (FPGA).

40. A method for replicating deduplicated data using a processor device, comprising:
- performing each of:
  
  assigning a grid identification (ID) to each of a plurality of grids comprising a grid set, the grid ID unique to each grid within the grid set, wherein the grid ID is computed by performing one of selecting a previously nonexistent grid ID and manually assigning the grid ID,
  
  upon assignment of a source repository to each grid in the grid set, assigning each source repository a repository ID, the repository ID unique in the grid set, wherein the repository ID is computed by performing one of selecting a previously nonexistent repository ID and manually assigning the repository ID, and
  
  assigning a block ID to each block of data, the block ID unique in the grid set, wherein:
  
  each block ID identifies data contents in its respective block and is not dependent on the data contents in a probabilistic manner,
  
  the grid ID, the repository ID, and the block ID form a global block ID, and
  
  each global block ID is never recycled; and
  
  storing each of the grid ID, the repository ID, and the block ID as metadata in an identification file.
- Dependent Claims (41, 42, 43, 44)
  - 41. The method of claim 40, further including storing at least one internet protocol (IP) address associated with at least one of the grid and the source repository as the metadata along with the grid ID, the repository ID, and the block ID in the identification file.
  - 42. The method of claim 41, further including generating a global block identification (ID), unique in the source repository and a target repository, by combining the grid ID, the repository ID, and the block ID.
  - 43. The method of claim 42, further including:
    - transmitting the global block ID from the source repository to the target repository, wherein if the target repository determines the global block ID is associated with an existing block of the deduplicated data located within the target repository, the global block ID is transmitted back to the source repository.
  - 44. The method of claim 43, further including, subsequent to the transmitting the global block ID, transmitting, for a block associated with the block ID, a representation enabling data deduplication from the source repository to the target repository, wherein the representation is inserted into a deduplication index within the target repository.
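
Claims 40 and 41 store the grid ID, repository ID, block ID, and optionally an IP address as metadata in an identification file. The claims do not specify a file format, so the sketch below assumes JSON purely for illustration. The field names, the helper functions, and the documentation-range address `192.0.2.10` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_identification_file(
    path: Path,
    grid_id: int,
    repository_id: int,
    block_id: int,
    ip_addresses: list[str],
) -> None:
    """Persist the three ID components plus IP metadata (JSON is an assumption)."""
    record = {
        "grid_id": grid_id,
        "repository_id": repository_id,
        "block_id": block_id,
        "ip_addresses": ip_addresses,
    }
    path.write_text(json.dumps(record, indent=2))


def read_global_block_id(path: Path) -> tuple[int, int, int]:
    """Recombine the stored components into a (grid, repository, block) triple."""
    record = json.loads(path.read_text())
    return (record["grid_id"], record["repository_id"], record["block_id"])


with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "identification.json"
    write_identification_file(id_file, 1, 2, 42, ["192.0.2.10"])
    restored = read_global_block_id(id_file)
```

Persisting the components lets a repository resume issuing IDs after a restart without reusing an earlier value, which is one way to satisfy the requirement that a global block ID is never recycled.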

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Akirav, Shay H.; Asher, Ron; Bachar, Yariv; Aronovich, Lior; Ish-Shalom, Ariel J.; Leneman, Ofer
Primary Examiner(s): LIAO, JASON G
Application Number: US12/539,109
Publication Number: US 20110040728A1
Time in Patent Office: 1,253 Days
Field of Search: None
US Class Current: 707/692
CPC Class Codes:
G06F 11/2094   Redundant storage or storag...
G06F 16/275   Synchronous replication
G06F 3/0641   De-duplication techniques
G06F 3/065   Replication mechanisms
