Method and apparatus for de-duplication after mirror operation
Abstract
An amount of storage capacity used during mirroring operations is reduced by applying de-duplication operations to the mirror volumes. Data stored to a first volume is mirrored to a second volume. The second volume is a virtual volume having a plurality of logical addresses, such that segments of physical storage capacity are allocated for a specified logical address as needed when data is stored to the specified logical address. A de-duplication operation is carried out on the second volume following a split from the first volume. A particular segment of the second volume is identified as having data that is the same as another segment in the second volume or in the same consistency group. A link is created from the particular segment to the other segment and the particular segment is released from the second volume so that physical storage capacity required for the second volume is reduced.
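The allocate-on-write volume and the hash-based link-and-release step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ThinVolume` class, the `deduplicate` function, the choice of SHA-256, and the policy of writing every update to a fresh segment are all assumptions made for the example.

```python
import hashlib


class ThinVolume:
    """Hypothetical virtual volume: a segment is allocated only when written."""

    def __init__(self):
        self.mapping = {}    # logical address -> physical segment id
        self.segments = {}   # physical segment id -> data
        self._next_id = 0

    def write(self, address, data):
        # Assumption: every write goes to a fresh segment, so a segment
        # shared by several addresses is never modified in place.
        old = self.mapping.get(address)
        seg_id = self._next_id
        self._next_id += 1
        self.segments[seg_id] = data
        self.mapping[address] = seg_id
        # Free the old segment once no logical address refers to it.
        if old is not None and old not in self.mapping.values():
            del self.segments[old]

    def read(self, address):
        return self.segments[self.mapping[address]]

    def allocated_segments(self):
        return len(self.segments)


def deduplicate(volume):
    """Link addresses with matching hashes to one segment; release the rest."""
    seen = {}      # hash value -> segment id that is kept
    replaced = {}  # released segment id -> segment id that replaced it
    for address in sorted(volume.mapping):
        seg_id = volume.mapping[address]
        if seg_id in replaced:
            # Segment was already released via another address: just relink.
            volume.mapping[address] = replaced[seg_id]
            continue
        digest = hashlib.sha256(volume.segments[seg_id]).hexdigest()
        kept = seen.setdefault(digest, seg_id)
        if kept != seg_id:
            volume.mapping[address] = kept   # link to the existing copy
            del volume.segments[seg_id]      # release the duplicate segment
            replaced[seg_id] = kept
    return len(replaced)


vol = ThinVolume()
vol.write(0, b"AAAA")
vol.write(1, b"BBBB")
vol.write(2, b"AAAA")
released = deduplicate(vol)
print(released, vol.allocated_segments())
```

As in the abstract, a hash match is treated here as sufficient evidence that two segments hold the same data; a byte-for-byte comparison before linking would be an additional safeguard that this sketch leaves out.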
28 Citations
20 Claims
1. A method of operating an information system, comprising:
storing data to a first volume and mirroring the data to a second volume, said second volume being a virtual volume having a plurality of logical storage addresses, wherein a segment of physical storage capacity is allocated for a specified logical address as required when the data is to be stored to said specified logical address;
selecting segments of the second volume during a de-duplication operation on the second volume;
calculating a hash value for a particular segment representative of data contained in the particular segment;
comparing the calculated hash value with previously-stored hash values for other segments in the second volume;
linking the particular segment to another segment having a previously-stored hash value that matches the calculated hash value of the particular segment;
releasing the particular segment from the second volume when another segment has a previously-stored hash value that matches the calculated hash value of the particular segment, whereby physical storage capacity required for the second volume is reduced;
resynchronizing the data stored on said second volume with the data stored on said first volume;
splitting said second volume from mirroring said first volume prior to said step of selecting segments of the second volume during the de-duplication operation on the second volume; and
reverse synchronizing the data stored on said first volume with data stored on said second volume, wherein before reverse synchronizing whether the second volume has been de-duplicated or not is checked and if the second volume has been de-duplicated, said reverse synchronizing is conducted on to the first volume from the de-duplicated volume.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An information system, comprising:
a first storage system including a first controller and a plurality of storage devices, said first controller configured to present physical storage space on said storage devices as volumes for storage of data;
a first host computer able to communicate with said first storage system via a network; and
a second storage system in communication with said first storage system, said second storage system including a second controller and a plurality of second disk devices, said second storage system presenting said second volume as said virtual volume and being configured to carry out said de-duplication operation;
wherein said first storage system is configured to store write data received from said computer to a first volume on said first storage system and mirror the write data to a second volume, said second volume being a virtual volume having a plurality of logical addresses, wherein segments of physical storage capacity are allocated for a logical address as required when the write data is stored to said logical address;
wherein said second volume is de-duplicated by selecting segments of the second volume during a de-duplication operation and a hash value is calculated for a particular segment representative of data contained in the particular segment;
wherein said calculated hash value is compared with previously-stored hash values for other segments in the second volume, and the particular segment is linked to another segment having a previously-stored hash value that matches the calculated hash value of the particular segment, wherein the particular segment is released from the second volume when another segment has a previously-stored hash value that matches the calculated hash value of the particular segment, whereby physical storage capacity required for the second volume is reduced;
wherein before said selecting segments of the second volume whether the second volume has been designated for de-duplication operations is checked, and wherein said de-duplication operation on the second volume is performed after pair configuration of the second volume to the first volume is suspended and determined that the second volume has been designated for de-duplication operations based on an information maintained in a memory of said information system.
Dependent claims: 11, 12, 13, 14, 15
16. A method of reducing an amount of storage capacity used during mirroring operations, comprising:
storing data to a first volume and mirroring the data to a second volume, said second volume being a virtual volume having a plurality of logical addresses, wherein segments of physical storage capacity are allocated for a specified logical address as required when the data is to be stored to said specified logical address; and
carrying out a de-duplication operation on the second volume following a split from the first volume by identifying a particular segment of said second volume having data that is the same as another segment;
creating a link from the particular segment to the other segment; and
releasing the particular segment from the second volume, whereby physical storage capacity required for the second volume is reduced;
resynchronizing the data stored on said second volume with the data stored on said first volume prior to splitting said second volume from mirroring said first volume; and
reverse resynchronizing the data stored on said first volume with data stored on said second volume, wherein before reverse resynchronizing whether the second volume has been de-duplicated or not is checked and if the second volume has been de-duplicated, said reverse resynchronizing is conducted on to the first volume from the de-duplicated volume.
Dependent claims: 17, 18
19. A method of operating an information system, comprising:
storing data to a first volume and mirroring the data to a second volume, said second volume being a virtual volume having a plurality of logical storage addresses, wherein a segment of physical storage capacity is allocated for a specified logical address as required when the data is to be stored to said specified logical address;
selecting segments of the second volume during a de-duplication operation on the second volume;
calculating a hash value for a particular segment representative of data contained in the particular segment;
comparing the calculated hash value with previously-stored hash values for other segments in the second volume;
linking the particular segment to another segment having a previously-stored hash value that matches the calculated hash value of the particular segment;
releasing the particular segment from the second volume when another segment has a previously-stored hash value that matches the calculated hash value of the particular segment, whereby physical storage capacity required for the second volume is reduced;
providing a first storage system in communication with a host computer, said first storage system including a first controller and multiple first disk devices, said first volume representing physical storage capacity on said first disk devices;
providing a second storage system in communication with said first storage system and in a location separate from said first storage system, said second storage system including a second controller and multiple second disk devices;
providing said second volume on said second storage system as a remote mirror of said first volume; and
checking whether the second volume has been designated for de-duplication operations before said selecting segments of the second volume, wherein said de-duplication operation on the second volume is performed after pair configuration of the second volume to the first volume is suspended and determined that the second volume has been designated for de-duplication operations based on an information maintained in a memory of said information system.
Dependent claims: 20
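The ordering constraints recited in the independent claims (de-duplicate only after the pair is suspended and the volume is designated for de-duplication, and check the de-duplication status before reverse synchronizing) can be illustrated with a small state model. This is a hedged sketch only: `PairState`, `MirrorPair`, the `links` table, and all method names are hypothetical, volumes are modeled as plain dictionaries, and forward resynchronization is not modeled.

```python
import hashlib
from enum import Enum


class PairState(Enum):
    PAIRED = "paired"
    SUSPENDED = "suspended"


class MirrorPair:
    """Hypothetical model of a first volume mirrored to a second volume."""

    def __init__(self, dedup_designated):
        self.primary = {}     # logical address -> data
        self.secondary = {}   # logical address -> data (allocated segments)
        self.links = {}       # released address -> address holding the shared copy
        self.state = PairState.PAIRED
        self.dedup_designated = dedup_designated  # setting kept in memory
        self.deduplicated = False

    def write(self, address, data):
        self.primary[address] = data
        if self.state is PairState.PAIRED:
            self.secondary[address] = data  # mirror only while paired

    def split(self):
        self.state = PairState.SUSPENDED

    def deduplicate_secondary(self):
        # Run only when the pair is suspended AND the volume is designated.
        if self.state is not PairState.SUSPENDED or not self.dedup_designated:
            return False
        seen = {}  # hash value -> first address holding that data
        for address in sorted(self.secondary):
            digest = hashlib.sha256(self.secondary[address]).hexdigest()
            if digest in seen:
                self.links[address] = seen[digest]  # link to the other segment
                del self.secondary[address]         # release this segment
            else:
                seen[digest] = address
        self.deduplicated = True
        return True

    def reverse_sync(self):
        # Check the de-duplication status before copying back.
        restored = dict(self.secondary)
        if self.deduplicated:
            # Resolve links so released addresses are restored as well.
            for address, target in self.links.items():
                restored[address] = self.secondary[target]
        self.primary = restored
        return self.deduplicated


pair = MirrorPair(dedup_designated=True)
for addr, data in [(0, b"x"), (1, b"y"), (2, b"x")]:
    pair.write(addr, data)
pair.split()
pair.deduplicate_secondary()
pair.reverse_sync()
print(pair.primary)
```

In this model the status check matters because a de-duplicated secondary no longer holds a segment for every logical address; copying back without resolving the links would drop the released addresses from the first volume.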
Specification