System and method for sampling based elimination of duplicate data
Abstract
A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. The region surrounding each anchor point in the received data set is then compared bit by bit with the region surrounding an anchor point stored within a pattern database to identify forward and backward delta values. The duplicate data identified by the anchor point and the forward and backward delta values is then replaced in the received data set with a storage indicator.
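The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes byte-level rather than bit-level granularity, and the function name `find_deltas` and its parameters are hypothetical, not taken from the patent. Given an anchor that occurs in both the received and the stored data, it measures how far the two copies agree before and after the anchor.

```python
from typing import Tuple


def find_deltas(new_data: bytes, new_pos: int,
                stored_data: bytes, stored_pos: int,
                anchor_len: int) -> Tuple[int, int]:
    """Return (backward_delta, forward_delta), in bytes, around a shared anchor.

    Assumption: the anchor occupies anchor_len bytes starting at new_pos in
    new_data and at stored_pos in stored_data, and those bytes are identical.
    """
    # Walk backward from the byte just before the anchor while both sides agree.
    backward = 0
    while (new_pos - backward > 0 and stored_pos - backward > 0
           and new_data[new_pos - backward - 1]
           == stored_data[stored_pos - backward - 1]):
        backward += 1

    # Walk forward from the byte just after the anchor while both sides agree.
    forward = 0
    new_end = new_pos + anchor_len
    stored_end = stored_pos + anchor_len
    while (new_end + forward < len(new_data)
           and stored_end + forward < len(stored_data)
           and new_data[new_end + forward] == stored_data[stored_end + forward]):
        forward += 1

    return backward, forward


# Illustrative data: the duplicate region is "HELLO-ANCHOR-WORLD".
stored = b"xxHELLO-ANCHOR-WORLDyy"
new = b"abcHELLO-ANCHOR-WORLDcd"
print(find_deltas(new, new.find(b"ANCHOR"), stored, stored.find(b"ANCHOR"), 6))
# prints (6, 6)
```

In this sketch the duplicate region runs from the anchor position minus the backward delta to the end of the anchor plus the forward delta.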
28 Claims
1. A method for removing duplicate data from a data set, the method comprising the steps of:
identifying an anchor within the data set;
determining whether the identified anchor exists within an anchor database;
in response to determining that the anchor exists within the anchor database, performing a data comparison between the data set and a stored data set to identify a forward delta value and a backward delta value relative to the identified anchor; and
replacing a region of the data set identified by the anchor, the forward delta value and the backward delta value with a storage indicator to form a modified data set.
Dependent claims: 2-12
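The final step of claim 1, replacing the identified region with a storage indicator, might look like the following sketch. The `StorageIndicator` class, the `replace_region` function, and the representation of the modified data set as a list of segments are all assumptions made for illustration; the claim does not specify the form of the indicator. The sketch also assumes the delta values are valid for the given data and are measured in bytes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class StorageIndicator:
    """Hypothetical reference to bytes already held in the stored data set."""
    stored_offset: int  # where the duplicate region starts in the stored data
    length: int         # how many bytes the duplicate region covers


Segment = Union[bytes, StorageIndicator]


def replace_region(data: bytes, anchor_pos: int, anchor_len: int,
                   backward: int, forward: int,
                   stored_anchor_pos: int) -> List[Segment]:
    """Return the modified data set as literal bytes plus one indicator."""
    # The duplicate region extends backward and forward from the anchor.
    start = anchor_pos - backward
    end = anchor_pos + anchor_len + forward
    indicator = StorageIndicator(stored_anchor_pos - backward, end - start)

    segments: List[Segment] = []
    if start > 0:
        segments.append(data[:start])   # unique bytes before the region
    segments.append(indicator)          # stands in for the duplicate bytes
    if end < len(data):
        segments.append(data[end:])     # unique bytes after the region
    return segments


modified = replace_region(b"abcHELLO-ANCHOR-WORLDcd", anchor_pos=9,
                          anchor_len=6, backward=6, forward=6,
                          stored_anchor_pos=8)
print(modified)
# prints [b'abc', StorageIndicator(stored_offset=2, length=18), b'cd']
```

Only the bytes outside the duplicate region are kept verbatim; the 18 duplicate bytes are represented by a single indicator pointing into the stored data set.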
13. A system configured to remove duplicate data from a data set, the system comprising:
means for identifying an anchor within the data set;
means for determining whether the identified anchor exists within an anchor database;
in response to determining that the anchor exists within the anchor database, means for performing a data comparison between the data set and a stored data set to identify a forward delta value and a backward delta value relative to the identified anchor; and
means for replacing a region of the data set identified by the anchor, the forward delta value and the backward delta value with a storage indicator.
Dependent claims: 14-18
19. A system configured to remove duplicate data from a data set, the system comprising:
a storage system configured to serve the data set; and
a virtual tape library system adapted to receive the data set from the storage system, the virtual tape library system adapted to identify an anchor within the data set and further adapted to determine whether the identified anchor exists within an anchor database.
Dependent claims: 20-27
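The two operations claim 19 assigns to the virtual tape library system, identifying an anchor and checking it against an anchor database, can be sketched as follows. The claim does not say how anchors are chosen. This sketch assumes a rolling hash over a fixed-width window, with a window treated as an anchor when its hash is divisible by a chosen constant. The constants, the function names, and the dictionary used as the anchor database are all hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

WINDOW = 8             # assumed anchor width in bytes
DIVISOR = 16           # assumed sampling rate: about one window in 16 qualifies
BASE = 257             # assumed polynomial hash base
MOD = (1 << 31) - 1    # assumed hash modulus


def find_anchors(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, window) for each window whose rolling hash is sampled."""
    if len(data) < WINDOW:
        return
    high = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    for pos in range(len(data) - WINDOW + 1):
        if h % DIVISOR == 0:
            yield pos, data[pos:pos + WINDOW]
        if pos + WINDOW < len(data):
            # Slide the window: drop data[pos], append data[pos + WINDOW].
            h = ((h - data[pos] * high) * BASE + data[pos + WINDOW]) % MOD


def build_anchor_db(stored: bytes) -> Dict[bytes, int]:
    """Map each anchor found in the stored data to its first offset."""
    db: Dict[bytes, int] = {}
    for pos, window in find_anchors(stored):
        db.setdefault(window, pos)
    return db


def first_known_anchor(data: bytes,
                       db: Dict[bytes, int]) -> Optional[Tuple[int, int]]:
    """Return (offset in data, offset in stored data) of the first anchor in db."""
    for pos, window in find_anchors(data):
        if window in db:
            return pos, db[window]
    return None
```

Because the hash depends only on the bytes inside the window, the same content yields the same anchors regardless of where it sits in the data set, which is what allows a received data set to be matched against previously stored data.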
28. A computer readable medium for removing duplicate data from a data set, the computer readable medium including program instructions for performing the steps of:
identifying an anchor within the data set;
determining whether the identified anchor exists within an anchor database;
in response to determining that the anchor exists within the anchor database, performing a data comparison between the data set and a stored data set to identify a forward delta value and a backward delta value relative to the identified anchor; and
replacing a region of the data set identified by the anchor, the forward delta value and the backward delta value with a storage indicator to form a modified data set.
Specification