SYSTEM AND METHOD FOR SAMPLING BASED ELIMINATION OF DUPLICATE DATA
Abstract
A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. The region surrounding each anchor point in the received data set is then compared bit by bit with the region surrounding an anchor point stored within a pattern database to identify forward and backward delta values. The duplicate data identified by the anchor point and the forward and backward delta values is then replaced in the received data set with a storage indicator.
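The comparison and replacement steps described in the abstract can be illustrated with a minimal sketch. It assumes the anchor has already been located in both the received data and the stored data, and it compares whole bytes rather than individual bits for brevity. The function names `match_extent` and `replace_duplicate`, and the tuple layout of the storage indicator, are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple, Union

Indicator = Tuple[str, str, int, int]  # ("REF", stored_id, stored_offset, length); hypothetical layout


def match_extent(new: bytes, new_pos: int,
                 stored: bytes, stored_pos: int,
                 anchor_len: int) -> Tuple[int, int]:
    """Return (backward, forward): how many bytes match on each side of a shared anchor."""
    # Backward delta: walk left from the byte just before the anchor.
    backward = 0
    while (new_pos - backward > 0 and stored_pos - backward > 0
           and new[new_pos - backward - 1] == stored[stored_pos - backward - 1]):
        backward += 1

    # Forward delta: walk right from the byte just after the anchor.
    forward = 0
    new_end = new_pos + anchor_len
    stored_end = stored_pos + anchor_len
    while (new_end + forward < len(new) and stored_end + forward < len(stored)
           and new[new_end + forward] == stored[stored_end + forward]):
        forward += 1

    return backward, forward


def replace_duplicate(new: bytes, new_pos: int, anchor_len: int,
                      backward: int, forward: int,
                      stored_id: str, stored_pos: int) -> List[Union[bytes, Indicator]]:
    """Replace the duplicate region in `new` with an indicator pointing into the stored data."""
    start = new_pos - backward
    end = new_pos + anchor_len + forward
    indicator: Indicator = ("REF", stored_id, stored_pos - backward, end - start)
    return [new[:start], indicator, new[end:]]


if __name__ == "__main__":
    stored = b"xxHELLO-ANCHOR-WORLDyy"
    new = b"abcHELLO-ANCHOR-WORLDcd"
    anchor = b"ANCHOR"
    s_pos, n_pos = stored.find(anchor), new.find(anchor)
    back, fwd = match_extent(new, n_pos, stored, s_pos, len(anchor))
    print(back, fwd)
    print(replace_duplicate(new, n_pos, len(anchor), back, fwd, "old", s_pos))
```

In this sketch the indicator records where the matching run starts in the stored data and how long it is, so the original bytes could be recovered by reading that range back from the stored copy.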
20 Claims
1. A method for removing duplicate data from a data set stored on a storage system, the method comprising:
- performing, by a processor, an operation on the data set to identify an anchor within the data set, wherein the anchor is a specific section within the data set that defines a region of interest for potential data de-duplication; and
- identifying, by the processor, a forward delta value and a backward delta value which collectively identify a number of consecutive bits of data that match between the data set and the stored data set forward and backward from the identified anchor, respectively.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system configured to remove duplicate data from a data set stored on a storage system, the system comprising:
- a module executed by a processor of the storage system, the module adapted to identify an anchor within the data set and further adapted to determine whether the identified anchor exists within a data store storing a plurality of anchors; and
- the module is further adapted to, in response to determining that the anchor exists within the data store, perform a data comparison between the data set and a stored data set to identify a forward delta value and a backward delta value.
Dependent claims: 12, 13, 14, 15, 16, 17
18. A non-transitory computer readable medium containing program instructions executed on a processor, comprising:
- program instructions that perform an operation on a data set stored on a storage system to identify an anchor within the data set, wherein the anchor is a specific section within the data set that defines a region of interest for potential data de-duplication; and
- program instructions that identify a forward delta value and a backward delta value which collectively identify a number of consecutive bits of data that match between the data set and a stored data set.
Dependent claims: 19, 20
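The claims recite identifying an anchor and checking whether it exists within a data store of anchors, but the text here does not say which operation selects the anchors. The following is a minimal sketch under one assumed choice: a fixed-size window is hashed with CRC-32 and a window is treated as an anchor when its hash is divisible by a chosen divisor. The window size, the divisor, the dictionary used as the anchor store, and the names `find_anchors`, `index_anchors`, and `lookup_anchors` are all hypothetical.

```python
import zlib
from typing import Dict, Iterator, List, Tuple

AnchorStore = Dict[bytes, Tuple[str, int]]  # anchor bytes -> (stored data id, offset); hypothetical


def find_anchors(data: bytes, window: int = 8, divisor: int = 16) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, window bytes) for every window whose CRC-32 is divisible by `divisor`."""
    for pos in range(len(data) - window + 1):
        chunk = data[pos:pos + window]
        if zlib.crc32(chunk) % divisor == 0:
            yield pos, chunk


def index_anchors(store: AnchorStore, data_id: str, data: bytes,
                  window: int = 8, divisor: int = 16) -> None:
    """Record the anchors of a stored data set, keeping the first offset seen for each anchor."""
    for pos, chunk in find_anchors(data, window, divisor):
        store.setdefault(chunk, (data_id, pos))


def lookup_anchors(store: AnchorStore, data: bytes,
                   window: int = 8, divisor: int = 16) -> List[Tuple[int, Tuple[str, int]]]:
    """Return (offset in new data, (stored id, stored offset)) for anchors already in the store."""
    return [(pos, store[chunk])
            for pos, chunk in find_anchors(data, window, divisor)
            if chunk in store]


if __name__ == "__main__":
    store: AnchorStore = {}
    # divisor=1 makes every window an anchor, which keeps this demonstration deterministic.
    index_anchors(store, "old", b"xxSHAREDBLOCKyy", window=4, divisor=1)
    for hit in lookup_anchors(store, b"abcSHAREDBLOCKcd", window=4, divisor=1):
        print(hit)
```

Because the selection depends only on the window contents, the same bytes produce the same anchors wherever they appear, so each hit marks a candidate region for the forward and backward comparison that the claims describe.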
Specification