Sampling based elimination of duplicate data
First Claim
1. A method for removing duplicate data stored on a storage system, the method comprising:
performing an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication;
determining a number of consecutive bits or bytes of data that match between the first data set and a second data set forwards and backwards from the identified anchor; and
replacing the matching data in the first data set with an indication of the second data set, the anchor, and the number of matching bits or bytes forwards from the anchor and the number of matching bits or bytes backwards from the anchor.
1 Assignment
0 Petitions
Abstract
A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. A bit-by-bit data comparison is then performed of the region surrounding the anchor point in the received data set with the region surrounding an anchor point stored within a pattern database to identify forward/backward delta values. The duplicate data identified by the anchor point, forward and backward delta values is then replaced in the received data set with a storage indicator.
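The comparison and replacement steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: it compares at byte granularity rather than bit by bit, it assumes the anchor offsets in both data sets are already known, and the names `match_extent`, `StorageIndicator`, and `replace_duplicate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


def match_extent(new: bytes, new_anchor: int,
                 stored: bytes, stored_anchor: int) -> Tuple[int, int]:
    """Return (forward, backward) counts of matching bytes around aligned anchors.

    The forward count includes the byte at the anchor offset itself;
    the backward count starts at the byte just before the anchor.
    """
    forward = 0
    while (new_anchor + forward < len(new)
           and stored_anchor + forward < len(stored)
           and new[new_anchor + forward] == stored[stored_anchor + forward]):
        forward += 1

    backward = 0
    while (new_anchor - backward > 0
           and stored_anchor - backward > 0
           and new[new_anchor - backward - 1] == stored[stored_anchor - backward - 1]):
        backward += 1

    return forward, backward


@dataclass(frozen=True)
class StorageIndicator:
    """Hypothetical record standing in for the duplicate region."""
    stored_id: str      # which stored data set holds the original bytes
    stored_anchor: int  # anchor offset within that stored data set
    forward: int        # forward delta value
    backward: int       # backward delta value


def replace_duplicate(new: bytes, new_anchor: int, stored_id: str,
                      stored: bytes, stored_anchor: int):
    """Split `new` into (unique prefix, indicator, unique suffix)."""
    forward, backward = match_extent(new, new_anchor, stored, stored_anchor)
    indicator = StorageIndicator(stored_id, stored_anchor, forward, backward)
    prefix = new[:new_anchor - backward]
    suffix = new[new_anchor + forward:]
    return prefix, indicator, suffix
```

Only the bytes outside the matched region remain in the received data set; the matched region is described by the indicator, which points back to the stored copy.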
57 Citations
19 Claims
1. A method for removing duplicate data stored on a storage system, the method comprising:
performing an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication;
determining a number of consecutive bits or bytes of data that match between the first data set and a second data set forwards and backwards from the identified anchor; and
replacing the matching data in the first data set with an indication of the second data set, the anchor, and the number of matching bits or bytes forwards from the anchor and the number of matching bits or bytes backwards from the anchor.
Dependent claims: 2, 3, 4, 5, 6, 7, 17
8. A system configured to remove duplicate data, the system comprising:
a processor;
a computer readable medium comprising program code stored therein, the program code executable by the processor to cause the system to,
identify an anchor within a first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication;
determine whether the identified anchor exists within a data store storing a plurality of anchors;
in response to determining that the anchor exists within the data store, perform a data comparison between the first data set and a second data set forwards from the anchor and backwards from the anchor to determine a forwards delta value and a backwards delta value; and
replace matching data in the first data set with an indication of the second data set, an indication of the anchor, the forwards delta value, and the backwards delta value.
Dependent claims: 9, 10, 11, 12, 13, 18
14. A non-transitory computer readable medium comprising program instructions for data de-duplication, the program instructions:
program instructions that perform an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point within a first region of the first data set for potential data de-duplication;
determine consecutive data forwards from the anchor that matches consecutive data forwards from the anchor in a second data set and consecutive data backwards from the anchor in the first data set that matches consecutive data backwards from the anchor in the second data set, wherein the consecutive forwards matching data is represented with a forwards delta value and the consecutive backwards matching data is represented with a backwards delta value; and
replace the matching data in the first data set with an indication of the anchor, the second data set, the forwards delta value, and the backwards delta value.
Dependent claims: 15, 16, 19
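The lookup flow recited in claim 8 (identify an anchor, check whether it exists in a data store of anchors, then compare forwards and backwards) might be sketched as below. The claims listed here do not say which operation identifies an anchor, so the anchor rule is passed in as a caller-supplied function, and the default byte-sum rule is purely an assumption for illustration. `AnchorStore`, `find_anchors`, `first_match`, and the 4-byte window are likewise hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

WINDOW = 4  # hypothetical anchor width in bytes


def default_rule(window: bytes) -> bool:
    """Assumed anchor rule (not from the patent): byte sum divisible by 16."""
    return sum(window) % 16 == 0


def find_anchors(data: bytes,
                 rule: Callable[[bytes], bool] = default_rule) -> List[Tuple[int, bytes]]:
    """Return (offset, window) pairs for every window that satisfies the rule."""
    return [(i, data[i:i + WINDOW])
            for i in range(len(data) - WINDOW + 1)
            if rule(data[i:i + WINDOW])]


def deltas(a: bytes, i: int, b: bytes, j: int) -> Tuple[int, int]:
    """Forward and backward counts of matching bytes from aligned offsets i and j."""
    fwd = 0
    while i + fwd < len(a) and j + fwd < len(b) and a[i + fwd] == b[j + fwd]:
        fwd += 1
    back = 0
    while i - back > 0 and j - back > 0 and a[i - back - 1] == b[j - back - 1]:
        back += 1
    return fwd, back


class AnchorStore:
    """Toy in-memory stand-in for the data store holding known anchors."""

    def __init__(self) -> None:
        self._sets: Dict[str, bytes] = {}
        self._anchors: Dict[bytes, Tuple[str, int]] = {}

    def add(self, set_id: str, data: bytes,
            rule: Callable[[bytes], bool] = default_rule) -> None:
        self._sets[set_id] = data
        for offset, key in find_anchors(data, rule):
            # Keep the first location seen for each anchor value.
            self._anchors.setdefault(key, (set_id, offset))

    def lookup(self, key: bytes) -> Optional[Tuple[str, int, bytes]]:
        hit = self._anchors.get(key)
        if hit is None:
            return None
        set_id, offset = hit
        return set_id, offset, self._sets[set_id]


def first_match(new: bytes, store: AnchorStore,
                rule: Callable[[bytes], bool] = default_rule):
    """Return (set_id, stored_offset, new_offset, forward, backward) or None."""
    for new_offset, key in find_anchors(new, rule):
        hit = store.lookup(key)
        if hit is None:
            continue  # anchor not in the store: nothing to compare against
        set_id, stored_offset, stored = hit
        fwd, back = deltas(new, new_offset, stored, stored_offset)
        return set_id, stored_offset, new_offset, fwd, back
    return None
```

A caller would first register previously stored data sets with `AnchorStore.add`, then call `first_match` on each incoming data set. A real system would presumably continue scanning after the first hit and persist the store, which this sketch leaves out.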
Specification