System and method for accelerating anchor point detection
Abstract
A sampling-based technique for eliminating duplicate data (de-duplication) stored on storage resources is provided. According to the invention, when a new data set, e.g., a backup data stream, is received by a server, e.g., a storage system or virtual tape library (VTL) system implementing the invention, one or more anchors are identified within the new data set. The anchors are identified using novel anchor detection circuitry in accordance with an illustrative embodiment of the present invention. Upon receipt of the new data set by, for example, a network adapter of a VTL system, the data set is transferred using direct memory access (DMA) operations to a memory associated with an anchor detection hardware card that is operatively interconnected with the storage system. The anchor detection hardware card may be implemented as, for example, an FPGA that quickly identifies anchors within the data set. Because the anchor detection process is performed with hardware assistance, the load on the main processor of the system is reduced, thereby enabling line-speed de-duplication.
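The abstract does not state how the anchor detection circuitry decides that a location is an anchor. A common approach in sampling-based de-duplication, assumed here, is content-defined sampling. A fixed window slides over the data while a rolling hash is updated, and an anchor is declared wherever the low bits of the hash are zero. The following minimal Python sketch models that logic in software only; an FPGA would pipeline the same arithmetic. `find_anchors`, the window size, base, modulus, and mask are all hypothetical choices. The polynomial hash is a Rabin-Karp-style rolling hash, not a true Rabin fingerprint.

```python
WINDOW = 16          # hypothetical sliding-window size in bytes
BASE = 257           # hypothetical polynomial base for the rolling hash
MOD = (1 << 31) - 1  # prime modulus keeps hash values bounded
MASK = 0xFF          # anchor when the low 8 bits are zero (about 1 per 256 windows)


def find_anchors(data: bytes, window: int = WINDOW, mask: int = MASK) -> list[int]:
    """Return start offsets of every window whose rolling hash has (hash & mask) == 0."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for b in data[:window]:           # hash of the first window
        h = (h * BASE + b) % MOD
    anchors = []
    for end in range(window, len(data) + 1):
        # h currently covers data[end - window:end]
        if h & mask == 0:
            anchors.append(end - window)
        if end < len(data):
            h = (h - data[end - window] * top) % MOD  # drop the oldest byte
            h = (h * BASE + data[end]) % MOD          # append the next byte
    return anchors
```

Anchors depend only on window contents. Inserting bytes at the front of a stream therefore shifts the existing anchors instead of destroying them, which lets repeated data line up across successive backups.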
48 Citations
15 Claims
1. A method, comprising:
receiving a data set at a storage system;
in response to receiving the data set, detecting one or more anchor locations in the data set by an anchor detection circuitry of the storage system, wherein the anchor detection circuitry is dedicated for detecting the one or more anchor locations;
transferring each anchor location to a data de-duplication module executed by a processor of the storage system;
examining, by the de-duplication module executed by the processor, bits preceding the anchor location utilizing a first delta value and the bits following the anchor location utilizing a second delta value to identify duplicate data; and
in response to identifying the duplicate data, removing the duplicate data from the data set prior to storing the data set on a storage device operatively connected to the storage system.
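The examining and removing steps of claim 1 can be illustrated under a few assumptions. The sketch assumes the de-duplication module already holds a candidate anchor in previously stored data to compare against; the claim leaves open how that candidate is found. It works on bytes rather than bits. The first delta value counts bytes before the anchor, and the second counts bytes starting at the anchor. `delta_region` and `remove_if_duplicate` are hypothetical names.

```python
def delta_region(anchor: int, delta_before: int, delta_after: int) -> tuple[int, int]:
    """Half-open [start, end) range: delta_before bytes before the anchor and
    delta_after bytes from the anchor onward (assumed interpretation of the deltas)."""
    return anchor - delta_before, anchor + delta_after


def remove_if_duplicate(new: bytes, new_anchor: int, stored: bytes, stored_anchor: int,
                        delta_before: int, delta_after: int):
    """Compare the regions around two anchors; if they match, cut the region out of `new`.

    Returns (bytes_to_store, reference). The reference is
    (insert_at, stored_start, length), or None when no duplicate was found.
    """
    start, end = delta_region(new_anchor, delta_before, delta_after)
    s_start, s_end = delta_region(stored_anchor, delta_before, delta_after)
    if start < 0 or s_start < 0 or end > len(new) or s_end > len(stored):
        return new, None  # region runs past a buffer edge: treat as no match
    if new[start:end] != stored[s_start:s_end]:
        return new, None  # anchors coincide but the surrounding data differs
    return new[:start] + new[end:], (start, s_start, end - start)
```

The function returns a reference instead of simply dropping bytes, so the data set stays recoverable. Splicing `stored[s_start:s_start + length]` back in at `insert_at` reproduces the original.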
5. A method, comprising:
detecting one or more anchor locations of a data set by an anchor detection circuitry of a storage system, wherein the anchor detection circuitry is a hardware device dedicated for detecting the one or more anchor locations;
transferring each anchor location to a de-duplication module executed by a processor of the storage system, wherein the anchor detection circuitry and the processor are separate hardware devices;
determining if the anchor location is located in a database;
if the anchor location is located in the database, performing data de-duplication for the data set by examining a region surrounding the anchor location utilizing a first delta value identifying a number of consecutive bits before the anchor location and a second delta value identifying a number of consecutive bits after the anchor location to identify duplicate data; and
in response to identifying the duplicate data, removing the duplicate data from the data set prior to storing the data set on a storage device operatively connected to the storage system.
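Claim 5 adds a database lookup before the region comparison. The following minimal sketch assumes the database is an in-memory dict. Its keys are SHA-1 fingerprints of the `ANCHOR_LEN` bytes at each anchor, and its values are anchor offsets in previously stored data. Anchors absent from the database are skipped, and matched regions are removed and replaced by references. `ANCHOR_LEN`, `fingerprint`, `find_duplicates`, and `remove_duplicates` are hypothetical. The deltas are counted in bytes rather than bits.

```python
import hashlib

ANCHOR_LEN = 8  # hypothetical: number of bytes fingerprinted at each anchor


def fingerprint(data: bytes, offset: int) -> bytes:
    """Database key for an anchor (assumed keying scheme)."""
    return hashlib.sha1(data[offset:offset + ANCHOR_LEN]).digest()


def find_duplicates(data: bytes, anchors: list[int], stored: bytes, db: dict[bytes, int],
                    delta_before: int, delta_after: int) -> list[tuple[int, int, int]]:
    """Return sorted, non-overlapping (start, end, stored_start) duplicate spans in data."""
    spans: list[tuple[int, int, int]] = []
    last_end = 0
    for a in sorted(anchors):
        stored_anchor = db.get(fingerprint(data, a))
        if stored_anchor is None:
            continue  # anchor not in the database: no de-duplication attempted
        start, end = a - delta_before, a + delta_after
        s_start, s_end = stored_anchor - delta_before, stored_anchor + delta_after
        if start < max(last_end, 0) or s_start < 0:
            continue  # overlaps the previous span or starts before a buffer
        if end > len(data) or s_end > len(stored):
            continue  # region runs past the end of a buffer
        if data[start:end] == stored[s_start:s_end]:
            spans.append((start, end, s_start))
            last_end = end
    return spans


def remove_duplicates(data: bytes, spans: list[tuple[int, int, int]]):
    """Drop duplicate spans; return the bytes to store plus (insert_at, stored_start, length) refs."""
    kept, refs, pos = bytearray(), [], 0
    for start, end, s_start in spans:
        kept += data[pos:start]
        refs.append((len(kept), s_start, end - start))  # where stored bytes splice back in
        pos = end
    kept += data[pos:]
    return bytes(kept), refs
```

Sorting the anchors and tracking `last_end` keeps the removed spans disjoint, so every reference position in `kept` stays unambiguous.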
10. A system, comprising:
anchor detection hardware of a storage system that is configured to receive a data stream from a client and identify one or more anchor locations of the data stream, wherein the anchor detection hardware is a device dedicated to identifying each anchor location, the anchor detection hardware further configured to transfer the anchor location to a de-duplication module executed by a processor of the storage system, wherein the processor is a separate device from the anchor detection hardware; and
the de-duplication module executed by the processor and configured to:
(i) determine if the anchor location is located in a database,
(ii) add the anchor location to the database if the anchor location is not located in the database, and
(iii) identify duplicate data and remove the duplicate data if the anchor location is located in the database, wherein the duplicate data is identified by examining bits preceding the anchor location utilizing a first delta value and examining bits following the anchor location utilizing a second delta value.
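Claim 10 places anchor detection and de-duplication on separate devices and has the module populate the database itself. The following sketch models that split. A producer thread stands in for the dedicated anchor detection hardware, and a `queue.Queue` stands in for the transfer of anchor locations to the processor. The anchor rule (byte value `0x7C`), the delta values, and every name here are hypothetical. Database offsets are relative to the current stream, so each `DedupModule` instance handles one stream; a real system would record locations on the storage device. Matches become back-references to earlier bytes of the same stream, in the manner of LZ77.

```python
import hashlib
import queue
import threading

ANCHOR_LEN = 8    # hypothetical: bytes fingerprinted at each anchor
DELTA_BEFORE = 4  # hypothetical first delta value (bytes before the anchor)
DELTA_AFTER = 12  # hypothetical second delta value (bytes from the anchor onward)


def detector(stream: bytes, out: queue.Queue) -> None:
    """Stand-in for the dedicated anchor hardware: emits anchor offsets, then None."""
    for i in range(len(stream) - ANCHOR_LEN + 1):
        if stream[i] == 0x7C:  # hypothetical anchor rule: byte value '|'
            out.put(i)
    out.put(None)  # end-of-stream marker


class DedupModule:
    """Processor-side module performing steps (i) to (iii) of claim 10."""

    def __init__(self) -> None:
        self.db: dict[bytes, int] = {}  # anchor fingerprint -> first offset seen

    def run(self, stream: bytes, anchors: queue.Queue):
        kept, refs, pos = bytearray(), [], 0  # pos: next stream byte not yet handled
        while (a := anchors.get()) is not None:
            fp = hashlib.sha1(stream[a:a + ANCHOR_LEN]).digest()
            prev = self.db.get(fp)
            if prev is None:            # (i) not in the database, so (ii) add it
                self.db[fp] = a
                continue
            start, end = a - DELTA_BEFORE, a + DELTA_AFTER
            p_start, p_end = prev - DELTA_BEFORE, prev + DELTA_AFTER
            if start < pos or p_start < 0 or end > len(stream) or p_end > start:
                continue  # overlaps handled data, leaves the buffer, or overlaps its source
            if stream[start:end] == stream[p_start:p_end]:  # (iii) duplicate found
                kept += stream[pos:start]
                refs.append((len(kept), p_start, end - start))
                pos = end
        kept += stream[pos:]
        return bytes(kept), refs


def dedupe_stream(stream: bytes):
    """Run the detector on its own thread and the module on the calling thread."""
    q: queue.Queue = queue.Queue()
    t = threading.Thread(target=detector, args=(stream, q))
    t.start()
    result = DedupModule().run(stream, q)
    t.join()
    return result


def restore(kept: bytes, refs) -> bytes:
    """Rebuild the stream by copying each referenced earlier region back in order."""
    out, pos = bytearray(), 0
    for at, src, n in refs:
        out += kept[pos:at]
        out += out[src:src + n]  # source region always precedes the current position
        pos = at
    out += kept[pos:]
    return bytes(out)
```

Requiring `p_end <= start` guarantees that, during `restore`, each referenced region has already been rebuilt before it is copied.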
Specification