System and method for accelerating anchor point detection
First Claim
1. A method, comprising:
receiving a data set at a storage system;
in response to receiving the data set, detecting one or more anchor locations in the data set by an anchor detection circuitry of the storage system, wherein the anchor detection circuitry is a hardware device dedicated to detecting the one or more anchor locations;
transferring each anchor location to a data de-duplication module executed by a processor of the storage system;
determining if the anchor location is located in a database;
if the anchor location is not located in the database, adding the anchor location to the database;
if the anchor location is located in the database, examining, by the de-duplication module executed by the processor, bits preceding the anchor location utilizing a first delta value and the bits following the anchor location utilizing a second delta value to identify duplicate data; and
in response to identifying the duplicate data, removing the duplicate data from the data set prior to storing the data set on a storage device operatively connected to the storage system.
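The lookup-and-compare steps of this claim can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `AnchorRecord`, `match_extent`, `deduplicate`, and `key_len` are hypothetical, the database is a plain dictionary keyed by the bytes just before each anchor, comparison is done on bytes rather than bits, and the anchor offsets are assumed to arrive already sorted from the anchor detection hardware.

```python
from dataclasses import dataclass


@dataclass
class AnchorRecord:
    name: str      # hypothetical label of the stored data set
    stream: bytes  # previously seen data containing the anchor
    offset: int    # anchor offset within that data


def match_extent(new, new_off, old, old_off, delta_before, delta_after):
    """Count matching bytes before and after an anchor, bounded by the two deltas."""
    back = 0
    while (back < delta_before
           and new_off - back - 1 >= 0 and old_off - back - 1 >= 0
           and new[new_off - back - 1] == old[old_off - back - 1]):
        back += 1
    fwd = 0
    while (fwd < delta_after
           and new_off + fwd < len(new) and old_off + fwd < len(old)
           and new[new_off + fwd] == old[old_off + fwd]):
        fwd += 1
    return back, fwd


def deduplicate(name, data, anchors, db, key_len=8, delta_before=64, delta_after=64):
    """Return segments: ("data", bytes) or ("ref", stored_name, stored_start, length)."""
    out = []
    cursor = 0  # everything before this offset has already been emitted
    for off in anchors:
        key = data[max(0, off - key_len):off]  # assumed anchor fingerprint
        rec = db.get(key)
        if rec is None:
            # Anchor not in the database: add it and move on.
            db[key] = AnchorRecord(name, data, off)
            continue
        # Anchor found: examine data before and after it within the deltas.
        back, fwd = match_extent(data, off, rec.stream, rec.offset,
                                 delta_before, delta_after)
        start = max(off - back, cursor)
        end = off + fwd
        if end <= start:
            continue
        if start > cursor:
            out.append(("data", data[cursor:start]))
        # Replace the duplicate run with a reference into the stored data.
        out.append(("ref", rec.name, rec.offset - (off - start), end - start))
        cursor = end
    if cursor < len(data):
        out.append(("data", data[cursor:]))
    return out
```

Calling `deduplicate` once on a first data set populates the database; calling it on a later data set that shares an anchor yields a `("ref", ...)` segment in place of the duplicate bytes, so only the `("data", ...)` segments would need to be written to storage.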
Abstract
A sampling-based technique is provided for eliminating duplicate data (de-duplication) stored on storage resources. According to the invention, when a new data set, e.g., a backup data stream, is received by a server, e.g., a storage system or virtual tape library (VTL) system implementing the invention, one or more anchors are identified within the new data set. The anchors are identified using novel anchor detection circuitry in accordance with an illustrative embodiment of the present invention. Upon receipt of the new data set by, for example, a network adapter of a VTL system, the data set is transferred using direct memory access (DMA) operations to a memory associated with an anchor detection hardware card that is operatively interconnected with the storage system. The anchor detection hardware card may be implemented as, for example, an FPGA to quickly identify anchors within the data set. Because the anchor detection process is performed using a hardware assist, the load on a main processor of the system is reduced, thereby enabling line-speed de-duplication.
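The abstract does not state how an anchor is recognized within the data. One common approach in sampling-based de-duplication is a rolling hash over a sliding window, marking an anchor wherever the hash matches a chosen bit pattern. The sketch below illustrates that swapped-in technique in software; it is an assumption for illustration only, and the function name `find_anchors` and the parameters `window`, `mask`, and `target` are hypothetical. In the described system this work would be done by the dedicated hardware card rather than by the main processor.

```python
def find_anchors(data: bytes, window: int = 16, mask: int = 0x3F, target: int = 0) -> list:
    """Return offsets just past each window whose rolling hash matches target under mask."""
    base = 257
    mod = (1 << 61) - 1
    if window <= 0 or len(data) < window:
        return []
    top = pow(base, window - 1, mod)  # weight of the oldest byte in the window

    # Hash of the first full window.
    h = 0
    for b in data[:window]:
        h = (h * base + b) % mod

    anchors = []
    if (h & mask) == target:
        anchors.append(window)

    # Slide the window one byte at a time: drop the oldest byte, add the newest.
    for i in range(window, len(data)):
        h = ((h - data[i - window] * top) * base + data[i]) % mod
        if (h & mask) == target:
            anchors.append(i + 1)
    return anchors
```

Because the hash depends only on the bytes inside the window, the same content produces anchors at the same relative positions even when it is shifted by inserted or deleted data elsewhere in the stream, which is what makes anchors useful for finding duplicates.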
20 Claims
4. A method for anchor detection, comprising:
receiving a data set from a client by a storage system;
in response to receiving the data set, transferring the data set to an anchor detection circuitry of the storage system, wherein the anchor detection circuitry is a hardware device dedicated to detecting the one or more anchor locations;
detecting one or more anchor locations of the data set by the anchor detection circuitry;
transferring each anchor location to a de-duplication module executed by a processor of the storage system;
determining if the anchor location is located in a database;
if the anchor location is not located in the database, adding the anchor location to the database; and
if the anchor location is located in the database, performing the data de-duplication for the data set utilizing the anchor location, wherein the anchor detection circuitry and the processor are separate hardware devices.
9. A system, comprising:
anchor detection hardware configured to receive a data stream, the anchor detection hardware further configured to identify one or more anchor locations of the data stream where the anchor detection hardware is a hardware device dedicated to identifying each anchor location, the anchor detection hardware further configured to transfer the anchor location to a de-duplication module executed by a processor of the storage system; and
the de-duplication module executed by the processor and configured to:
(i) determine if the anchor location is located in a database,
(ii) add the anchor location to the database if the anchor location is not located in the database, and
(iii) identify duplicate data using the anchor location and remove the duplicate data from the received data stream if the anchor location is located in the database, wherein the anchor detection hardware and the processor are separate hardware devices.
19. A system comprising:
means for receiving a data set from a client;
means for transferring the data set to an anchor detection hardware of a storage system in response to receiving the data set;
means for detecting one or more anchor locations in the data set by the anchor detection hardware that is a hardware device dedicated for detecting anchor locations;
means for transferring each anchor location from the anchor detection hardware to a de-duplication module executed by a processor of the storage system, wherein the anchor detection hardware and the processor are separate hardware devices; and
means for determining if the anchor location is located in a database in response to transferring the anchor location;
means for adding the anchor location to the database if the anchor location is not located in the database;
means for examining bits preceding the anchor location utilizing a first delta value and means for examining the bits following the anchor location utilizing a second delta value to identify duplicate data if the anchor location is located in the database; and
means for removing the duplicate data from the data set prior to storing the data set on a storage device operatively connected to the storage system.
Specification