ACCELERATED DEDUPLICATION
First Claim
1. A method, comprising:
receiving a data stream at an input interface, the input interface connected to a processor, memory, and a deduplication accelerator;
performing chunk boundary identification and chunk fingerprinting at the deduplication accelerator in a single stage, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein chunk boundary identification and chunk fingerprinting are used to perform deduplication on the data stream;
maintaining a state machine using the processor;
accessing a deduplication dictionary using the chunk identifier to determine whether a first chunk has previously been written to persistent storage.
Abstract
Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.
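The dictionary lookup and reference-count update described in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class name `DedupStore`, its fields, and the use of SHA-256 as the chunk fingerprint are assumptions made for illustration, and a plain Python list stands in for the datastore suitcases.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical in-memory stand-in for the dictionary, datastore, and filemaps."""
    dictionary: dict = field(default_factory=dict)  # fingerprint -> index into datastore
    datastore: list = field(default_factory=list)   # unique chunk payloads (suitcase stand-in)
    refcounts: dict = field(default_factory=dict)   # fingerprint -> number of references
    filemaps: dict = field(default_factory=dict)    # file name -> ordered list of fingerprints

    def write_chunk(self, name: str, chunk: bytes) -> bool:
        """Record one chunk for file `name`; return True if the payload was newly stored."""
        # Assumption: SHA-256 serves as the chunk fingerprint.
        fp = hashlib.sha256(chunk).hexdigest()
        is_new = fp not in self.dictionary
        if is_new:
            # Chunk not seen before: store it and register it in the dictionary.
            self.dictionary[fp] = len(self.datastore)
            self.datastore.append(chunk)
            self.refcounts[fp] = 0
        # In both cases the reference count and the filemap are updated.
        self.refcounts[fp] += 1
        self.filemaps.setdefault(name, []).append(fp)
        return is_new

    def read_file(self, name: str) -> bytes:
        """Rebuild a file by following its filemap through the dictionary."""
        return b"".join(self.datastore[self.dictionary[fp]] for fp in self.filemaps[name])
```

Writing the same chunk under two file names stores the payload once and leaves its reference count at two, while each filemap still reconstructs its own file.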
75 Citations
20 Claims
1. A method, comprising:
receiving a data stream at an input interface, the input interface connected to a processor, memory, and a deduplication accelerator;
performing chunk boundary identification and chunk fingerprinting at the deduplication accelerator in a single stage, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein chunk boundary identification and chunk fingerprinting are used to perform deduplication on the data stream;
maintaining a state machine using the processor;
accessing a deduplication dictionary using the chunk identifier to determine whether a first chunk has previously been written to persistent storage.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A deduplication accelerator, comprising:
an input interface configured to read a data stream stored in memory by a central processing unit (CPU);
logic configured to perform chunk boundary identification and chunk fingerprinting for the data stream in a single pass read to perform data deduplication on the data stream, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein the chunk identifier is used to access a deduplication dictionary to determine whether the first chunk has previously been written to persistent storage.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
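As a software illustration of performing chunk boundary identification and chunk fingerprinting in a single pass read, the sketch below walks the stream once, updating a rolling hash (to find boundaries) and an incremental digest (to fingerprint the current chunk) on every byte. The claim does not specify the algorithms, so the polynomial rolling hash, the SHA-256 fingerprint, the function name `chunk_and_fingerprint`, and all constants are hypothetical choices made for this example.

```python
import hashlib

# All parameters below are hypothetical, chosen only to keep the example small.
WINDOW = 8             # bytes covered by the rolling hash
BASE = 257             # polynomial base for the rolling hash
MOD = (1 << 31) - 1    # modulus for the rolling hash
MASK = (1 << 6) - 1    # boundary when the low 6 bits of the rolling hash are zero
MIN_CHUNK = 16         # smallest chunk a content-defined boundary may close
MAX_CHUNK = 256        # forced boundary if no content-defined boundary is found


def chunk_and_fingerprint(data: bytes):
    """Yield (start, end, fingerprint) tuples, visiting each position of `data` once."""
    pow_out = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    rolling = 0
    hasher = hashlib.sha256()
    for i, byte in enumerate(data):
        length = i - start + 1
        if length > WINDOW:
            # Drop the oldest byte from the window; hardware would keep the
            # window in a small buffer rather than index back into the stream.
            rolling = (rolling - data[i - WINDOW] * pow_out) % MOD
        rolling = (rolling * BASE + byte) % MOD
        # The fingerprint is advanced in the same step as the boundary test.
        hasher.update(bytes([byte]))
        at_boundary = length >= MIN_CHUNK and (rolling & MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            yield start, i + 1, hasher.hexdigest()
            start = i + 1
            rolling = 0
            hasher = hashlib.sha256()
    if start < len(data):
        # Trailing bytes form a final, possibly short, chunk.
        yield start, len(data), hasher.hexdigest()
```

Each yielded fingerprint could then serve as the chunk identifier used to access a deduplication dictionary. Because boundary detection and fingerprinting share one loop, no chunk is read a second time to compute its digest.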
19. A system, comprising:
a processor operable to maintain a state machine for performing deduplication on a data stream;
memory operable to maintain the data stream;
a deduplication accelerator configured to perform chunk boundary identification and chunk fingerprinting in a single stage, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein chunk boundary identification and chunk fingerprinting are used to perform deduplication on the data stream;
wherein a deduplication dictionary is accessed using the chunk identifier to determine whether a first chunk has previously been written to persistent storage.
Dependent claims: 20
Specification