ACCELERATED DEDUPLICATION

US 20130018853A1
Filed: 12/01/2011
Published: 01/17/2013
Est. Priority Date: 07/11/2011
Status: Active Grant


Abstract

Mechanisms are provided for accelerated data deduplication. A data stream is received at an input interface and maintained in memory. Chunk boundaries are detected and chunk fingerprints are calculated using a deduplication accelerator while a processor maintains a state machine. A deduplication dictionary is accessed using a chunk fingerprint to determine if the associated data chunk has previously been written to persistent memory. If the data chunk has previously been written, reference counts may be updated but the data chunk need not be stored again. Otherwise, datastore suitcases, filemaps, and the deduplication dictionary may be updated to reflect storage of the data chunk. Direct memory access (DMA) addresses are provided to directly transfer a chunk to an output interface as needed.
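The dictionary lookup and reference-count update described in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware implementation: the `DedupStore` class, its in-memory list standing in for persistent storage, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory stand-in for the deduplication dictionary and datastore.

    All names here are hypothetical; the abstract does not specify data layouts.
    """

    def __init__(self):
        self.dictionary = {}  # fingerprint -> index of the stored chunk
        self.refcounts = {}   # fingerprint -> number of references to the chunk
        self.chunks = []      # stand-in for persistent storage
        self.filemap = []     # ordered fingerprints that rebuild the stream

    def write_chunk(self, chunk: bytes) -> bool:
        """Record one chunk; return True only if it had to be stored."""
        fingerprint = hashlib.sha256(chunk).hexdigest()  # assumed fingerprint
        self.filemap.append(fingerprint)
        if fingerprint in self.dictionary:
            # Already written: update the reference count, store nothing.
            self.refcounts[fingerprint] += 1
            return False
        # New chunk: store it and update the dictionary to reflect that.
        self.dictionary[fingerprint] = len(self.chunks)
        self.chunks.append(chunk)
        self.refcounts[fingerprint] = 1
        return True

    def read_stream(self) -> bytes:
        """Rebuild the original stream from the filemap."""
        return b"".join(self.chunks[self.dictionary[fp]] for fp in self.filemap)
```

Writing the chunks `b"alpha"`, `b"beta"`, `b"alpha"` in that order stores only two chunks, while the filemap still records three entries so the stream can be reconstructed.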


20 Claims

1. A method, comprising:
- receiving a data stream at an input interface, the input interface connected to a processor, memory, and a deduplication accelerator;
  
  performing chunk boundary identification and chunk fingerprinting at the deduplication accelerator in a single stage, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein chunk boundary identification and chunk fingerprinting are used to perform deduplication on the data stream;
  
  maintaining a state machine using the processor;
  
  accessing a deduplication dictionary using the chunk identifier to determine whether a first chunk has previously been written to persistent storage.
  - 2. The method of claim 1, wherein chunk boundary identification and chunk fingerprinting are performed in a single pipeline stage.
  - 3. The method of claim 2, wherein the processor is operable to maintain a state machine used by the deduplication accelerator for chunk boundary identification and chunk fingerprinting.
  - 4. The method of claim 1, wherein the deduplication dictionary includes a plurality of chunk identifiers corresponding to a plurality of storage locations for a plurality of data chunks.
  - 5. The method of claim 1, wherein chunk fingerprinting comprises calculating a hash for the first chunk delineated by identified chunk boundaries.
  - 6. The method of claim 1, wherein the CPU maintains a state machine and implements data stream optimizing without performing chunk boundary identification and chunk fingerprinting on the data stream.
  - 7. The method of claim 1, wherein the CPU uses direct memory access to transfer data to target addresses.
  - 8. The method of claim 1, wherein the deduplication accelerator is an application specific integrated circuit (ASIC).
  - 9. The method of claim 1, wherein the deduplication accelerator is a programmable logic device.

10. A deduplication accelerator, comprising:
- an input interface configured to read a data stream stored in memory by a central processing unit (CPU);
  
  logic configured to perform chunk boundary identification and chunk fingerprinting for the data stream in a single pass read to perform data deduplication on the data stream, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein the chunk identifier is used to access a deduplication dictionary to determine whether the first chunk has previously been written to persistent storage.
  - 11. The deduplication accelerator of claim 10, wherein chunk boundary identification and chunk fingerprinting are performed in a single pipeline stage.
  - 12. The deduplication accelerator of claim 11, wherein the CPU is operable to maintain a state machine used for chunk boundary identification and chunk fingerprinting.
  - 13. The deduplication accelerator of claim 10, wherein the deduplication dictionary includes a plurality of chunk identifiers corresponding to a plurality of storage locations for a plurality of data chunks.
  - 14. The deduplication accelerator of claim 10, wherein chunk fingerprinting comprises calculating a hash for the first chunk delineated by identified chunk boundaries.
  - 15. The deduplication accelerator of claim 10, wherein the CPU maintains a state machine and implements data stream optimizing without performing chunk boundary identification and chunk fingerprinting on the data stream.
  - 16. The deduplication accelerator of claim 10, wherein the CPU uses direct memory access to transfer data to target addresses.
  - 17. The deduplication accelerator of claim 10, wherein the deduplication accelerator is an application specific integrated circuit (ASIC).
  - 18. The deduplication accelerator of claim 10, wherein the deduplication accelerator is a programmable logic device.
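As a rough software analogy for the single pass read recited in claim 10, the sketch below detects chunk boundaries and computes each chunk's fingerprint in the same loop over the data. The claims do not name a boundary-detection or hashing algorithm, so the Gear-style rolling hash, the SHA-256 fingerprint, the function name `chunk_and_fingerprint`, and the size parameters are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical per-byte table for a Gear-style rolling hash (an assumption;
# the claims do not specify how boundaries are identified).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def chunk_and_fingerprint(data, min_size=64, max_size=1024, boundary_mask=0xFF):
    """Yield (chunk_bytes, fingerprint_hex) pairs from one pass over data."""
    start = 0                  # offset where the current chunk begins
    rolling = 0                # rolling hash used only for boundary detection
    hasher = hashlib.sha256()  # fingerprint of the current chunk so far
    for i, byte in enumerate(data):
        # Both computations consume the same byte in the same iteration.
        rolling = ((rolling << 1) + GEAR[byte]) & MASK64
        hasher.update(bytes((byte,)))
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & boundary_mask) == 0
        if at_boundary or length >= max_size:
            yield data[start:i + 1], hasher.hexdigest()
            start = i + 1
            rolling = 0
            hasher = hashlib.sha256()
    if start < len(data):
        # Trailing bytes that never hit a boundary form the final chunk.
        yield data[start:], hasher.hexdigest()
```

Feeding the hasher one byte at a time is slow in Python and is done here only to make the single traversal explicit; each yielded fingerprint could then serve as the key for a dictionary lookup.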

19. A system, comprising:
- a processor operable to maintain a state machine for performing deduplication on a data stream;
  
  memory operable to maintain the data stream;
  
  a deduplication accelerator configured to perform chunk boundary identification and chunk fingerprinting in a single stage, wherein chunk boundary identification delineates a first chunk and chunk fingerprinting calculates a chunk identifier for the first chunk, wherein chunk boundary identification and chunk fingerprinting are used to perform deduplication on the data stream;
  
  wherein a deduplication dictionary is accessed using the chunk identifier to determine whether a first chunk has previously been written to persistent storage.
  - 20. The method of claim 19, wherein chunk boundary identification and chunk fingerprinting are performed in a single pipeline stage.


Current Assignee: Quest Software, Inc.
Original Assignee: Dell Products LP (Dell Technologies Inc.)
Inventors: Jayaraman, Vinod; Rao, Goutham
Granted Patent: US 8,521,705 B2
US Class Current: 707/692
CPC Class Codes:
- G06F 16/1752   based on file chunks
- G06F 3/0608   Saving storage space on sto...
- G06F 3/0641   De-duplication techniques
- G06F 3/0671   In-line storage system
