Memory efficient sanitization of a deduplicated storage system

  • US 9,430,164 B1
  • Filed: 02/08/2013
  • Issued: 08/30/2016
  • Est. Priority Date: 02/08/2013
  • Status: Active Grant
First Claim

1. A computer-implemented method for sanitizing a storage system, the method comprising:

  • for each of a plurality of files stored in a file system of the storage system, obtaining a list of fingerprints representing data chunks of the file from a checkpointed on disk fingerprint-to-container (FTC) index, wherein the data chunks are deduplicated data chunks, and wherein at least one data chunk is referenced by multiple files in the file system;

    for each of the fingerprints, performing a lookup operation based on the fingerprint in a cache storing a plurality of cache entries, each mapping a fingerprint to a container identifier (ID) storing the corresponding data chunk and a chunk ID indicating a storage location of the data chunk within the container;

    identifying a first container ID identifying a first container storing a data chunk corresponding to the fingerprint from a first cache entry matching the fingerprint, determining from the first cache entry a first chunk ID identifying a storage location of the first container in which the data chunk is stored, and in response to determining that the fingerprint is not found in the cache;

    looking up the fingerprint in the FTC index to identify the first container ID storing the corresponding data chunk represented by the fingerprint;

    reading, into the cache, metadata of the first container having the first container ID; and

    looking up the first chunk ID, using the fingerprint, in the metadata of the first container having the first container ID;

    populating a bit in a copy bit vector (CBV) based on the first container ID and the first chunk ID, the CBV including a plurality of bits and each storing a bit value indicating whether a data chunk is to be copied, wherein a data chunk with a corresponding bit having a predetermined bit value in the CBV is a live data chunk, wherein a live data chunk is referenced by at least one of the files in the file system;

    after all of the bits corresponding to the fingerprints in the plurality of files have been populated in the CBV, copying live data chunks represented by the CBV from the first container to a second container; and

    erasing records of the data chunks in the first container after the live data chunks of the first container indicated by the CBV have been copied to the second container to reclaim a storage space associated with the first container, including padding a predetermined data value in the first container, and releasing the first container back to a pool of free containers for future reuse.
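
The claimed steps can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the names `Container`, `locate`, and `sanitize` are hypothetical, the FTC index and cache are modeled as plain dictionaries, a chunk ID is assumed to be the chunk's position within its container, and the CBV is simplified to one list of booleans per container rather than a packed bit vector.

```python
from dataclasses import dataclass


@dataclass
class Container:
    cid: int
    fingerprints: list  # container metadata: fingerprint at index i has chunk ID i
    chunks: list        # chunk payloads (bytes), in the same order


def locate(fp, cache, ftc_index, containers):
    """Return (container ID, chunk ID) for a fingerprint, filling the cache on a miss."""
    if fp not in cache:
        cid = ftc_index[fp]  # cache miss: fall back to the FTC index
        # Read the whole container's metadata into the cache.
        for chunk_id, f in enumerate(containers[cid].fingerprints):
            cache[f] = (cid, chunk_id)
    return cache[fp]


def sanitize(files, ftc_index, containers, free_pool, pad=b"\x00"):
    """Mark live chunks, copy them forward, then overwrite and release old containers.

    Assumes `containers` is a non-empty dict keyed by integer container ID.
    """
    cache = {}
    # Simplified CBV: one flag per (container ID, chunk ID); True means "copy".
    cbv = {cid: [False] * len(c.chunks) for cid, c in containers.items()}

    # Pass 1: populate the CBV from every fingerprint of every file.
    for fps in files.values():
        for fp in fps:
            cid, chunk_id = locate(fp, cache, ftc_index, containers)
            cbv[cid][chunk_id] = True

    # Pass 2: runs only after the CBV is fully populated.
    next_cid = max(containers) + 1
    for cid in list(containers):
        old = containers.pop(cid)
        live = [i for i, bit in enumerate(cbv[cid]) if bit]
        for fp in old.fingerprints:
            ftc_index.pop(fp, None)  # drop index entries for the old container
        if live:
            # Copy live chunks into a fresh container and re-index them.
            containers[next_cid] = Container(
                next_cid,
                [old.fingerprints[i] for i in live],
                [old.chunks[i] for i in live],
            )
            for i in live:
                ftc_index[old.fingerprints[i]] = next_cid
            next_cid += 1
        # Erase: pad every chunk with a fixed value, clear metadata, release.
        old.chunks = [pad * len(c) for c in old.chunks]
        old.fingerprints = []
        free_pool.append(old)
    return cbv
```

In this sketch a chunk shared by several files is marked once and copied once, and a chunk that no file references is never copied, so its contents survive only until the padding step overwrites them.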
