System and method for estimating duplicate data

  • US 8,793,226 B1
  • Filed: 08/28/2007
  • Issued: 07/29/2014
  • Est. Priority Date: 08/28/2007
  • Status: Active Grant
First Claim

1. A method for estimating duplicate data, comprising:

    executing a duplicate estimation application on a system having a processor and memory;

    selecting a data element stored on a storage device of the system;

    reading a plurality of segments of data from the data element;

    computing a fingerprint for each of the plurality of segments to produce a plurality of fingerprints;

    storing the plurality of fingerprints in a fingerprint database;

    identifying a total number of fingerprint entries in the fingerprint database;

    identifying a total number of unique fingerprint entries of the total number of fingerprint entries in the fingerprint database, wherein each unique fingerprint represents a single instance of a fingerprint in the total number of fingerprint entries in the fingerprint database;

    calculating an estimated amount of duplicate data by multiplying a size of a segment of data by a value obtained by subtracting the total number of unique fingerprint entries in the fingerprint database from the total number of fingerprint entries in the fingerprint database; and

    providing the calculated estimated amount of duplicate data to a display, wherein the calculated estimated amount of duplicate data indicates estimated storage space saving that is realized by employing a data de-duplication technique to eliminate the duplicate data.
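
The arithmetic recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation: the fixed 4096-byte segment size, the use of SHA-256 as the fingerprint function, the in-memory list standing in for the fingerprint database, and the names `fingerprint_segments` and `estimate_duplicate_bytes` are all assumptions made for illustration only.

```python
import hashlib
from typing import List

# Assumption: fixed-size segments of 4096 bytes; the claim does not fix a size.
SEGMENT_SIZE = 4096


def fingerprint_segments(data: bytes, segment_size: int = SEGMENT_SIZE) -> List[str]:
    """Split a data element into segments and fingerprint each one.

    SHA-256 is an assumed fingerprint function chosen for this sketch.
    The returned list stands in for the fingerprint database.
    """
    return [
        hashlib.sha256(data[offset:offset + segment_size]).hexdigest()
        for offset in range(0, len(data), segment_size)
    ]


def estimate_duplicate_bytes(fingerprints: List[str],
                             segment_size: int = SEGMENT_SIZE) -> int:
    """Estimate duplicate data as segment_size * (total - unique)."""
    total_entries = len(fingerprints)          # every stored fingerprint
    unique_entries = len(set(fingerprints))    # one instance per distinct value
    return segment_size * (total_entries - unique_entries)


if __name__ == "__main__":
    # Three identical segments followed by one different segment.
    data_element = b"A" * SEGMENT_SIZE * 3 + b"B" * SEGMENT_SIZE
    database = fingerprint_segments(data_element)
    print(estimate_duplicate_bytes(database))  # prints 8192
```

In this example the database holds four fingerprint entries, two of which are unique, so the estimate is 4096 × (4 − 2) = 8192 bytes. If the data element's length is not a multiple of the segment size, the final segment is shorter and the result is only an approximation, which is consistent with the claim describing an estimate.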
