Efficiently estimating compression ratio in a deduplicating file system

  • US 9,026,752 B1
  • Filed: 12/22/2011
  • Issued: 05/05/2015
  • Est. Priority Date: 12/22/2011
  • Status: Active Grant
First Claim

1. A system for estimating a compression ratio of a deduplicating storage system, comprising:

  • a processor configured to:

    process an incoming stream of data into a set of segments;

    for each of k times, associate a bin of an ordered set of bins with each received identifier using a hash function, wherein each received identifier comprises a fingerprint of a segment of the set of segments processed by a data fingerprinter coupled to the deduplicating storage system;

    store only a minimum bin number resulting from the k times of hashing each received identifier, wherein the processor only stores one value for the k times of hashing each received identifier;

    repeat the k times of associating a bin with a received identifier for n trials, where n is greater than two;

    determine an average minimum associated bin number, wherein the average minimum associated bin number comprises an average of the minimum bin number over the n trials;

    determining an estimate of a quantity of unique identifiers comprises dividing a total number of bins by the average minimum associated bin value and subtracting one;

    determine an estimation of a compression ratio for a deduplicating file system based at least in part on the estimate of the quantity of unique identifiers; and

  • a memory coupled to the processor and configured to provide the processor with instructions.
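
The estimation steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the bin count `NUM_BINS`, the use of salted SHA-256 as the per-trial hash function, the default of 400 trials, and all function names are assumptions made for illustration. The claim only says the compression ratio is based "at least in part" on the unique-identifier estimate, so the ratio of total to estimated unique identifiers used here is likewise an assumed definition. The arithmetic relies on the fact that when m distinct identifiers are hashed uniformly into B ordered bins, the expected minimum bin number is roughly B / (m + 1), so m is approximately B divided by the average minimum, minus one.

```python
import hashlib

# Assumed size of the ordered set of bins, numbered 1..NUM_BINS.
NUM_BINS = 2 ** 32


def bin_for(identifier: bytes, trial: int) -> int:
    """Map an identifier to a bin number in 1..NUM_BINS.

    Each trial uses a different hash function, simulated here by
    prefixing the trial index as a salt (an assumption of this sketch).
    """
    digest = hashlib.sha256(trial.to_bytes(4, "big") + identifier).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BINS + 1


def estimate_unique(identifiers, n_trials: int = 400) -> float:
    """Estimate the number of distinct identifiers."""
    ids = list(identifiers)
    if not ids:
        return 0.0
    total_min = 0
    for trial in range(n_trials):
        # Keep only one value per trial: the smallest bin number seen.
        minimum = NUM_BINS
        for identifier in ids:
            b = bin_for(identifier, trial)
            if b < minimum:
                minimum = b
        total_min += minimum
    average_min = total_min / n_trials
    # Total number of bins divided by the average minimum, minus one.
    return NUM_BINS / average_min - 1


def estimate_compression_ratio(identifiers, n_trials: int = 400) -> float:
    """Assumed ratio: identifiers received / estimated unique identifiers."""
    ids = list(identifiers)
    unique = estimate_unique(ids, n_trials)
    return len(ids) / unique if unique > 0 else 1.0
```

Because duplicate identifiers hash to the same bin, they never change the per-trial minimum, which is why a single stored value per trial is enough to estimate the count of unique segments. The estimate is statistical: its relative error shrinks roughly with the square root of the number of trials.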
