Efficiently estimating compression ratio in a deduplicating file system
First Claim
1. A deduplicating storage system, comprising:
- a processor configured to:
for each of k times:
associate a bin of an ordered set of bins with each received identifier, wherein each bin in the ordered set of bins has a bin number and each received identifier comprises a fingerprint of a segment of a set of segments stored on a file system of the deduplicating storage system;
determine a minimum bin number associated with each received identifier, the minimum bin number being the bin number that is minimum among the bins associated with the each received identifier;
repeat the k times of associating a bin with a received identifier for n trials, where n is greater than two;
determine an estimate of a quantity of unique identifiers based at least in part on an average of the minimum associated bin number;
determine a data compression ratio of the segments stored in the file system of the deduplicating storage system based on the estimated quantity of the unique identifiers without having to record a list of the unique identifiers and check the list of the unique identifiers for the each received identifier;
determine a capacity of the deduplicating storage system; and
back up data to the system of the deduplicating storage system based on the determined capacity of the deduplicating storage system and the determined data compression ratio of the segments stored therein; and
a memory coupled to the processor and configured to provide the processor with instructions.
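The estimation steps recited above can be sketched in Python. This is one plausible reading of the claim language, not the implementation described in the specification. The names `NUM_BINS`, `bin_number`, and `estimate_unique`, the use of a salted BLAKE2b hash to pick a bin, and the formula `NUM_BINS / average_minimum - 1` are all assumptions made for illustration. The sketch also assumes equal-sized segments, so the compression ratio is taken as received segments divided by estimated unique segments.

```python
import hashlib
from statistics import mean

# Assumed size of the ordered set of bins, numbered 1..NUM_BINS.
NUM_BINS = 2 ** 32


def bin_number(fingerprint: bytes, trial: int, round_: int) -> int:
    """Associate a bin with a fingerprint using a hash salted by (trial, round)."""
    salt = trial.to_bytes(4, "big") + round_.to_bytes(4, "big")
    digest = hashlib.blake2b(fingerprint, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % NUM_BINS + 1


def estimate_unique(fingerprints, k=16, n=16):
    """Return (estimated unique count, total received count).

    Only n * k running minima are kept; no list of unique
    fingerprints is recorded or checked.
    """
    minima = [[NUM_BINS] * k for _ in range(n)]
    total = 0
    for fp in fingerprints:
        total += 1
        for t in range(n):          # n trials
            row = minima[t]
            for r in range(k):      # k bin associations per trial
                b = bin_number(fp, t, r)
                if b < row[r]:
                    row[r] = b
    if total == 0:
        return 0.0, 0
    average_minimum = mean(mean(row) for row in minima)
    # With N distinct uniform draws, the expected minimum is about NUM_BINS / (N + 1).
    return NUM_BINS / average_minimum - 1, total


# Usage: 500 distinct segments, each received four times.
received = [hashlib.sha1(str(i % 500).encode()).digest() for i in range(2000)]
unique_estimate, received_count = estimate_unique(received)
compression_ratio = received_count / unique_estimate
```

A repeated fingerprint always lands in the same bin for a given trial and round, so duplicates never change a minimum. That is why the running minima depend only on the distinct fingerprints.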
Abstract
A system for estimating a quantity of unique identifiers comprises a processor and a memory. The processor is configured to, for each of k times, associate a bin of a set of bins with each received identifier. The processor is further configured to determine an estimate of the quantity of unique identifiers based at least in part on an average minimum associated bin value. The memory is coupled to the processor and configured to provide the processor with instructions.
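A short worked derivation shows why an average minimum bin value can yield a count estimate. It rests on assumptions the abstract does not state: each of the $N$ unique identifiers is assigned independently and uniformly to one of $m$ bins numbered $1, \dots, m$, with $m \gg N$, and $T$ denotes the number of independent minima that are averaged. The symbols $N$, $m$, $T$, $Y$, and $\bar{Y}$ are introduced here for illustration only.

\[
\Pr[Y > j] = \left(1 - \frac{j}{m}\right)^{N}
\quad\Longrightarrow\quad
\mathbb{E}[Y] = \sum_{j=0}^{m-1} \left(1 - \frac{j}{m}\right)^{N} \approx \frac{m}{N+1}
\]

\[
\bar{Y} = \frac{1}{T} \sum_{i=1}^{T} Y_i
\quad\Longrightarrow\quad
\hat{N} = \frac{m}{\bar{Y}} - 1
\]

Here $Y$ is the minimum bin number observed over all received identifiers for one assignment. Duplicate identifiers map to the same bin, so they do not affect $Y$. Under these assumptions the relative error of $\hat{N}$ shrinks roughly as $1/\sqrt{T}$.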
23 Claims
1. A deduplicating storage system, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A method for data backup based on estimates of a quantity of unique identifiers in a deduplicating storage system comprising:
for each of k times:
associating a bin of an ordered set of bins with each received identifier, wherein each bin in the ordered set of bins has a bin number and each received identifier comprises a fingerprint of a segment of a set of segments stored on a file system of the deduplicating storage system;
determining, using a processor, a minimum bin number associated with each received identifier, the minimum bin number being the bin number that is minimum among the bins associated with the each received identifier;
repeating the k times of associating a bin with a received identifier for n trials, where n is greater than two;
determining an estimate of a quantity of unique identifiers based at least in part on an average minimum associated bin number;
determining a data compression ratio of the segments stored in the file system of the deduplicating storage system based on the estimated quantity of the unique identifiers without having to record a list of the unique identifiers and check the list of the unique identifiers for the each received identifier;
determining a capacity of the deduplicating storage system; and
backing up data to the file system of the deduplicating storage system based on the determined capacity of the deduplicating storage system and the determined data compression ratio of the segments stored therein.
Dependent claims: 19, 20
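The final two steps of the method, determining capacity and backing up data based on capacity and compression ratio, could look like the sketch below. The claim does not say how the two quantities are combined, so the function `plan_backup`, its parameters, and the admission rule are hypothetical. The sketch assumes incoming data will deduplicate at about the same ratio as the segments already stored.

```python
def plan_backup(backup_bytes, capacity_bytes, used_bytes, compression_ratio):
    """Decide whether a backup of `backup_bytes` logical bytes is expected to fit.

    Returns (accepted, projected_physical_bytes). All parameters are
    illustrative; the rule assumes the measured compression ratio also
    applies to the incoming data.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    free_bytes = max(capacity_bytes - used_bytes, 0)
    projected_physical_bytes = backup_bytes / compression_ratio
    return projected_physical_bytes <= free_bytes, projected_physical_bytes


# Usage: 200 free units, ratio 4.0, so up to 800 logical units are expected to fit.
small_ok, small_cost = plan_backup(400, 1000, 800, 4.0)
large_ok, large_cost = plan_backup(1000, 1000, 800, 4.0)
```

A real system would likely leave headroom for estimation error, since the ratio comes from an estimate, not an exact count.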
21. A computer program product, the computer program product being embedded in a non-transitory computer readable storage medium and comprising computer instructions for performing a method for data backup based on estimates of a quantity of unique identifiers in a deduplicating storage system comprising:
for each of k times:
associating a bin of an ordered set of bins with each received identifier, wherein each bin in the ordered set of bins has a bin number and each received identifier comprises a fingerprint of a segment of a set of segments stored on a file system of the deduplicating storage system;
determining, using a processor, a minimum bin number associated with each received identifier, the minimum bin number being the bin number that is minimum among the bins associated with the each received identifier;
repeating the k times of associating a bin with a received identifier for n trials, where n is greater than two;
determining an estimate of a quantity of unique identifiers based at least in part on an average minimum associated bin number;
determining a data compression ratio of the segments stored in the file system of the deduplicating storage system based on the estimated quantity of the unique identifiers without having to record a list of the unique identifiers and check the list of the unique identifiers for the each received identifier;
determining a capacity of the deduplicating storage system; and
backing up data to the file system of the deduplicating storage system based on the determined capacity of the deduplicating storage system and the determined data compression ratio of the segments stored therein.
Dependent claims: 22, 23
Specification