Selecting files for compaction
First Claim
1. A computer-implemented method comprising:
identifying two or more files, each file being a persistent electronic file stored as a separate file in a persistent file storage, each file including multiple entries;
determining a respective size of each of the two or more files, each size being an estimate of how many distinct entries exist in the respective file that are not garbage entries;
determining a combined size of the two or more files, where the combined size of the two or more files is an arithmetic sum of the respective sizes of the two or more files;
estimating a compacted size of the two or more files, where the estimated compacted size of the two or more files is an estimate of how many distinct entries exist in the two or more files that are not garbage entries when the two or more files are taken together;
determining that a result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies a threshold; and
in response to determining that the result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies the threshold, compacting the two or more files to generate a single compacted file including multiple entries, where each entry of the single compacted file is a distinct entry that is not a garbage entry.
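The steps of the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes each file is an in-memory list of (key, value) entries in write order, that a garbage entry is either a deletion marker (value None) or an entry superseded by a later entry with the same key, that later files in the list are newer, and that the comparison is the ratio of combined size to compacted size. The claim only requires estimates; exact counts are used here for clarity. All names (live_entries, live_size, should_compact, compact) and the default threshold of 1.5 are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed entry layout: (key, value); a value of None marks a deletion.
Entry = Tuple[str, Optional[str]]
File = List[Entry]


def live_entries(files: List[File]) -> Dict[str, str]:
    """Latest non-deleted value per key, reading files oldest first."""
    latest: Dict[str, Optional[str]] = {}
    for entries in files:
        for key, value in entries:
            latest[key] = value  # a later entry supersedes an earlier one
    return {k: v for k, v in latest.items() if v is not None}


def live_size(files: List[File]) -> int:
    """Number of distinct entries that are not garbage."""
    return len(live_entries(files))


def should_compact(files: List[File], threshold: float = 1.5) -> bool:
    """Compare the summed per-file sizes with the size of the files taken together."""
    combined = sum(live_size([f]) for f in files)  # arithmetic sum of sizes
    compacted = live_size(files)                   # files taken together
    if compacted == 0:
        # Assumption: if nothing survives, compact only if some file held live entries.
        return combined > 0
    return combined / compacted >= threshold


def compact(files: List[File]) -> File:
    """Single output file holding only distinct, non-garbage entries."""
    return sorted(live_entries(files).items())


older = [("k1", "v1"), ("k2", "v2"), ("k2", "v2b")]
newer = [("k2", "v3"), ("k3", "v4"), ("k1", None)]
merged = compact([older, newer]) if should_compact([older, newer]) else None
# merged == [("k2", "v3"), ("k3", "v4")]
```

In this example each file has a size of 2, so the combined size is 4, while the two files taken together hold only 2 live entries. The ratio of 2.0 meets the assumed threshold, so the files are compacted.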
Abstract
Methods, systems, and apparatus for identifying two or more files, each of which includes multiple entries, determining a respective size of each of the files, each size being an estimate of how many distinct entries exist in the respective file that are not garbage entries, determining a combined size of the files, where the combined size of the files is an arithmetic sum of the respective sizes of the files, estimating a compacted size of the files, where the estimated compacted size of the files is an estimate of how many distinct entries exist in the files that are not garbage entries, selecting the two or more files for compaction, based at least on a comparison of the combined size of the files to the estimated compacted size of the files, and compacting the two or more selected files.
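The abstract speaks of estimates rather than exact counts but does not name an estimator. One possible way to obtain such estimates without reading every entry at selection time is a mergeable distinct-count sketch. The following is a minimal sketch using the k-minimum-values (KMV) technique, which is a choice made here for illustration and not something the source specifies. It counts distinct keys only and leaves garbage detection out of scope. The class name KMVSketch, the parameter k, and the helper _hash01 are hypothetical.

```python
import hashlib
from typing import Iterable, List


def _hash01(key: str) -> float:
    """Map a key to a stable pseudo-random point in [0, 1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps the k smallest hash values seen; k must be at least 2."""

    def __init__(self, k: int = 256, keys: Iterable[str] = ()):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.mins: List[float] = []
        for key in keys:
            self.add(key)

    def add(self, key: str) -> None:
        # Simple but not optimal: re-sort and truncate on every insert.
        self.mins = sorted(set(self.mins) | {_hash01(key)})[: self.k]

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Sketch of the union, i.e. the files taken together."""
        merged = KMVSketch(self.k)
        merged.mins = sorted(set(self.mins) | set(other.mins))[: self.k]
        return merged

    def estimate(self) -> float:
        if len(self.mins) < self.k:
            return float(len(self.mins))    # fewer than k distinct keys: exact
        return (self.k - 1) / self.mins[-1]  # standard KMV estimator


a = KMVSketch(keys=["k1", "k2", "k3"])
b = KMVSketch(keys=["k2", "k3", "k4"])
combined = a.estimate() + b.estimate()  # arithmetic sum of per-file sizes
compacted = a.merge(b).estimate()       # estimated size after compaction
ratio = combined / compacted
```

Because one small sketch per file can be stored alongside the file, candidate sets of files could be compared by merging sketches alone; the ratio would then be tested against whatever threshold the system uses.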
15 Citations
27 Claims
1. A computer-implemented method comprising:
identifying two or more files, each file being a persistent electronic file stored as a separate file in a persistent file storage, each file including multiple entries;
determining a respective size of each of the two or more files, each size being an estimate of how many distinct entries exist in the respective file that are not garbage entries;
determining a combined size of the two or more files, where the combined size of the two or more files is an arithmetic sum of the respective sizes of the two or more files;
estimating a compacted size of the two or more files, where the estimated compacted size of the two or more files is an estimate of how many distinct entries exist in the two or more files that are not garbage entries when the two or more files are taken together;
determining that a result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies a threshold; and
in response to determining that the result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies the threshold, compacting the two or more files to generate a single compacted file including multiple entries, where each entry of the single compacted file is a distinct entry that is not a garbage entry.
Dependent claims: 2, 3, 4, 5, 6, 7, 25
8. A system comprising:
a plurality of computers; and
a non-transitory storage device storing instructions operable to cause the computers to perform operations comprising:
identifying two or more files, each file being a persistent electronic file stored as a separate file in a persistent file storage, each file including multiple entries;
determining a respective size of each of the two or more files, each size being an estimate of how many distinct entries exist in the respective file that are not garbage entries;
determining a combined size of the two or more files, where the combined size of the two or more files is an arithmetic sum of the respective sizes of the two or more files;
estimating a compacted size of the two or more files, where the estimated compacted size of the two or more files is an estimate of how many distinct entries exist in the two or more files that are not garbage entries when the two or more files are taken together;
determining that a result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies a threshold; and
in response to determining that the result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies the threshold, compacting the two or more files to generate a single compacted file including multiple entries, where each entry of the single compacted file is a distinct entry that is not a garbage entry.
Dependent claims: 9, 10, 11, 12, 13, 14, 21, 22, 23, 24, 26
15. A non-transitory storage device storing instructions operable to cause one or more computers to perform operations comprising:
identifying two or more files, each file being a persistent electronic file stored as a separate file in a persistent file storage, each file including multiple entries;
determining a respective size of each of the two or more files, each size being an estimate of how many distinct entries exist in the respective file that are not garbage entries;
determining a combined size of the two or more files, where the combined size of the two or more files is an arithmetic sum of the respective sizes of the two or more files;
estimating a compacted size of the two or more files, where the estimated compacted size of the two or more files is an estimate of how many distinct entries exist in the two or more files that are not garbage entries when the two or more files are taken together;
determining that a result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies a threshold; and
in response to determining that the result of comparing the combined size of the two or more files to the estimated compacted size of the two or more files satisfies the threshold, compacting the two or more files to generate a single compacted file including multiple entries, where each entry of the single compacted file is a distinct entry that is not a garbage entry.
Dependent claims: 16, 17, 18, 19, 20, 27
Specification