System and method for managing large filesystem-based caches
First Claim
1. A method of managing a cache, comprising:
- at a computer, obtaining a statistical sampling of said cache;
generating a last access histogram based on said statistical sampling of said cache, wherein said last access histogram represents a distribution of access times of entries in said cache;
generating a size histogram based on said statistical sampling of said cache, wherein said size histogram represents a distribution of entry sizes in said cache;
determining a mean size of cache entries using said size histogram;
determining a percentage of entries to be removed from said cache;
determining a cutoff time based on said last access histogram and said percentage of said entries to be removed from said cache; and
removing, from said cache, entries whose last access time is older than said cutoff time.
11 Assignments
0 Petitions
Accused Products
Abstract
Embodiments disclosed herein utilize statistical approximations to manage large filesystem-based caches based on imperfect information. When removing entries from a large cache, which may have a million or more entries, the cache manager does not need to find the absolutely oldest entry that has been accessed the least recently. Instead, it suffices to find an entry that is older than most. In embodiments disclosed herein, statistical sampling of the cache is performed to produce models of different properties of the cache, including the number of entries, distribution of access times, distribution of entry sizes, etc. The models are then used to guide decisions that involve those properties. The size of the samples can be adjusted to balance the cost of acquiring the samples against the confidence level of the models produced by the samples. To achieve randomness, entries are stored using prefixes of addresses generated via a message-digest function.
-
Citations
18 Claims
-
1. A method of managing a cache, comprising:
-
at a computer, obtaining a statistical sampling of said cache; generating a last access histogram based on said statistical sampling of said cache, wherein said last access histogram represents a distribution of access times of entries in said cache; generating a size histogram based on said statistical sampling of said cache, wherein said size histogram represents a distribution of entry sizes in said cache; determining a mean size of cache entries using said size histogram; determining a percentage of entries to be removed from said cache; determining a cutoff time based on said last access histogram and said percentage of said entries to be removed from said cache; and removing, from said cache, entries whose last access time is older than said cutoff time. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A system, comprising:
-
a processor; at least one non-transitory computer-readable storage medium accessible by said processor and carrying computer instructions executable by said processor, wherein when executed by said processor said computer instructions are operable to perform; obtaining a statistical sampling of a filesystem-based cache; generating a last access histogram based on said statistical sampling of said cache, wherein said last access histogram represents a distribution of access times of entries in said cache; generating a size histogram based on said statistical sampling of said cache, wherein said size histogram represents a distribution of entry sizes in said cache; determining a mean size of cache entries using said size histogram; determining a percentage of entries to be removed from said cache; determining a cutoff time based on said last access histogram and said percentage of said entries to be removed from said cache; and removing from said cache entries whose last access time is older than said cutoff time. - View Dependent Claims (10, 11, 12, 13)
-
-
14. A computer program product comprising at least one computer-readable storage medium accessible by a processor and carrying computer instructions executable by said processor, wherein when executed by said processor said computer instructions are operable to perform:
-
obtaining a statistical sampling of a filesystem-based cache; generating a last access histogram based on said statistical sampling of said cache, wherein said last access histogram represents a distribution of access times of entries in said cache; generating a size histogram based on said statistical sampling of said cache, wherein said size histogram represents a distribution of entry sizes in said cache; determining a mean size of cache entries using said size histogram; determining a percentage of entries to be removed from said cache; determining a cutoff time based on said last access histogram and said percentage of said entries to be removed from said cache; and removing from said cache entries whose last access time is older than said cutoff time. - View Dependent Claims (15, 16, 17, 18)
-
Specification