System and method for managing large filesystem-based caches
First Claim
1. A method of managing a cache, comprising:
- in at least one computer communicatively connected to said cache,determining a percentage of entries to be removed from said cache based on a size model indicative of entry sizes in said cache, said size model based on a statistical sampling of at least parts of said cache;
determining a cutoff time based on said percentage of entries and a last access model indicative of access times of entries in said cache, said last access model based on said statistical sampling; and
removing, from said cache, entries whose last access time is older than said cutoff time.
11 Assignments
0 Petitions
Accused Products
Abstract
Embodiments disclosed herein utilize statistical approximations to manage large filesystem-based caches based on imperfect information. When removing entries from a large cache, which may have a million or more entries, the cache manager does not need to find the absolutely oldest entry that has been accessed the least recently. Instead, it suffices to find an entry that is older than most. In embodiments disclosed herein, statistical sampling of the cache is performed to produce models of different properties of the cache, including the number of entries, distribution of access times, distribution of entry sizes, etc. The models are then used to guide decisions that involve those properties. The size of the samples can be adjusted to balance the cost of acquiring the samples against the confidence level of the models produced by the samples. To achieve randomness, entries are stored using prefixes of addresses generated via a message-digest function.
-
Citations
20 Claims
-
1. A method of managing a cache, comprising:
-
in at least one computer communicatively connected to said cache, determining a percentage of entries to be removed from said cache based on a size model indicative of entry sizes in said cache, said size model based on a statistical sampling of at least parts of said cache; determining a cutoff time based on said percentage of entries and a last access model indicative of access times of entries in said cache, said last access model based on said statistical sampling; and removing, from said cache, entries whose last access time is older than said cutoff time. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system, comprising:
-
at least one processor; at least one non-transitory computer-readable storage medium accessible by said at least one processor and carrying computer instructions executable by said at least one processor, wherein when executed by said at least one processor said computer instructions are operable to perform; determining a percentage of entries to be removed from a filesystem-based cache based on a size model indicative of entry sizes in said cache, said size model based on a statistical sampling of said cache; determining a cutoff time based on said percentage of said entries to be removed from said cache and a last access model indicative of access times of entries in said cache, said last access model based on said statistical sampling; and removing from said cache entries whose last access time is older than said cutoff time. - View Dependent Claims (9, 10, 11, 12)
-
-
13. A computer program product comprising at least one non-transitory computer-readable storage medium accessible by a processor and carrying computer instructions executable by said processor, wherein when executed by said processor said computer instructions are operable to perform:
-
determining a percentage of entries to be removed from a filesystem-based cache based on a size model indicative of entry sizes in said cache, said size model based on a statistical sampling of at least parts of said cache; determining a cutoff time based on said percentage of said entries to be removed from said cache and a last access model indicative of access times of entries in said cache, said last access model based on said statistical sampling; and removing from said cache entries whose last access time is older than said cutoff time. - View Dependent Claims (14, 15, 16, 17)
-
-
18. A method of managing a cache, comprising:
-
at a computer, determining a percentage of entries to be removed from said cache based on a size model indicative of entry sizes in said cache, said size model based on a statistical sampling of at least parts of said cache; and removing from said cache, entries whose last access time is older than a cutoff time, based on said percentage of entries. - View Dependent Claims (19, 20)
-
Specification