Machine learning for metadata cache management
First Claim
1. A method comprising:
measuring, for each of a plurality of address spaces, an amount of randomness in a plurality of accesses to the plurality of address spaces; and
evicting metadata stored in a cache that is associated with an address space corresponding to a measured amount of randomness that is greater than a particular threshold;
wherein:
measuring said amount of randomness comprises:
capturing a plurality of addresses from the plurality of accesses;
generating a first frequency domain representation of a first plurality of addresses from the captured plurality of addresses, wherein the first plurality of addresses correspond to a first region of the logical address space, and wherein the first frequency domain representation has a first frequency distribution;
measuring an amount of randomness in the first frequency distribution by adding together frequency component values above a first cutoff frequency in the first frequency distribution;
identifying the first region as a relatively low random region responsive to determining the frequency component values above the first cutoff frequency are less than a first threshold; and
identifying the first region as a relatively high random region responsive to determining the frequency component values above the first cutoff frequency are greater than a first threshold;
wherein the plurality of accesses target a logical address space.
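The measuring steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a naive discrete Fourier transform over the captured addresses of one region, and it normalizes the summed high-frequency magnitudes by the total so that one threshold works at any address scale. The function names, the cutoff fraction of 0.25, and the threshold of 0.55 are all hypothetical choices made for illustration.

```python
import cmath
import random


def dft_magnitudes(samples):
    """Magnitudes |X[k]| for k = 0 .. N//2 of a real sequence (naive O(N^2) DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # drop the DC offset of the region
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(centered))
        mags.append(abs(acc))
    return mags


def high_frequency_share(addresses, cutoff_fraction=0.25):
    """Sum of component magnitudes above the cutoff bin, divided by the total.

    The normalization is an assumption of this sketch; the claim itself only
    recites adding the components above a cutoff frequency.
    """
    mags = dft_magnitudes(addresses)
    total = sum(mags)
    if total == 0:
        return 0.0  # constant address stream: no variation at all
    cutoff = int(len(mags) * cutoff_fraction)
    return sum(mags[cutoff:]) / total


def is_high_random(addresses, cutoff_fraction=0.25, threshold=0.55):
    """Classify a region as relatively high random (True) or low random (False)."""
    return high_frequency_share(addresses, cutoff_fraction) > threshold


if __name__ == "__main__":
    sequential = [4096 + 8 * i for i in range(128)]            # streaming access
    rng = random.Random(7)
    scattered = [rng.randrange(0, 1 << 20) for _ in range(128)]  # random access
    print("sequential:", is_high_random(sequential))
    print("scattered: ", is_high_random(scattered))
```

A sequential stream concentrates its spectrum in the lowest bins, while uniformly scattered addresses spread it roughly evenly, which is why summing the components above a cutoff separates the two cases.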
Abstract
A system and method for efficiently caching metadata in a storage system. Addresses from a plurality of I/O accesses to the storage system are captured and then a frequency domain representation of the addresses is generated. The frequency domain representation is used to measure the randomness of the various applications which are accessing the storage system. Scores are generated based on the measure of randomness, and scores are assigned to the various regions of the logical address space. Scores are then assigned to the metadata pages which are stored in the cache based on the region of the logical address space to which the metadata pages correspond. The scores are used when determining which metadata pages to evict from the cache. The cache will attempt to evict those metadata pages which correspond to regions of the logical address space that are servicing random I/O accesses.
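The score-driven eviction described in the abstract might look like the following sketch. It assumes each cached metadata page inherits the randomness score of the logical-address region it maps, and that the page with the highest score is evicted first. The class `MetadataCache`, its fields, and the region names in the usage note are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCache:
    capacity: int
    region_scores: dict                        # region id -> randomness score
    pages: dict = field(default_factory=dict)  # page id -> region id

    def score_of(self, page_id):
        """A page's score is the score of the region it belongs to."""
        return self.region_scores.get(self.pages[page_id], 0.0)

    def evict_one(self):
        """Evict the page whose region is currently the most random."""
        victim = max(self.pages, key=self.score_of)
        del self.pages[victim]
        return victim

    def insert(self, page_id, region_id):
        """Cache a metadata page, evicting one first if the cache is full."""
        if page_id not in self.pages and len(self.pages) >= self.capacity:
            self.evict_one()
        self.pages[page_id] = region_id
```

For example, with a capacity of two and scores `{"seq": 0.1, "rand": 0.9}`, inserting a page for `"seq"`, one for `"rand"`, and then a second for `"seq"` evicts the `"rand"` page, since metadata for randomly accessed regions is the least likely to be reused.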
14 Claims
1. A method comprising:
measuring, for each of a plurality of address spaces, an amount of randomness in a plurality of accesses to the plurality of address spaces; and
evicting metadata stored in a cache that is associated with an address space corresponding to a measured amount of randomness that is greater than a particular threshold;
wherein:
measuring said amount of randomness comprises:
capturing a plurality of addresses from the plurality of accesses;
generating a first frequency domain representation of a first plurality of addresses from the captured plurality of addresses, wherein the first plurality of addresses correspond to a first region of the logical address space, and wherein the first frequency domain representation has a first frequency distribution;
measuring an amount of randomness in the first frequency distribution by adding together frequency component values above a first cutoff frequency in the first frequency distribution;
identifying the first region as a relatively low random region responsive to determining the frequency component values above the first cutoff frequency are less than a first threshold; and
identifying the first region as a relatively high random region responsive to determining the frequency component values above the first cutoff frequency are greater than a first threshold;
wherein the plurality of accesses target a logical address space.
Dependent claims: 2, 3, 4, 5
6. A system comprising:
a cache, wherein the cache is configured to store metadata; and
a storage controller;
wherein the storage controller is configured to:
measure, for each of a plurality of address spaces, an amount of randomness in a plurality of accesses to the plurality of address spaces; and
evict metadata stored in the cache that is associated with an address space corresponding to a measured amount of randomness that is greater than a particular threshold;
wherein measuring said amount of randomness comprises:
capturing a plurality of addresses from the plurality of accesses;
generating a first frequency domain representation of a first plurality of addresses from the captured plurality of addresses, wherein the first plurality of addresses correspond to a first region of the logical address space, and wherein the first frequency domain representation has a first frequency distribution;
measuring an amount of randomness in the first frequency distribution by adding together frequency component values above a first cutoff frequency in the first frequency distribution;
identifying the first region as a relatively low random region responsive to determining the frequency component values above the first cutoff frequency are less than a first threshold; and
identifying the first region as a relatively high random region responsive to determining the frequency component values above the first cutoff frequency are greater than a first threshold;
wherein the plurality of accesses target a logical address space.
Dependent claims: 7, 8, 9, 10
11. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable by a processor to:
measure, for each of a plurality of address spaces, an amount of randomness in a plurality of accesses to the plurality of address spaces; and
evict metadata stored in a cache that is associated with an address space corresponding to a measured amount of randomness that is greater than a particular threshold;
wherein:
measuring said amount of randomness comprises:
capturing a plurality of addresses from the plurality of accesses;
generating a first frequency domain representation of a first plurality of addresses from the captured plurality of addresses, wherein the first plurality of addresses correspond to a first region of the logical address space, and wherein the first frequency domain representation has a first frequency distribution;
measuring an amount of randomness in the first frequency distribution by adding together frequency component values above a first cutoff frequency in the first frequency distribution;
identifying the first region as a relatively low random region responsive to determining the frequency component values above the first cutoff frequency are less than a first threshold; and
identifying the first region as a relatively high random region responsive to determining the frequency component values above the first cutoff frequency are greater than a first threshold;
wherein the plurality of accesses target a logical address space.
Dependent claims: 12, 13, 14
Specification