Data storage system with trained predictive cache management engine
First Claim
1. A method of training a neural network to evaluate cached datasets, where dataset accesses are logged as dataset entries of a dataset access log, the method comprising:
designating multiple predetermined event triggers, each trigger comprising a predetermined event occurring in association with any one of the datasets contained in the cache;
in response to the occurrence of an event trigger, the event trigger occurring at a trigger time and in association with a first dataset represented in the dataset access log, consulting the dataset access log to identify a latest access time of the first dataset;
establishing one or more training times in an interval from the trigger time to the latest access time;
for each training time, storing selected training input including characteristics of the first dataset in a training record, the characteristics having been exhibited by the first dataset at the training time, and also storing training output including a representation of value provided by having the first dataset present in the cache to satisfy future requests for access to the first dataset; and
according to a predetermined schedule, providing the training input from the training record as input to a single output back propagation neural network yielding a neural network output, and training the neural network according to any difference between the training output and the neural network output.
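The training-data step of this claim can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the access log is a plain dict of access times, training times are spaced evenly over the interval, the two characteristics (time since last access and access count) are hypothetical, and the training output is encoded as 1.0 for an access trigger and 0.0 for an expiration trigger. The names `TrainingRecord`, `training_times`, `characteristics`, and `on_trigger` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingRecord:
    dataset: str
    training_time: float
    inputs: List[float]  # characteristics the dataset exhibited at training_time
    target: float        # training output: assumed value of keeping it cached


def training_times(latest_access: float, trigger_time: float, n: int) -> List[float]:
    """Evenly spaced times covering [latest_access, trigger_time] (assumed policy)."""
    if n < 1 or trigger_time < latest_access:
        raise ValueError("need n >= 1 and trigger_time >= latest_access")
    if n == 1:
        return [latest_access]
    step = (trigger_time - latest_access) / (n - 1)
    return [latest_access + i * step for i in range(n)]


def characteristics(accesses: List[float], t: float) -> List[float]:
    """Hypothetical features at time t: age since last access, accesses so far."""
    seen = [a for a in accesses if a <= t]
    return [t - max(seen), float(len(seen))]


def on_trigger(dataset: str, kind: str, trigger_time: float,
               access_log: Dict[str, List[float]], n: int = 3) -> List[TrainingRecord]:
    """Build training records when an 'access' or 'expiration' trigger fires."""
    if kind not in ("access", "expiration"):
        raise ValueError(f"unknown trigger kind: {kind}")
    # Ignore the triggering access itself; assumes at least one earlier access.
    prior = [a for a in access_log[dataset] if a < trigger_time]
    latest = max(prior)  # latest access time found in the log
    target = 1.0 if kind == "access" else 0.0  # assumed encoding of the value
    return [TrainingRecord(dataset, t, characteristics(prior, t), target)
            for t in training_times(latest, trigger_time, n)]
```

For a dataset accessed at 0.0, 4.0, and again at 10.0, the access at 10.0 yields records at training times 4.0, 7.0, and 10.0, each labeled 1.0.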
Abstract
In a data storage system, a cache is managed by a predictive cache management engine that evaluates cache contents and purges entries unlikely to receive sufficient future cache hits. The engine includes a single output back propagation neural network that is trained in response to various event triggers. Accesses to stored datasets are logged in a data access log, and log entries are removed according to a predefined expiration criterion. In response to access of a cached dataset or expiration of its log entry, the cache management engine prepares training data. It does so by determining characteristics of the dataset at various past times between the time of the access or expiration and the time of last access, and providing these characteristics and the corresponding times as input to train the neural network. As another part of training, the cache management engine provides the neural network with output representing the expiration or access of the dataset. According to a predefined schedule, the cache management engine operates the trained neural network to generate scores for cached datasets, these scores ranking the datasets relative to each other. According to this or a different schedule, the cache management engine reviews the scores, identifies one or more datasets with the lowest scores, and purges the identified datasets from the cache.
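The log-expiration behavior described above might look like the following sketch. It assumes the expiration criterion is a fixed maximum entry age; the class `DatasetAccessLog` and its methods are hypothetical. An expiration trigger fires only when a dataset's most recent entry ages out, and the event carries that entry's time so a trainer still knows the latest access after the entry is gone.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class DatasetAccessLog:
    """Dataset access log whose entries expire after max_age (assumed criterion)."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.entries: Dict[str, List[float]] = defaultdict(list)

    def record_access(self, dataset: str, t: float) -> Optional[float]:
        """Log an access (an access trigger); return the previous latest access, if any."""
        times = self.entries[dataset]
        previous = max(times) if times else None
        times.append(t)
        return previous

    def expire(self, now: float) -> List[Tuple[str, float, float]]:
        """Drop entries older than max_age.

        Returns (dataset, latest_access, trigger_time) for each dataset whose
        most recent entry expired, i.e. one expiration trigger per dataset.
        """
        events = []
        for dataset in list(self.entries):
            times = self.entries[dataset]
            kept = [t for t in times if now - t <= self.max_age]
            if kept:
                self.entries[dataset] = kept
            else:
                events.append((dataset, max(times), now))
                del self.entries[dataset]
        return events
```

With `max_age=5.0`, a dataset accessed at 6.0 keeps its entry when `expire(now=8.0)` runs, while a dataset last accessed at 2.0 produces an expiration trigger.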
37 Claims
1. A method of training a neural network to evaluate cached datasets, where dataset accesses are logged as dataset entries of a dataset access log, the method comprising:
designating multiple predetermined event triggers, each trigger comprising a predetermined event occurring in association with any one of the datasets contained in the cache;
in response to the occurrence of an event trigger, the event trigger occurring at a trigger time and in association with a first dataset represented in the dataset access log, consulting the dataset access log to identify a latest access time of the first dataset;
establishing one or more training times in an interval from the trigger time to the latest access time;
for each training time, storing selected training input including characteristics of the first dataset in a training record, the characteristics having been exhibited by the first dataset at the training time, and also storing training output including a representation of value provided by having the first dataset present in the cache to satisfy future requests for access to the first dataset; and
according to a predetermined schedule, providing the training input from the training record as input to a single output back propagation neural network yielding a neural network output, and training the neural network according to any difference between the training output and the neural network output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
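A single output back propagation network, as recited in claim 1, can be illustrated with a small pure-Python model. This sketch assumes one sigmoid hidden layer, a sigmoid output, and squared error; the layer size, learning rate, and class name `SingleOutputNet` are illustrative choices, not details from the claims. Each `train` call adjusts the weights in proportion to the difference between the training output and the network output.

```python
import math
import random
from typing import List


class SingleOutputNet:
    """One hidden layer and one sigmoid output unit, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, lr: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        self.lr = lr
        # Each hidden row holds n_in weights plus a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    @staticmethod
    def _sig(z: float) -> float:
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def _forward(self, x: List[float]):
        xb = list(x) + [1.0]  # inputs plus bias term
        h = [self._sig(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]        # hidden activations plus bias term
        y = self._sig(sum(w * v for w, v in zip(self.w2, hb)))
        return xb, h, hb, y

    def predict(self, x: List[float]) -> float:
        """Single score for one dataset's characteristics."""
        return self._forward(x)[3]

    def train(self, x: List[float], target: float) -> float:
        """One backprop step on squared error; returns the error before the update."""
        xb, h, hb, y = self._forward(x)
        err = target - y  # difference between training output and network output
        d_out = err * y * (1.0 - y)
        # Hidden deltas use the output weights before they are updated.
        d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        self.w2 = [w + self.lr * d_out * v for w, v in zip(self.w2, hb)]
        for j, row in enumerate(self.w1):
            self.w1[j] = [w + self.lr * d_hid[j] * v for w, v in zip(row, xb)]
        return 0.5 * err * err
```

In a scheduled training pass, a trainer would call `train(inputs, target)` once per stored training record. Trained repeatedly on `([0.0, 1.0], 1.0)` and `([1.0, 0.0], 0.0)`, the network's prediction for the first input rises above 0.5 and for the second falls below it.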
14. A cache management method performed in a data storage system including a controller coupled to a storage, a dataset access log, and a cache, the cache storing datasets of the storage according to a use-related criterion, the dataset access log containing dataset entries representing access of datasets in the system, where contents of the log are removed according to a predefined expiration criterion, the method comprising:
in response to occurrence of an event trigger occurring at a trigger time, the event trigger comprising access of a cached dataset or expiration of the dataset's dataset access log entry, preparing training data by:
determining characteristics of the dataset at various times between the trigger time and a time of last access;
providing the determined characteristics and the corresponding times as input to train a single output back propagation neural network to provide as desired output scores each representing a desirability of maintaining the dataset in cache as of a different one of the various times; and
according to a predefined schedule, operating the trained neural network to generate scores for cached datasets; and
according to a predefined schedule, reviewing the scores, identifying one or more datasets with lowest scores, and purging the identified datasets from the cache.
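The scoring and purging steps can be sketched as follows, assuming the cache is a dict from dataset name to its current characteristics and that `score` wraps the trained network (any callable returning a float works here). `groom_cache` and `n_purge` are hypothetical names; the claim leaves open how many datasets are purged per review.

```python
from typing import Callable, Dict, List, Sequence


def groom_cache(cache: Dict[str, Sequence[float]],
                score: Callable[[Sequence[float]], float],
                n_purge: int) -> List[str]:
    """Score every cached dataset and purge the n_purge lowest-scoring ones.

    cache maps dataset name -> current characteristics (assumed layout);
    score stands in for the trained single-output network.
    """
    scores = {name: score(feats) for name, feats in cache.items()}
    ranked = sorted(scores, key=scores.__getitem__)  # lowest score first
    victims = ranked[:n_purge]
    for name in victims:
        del cache[name]  # purge from the cache
    return victims
```

With `cache = {"A": [0.9], "B": [0.1], "C": [0.5]}` and `score=lambda f: f[0]`, `groom_cache(cache, score, 1)` returns `["B"]` and leaves `A` and `C` cached.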
15. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for training a neural network to rank datasets contained in a cache, where dataset accesses are logged as dataset entries of a dataset access log, the method comprising:
designating multiple predetermined event triggers, each trigger comprising a predetermined event occurring in association with any one of the datasets contained in the cache;
in response to the occurrence of an event trigger, the event trigger occurring at a trigger time and in association with a first dataset represented in the dataset access log, consulting the dataset access log to identify a latest access time of the first dataset;
establishing one or more training times in an interval from the trigger time to the latest access time;
for each training time, storing selected training input including characteristics of the first dataset in a training record, the characteristics having been exhibited by the first dataset at the training time, and also storing training output including a representation of value provided by having the first dataset present in the cache to satisfy future requests for access to the first dataset; and
according to a predetermined schedule, providing the training input from the training record as input to a single output back propagation neural network yielding a neural network output, and training the neural network according to any difference between the training output and the neural network output.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23.
24. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a cache management method in a data storage system including a controller coupled to a storage, a dataset access log, and a cache, the cache storing datasets of the storage according to a use-related criterion, the dataset access log containing dataset entries representing access of datasets in the system, where contents of the log are removed according to a predefined expiration criterion, the method comprising:
in response to occurrence of an event trigger occurring at a trigger time, the event trigger comprising access of a cached dataset or expiration of the dataset's dataset access log entry, preparing training data by:
determining characteristics of the dataset at various times between the trigger time and a time of last access;
providing the determined characteristics and the corresponding times as input to train a single output back propagation neural network to provide as desired output scores each representing a desirability of maintaining the dataset in cache as of a different one of the various times; and
according to a predefined schedule, operating the trained neural network to generate scores for cached datasets; and
according to a predefined schedule, reviewing the scores, identifying one or more datasets with lowest scores, and purging the identified datasets from the cache.
25. A data storage system, comprising:
a cache;
a data storage;
a dataset access log maintaining entries representing accesses of cached datasets; and
a cache management engine linked to the data storage, the cache, and the dataset access log, the cache management engine including a single output back propagation neural network, the cache management engine being programmed to perform a method to train the neural network to evaluate cached datasets, the method comprising:
designating multiple predetermined event triggers, each trigger comprising a predetermined event occurring in association with any one of the datasets contained in the cache;
in response to the occurrence of an event trigger, the event trigger occurring at a trigger time and in association with a first dataset represented in the dataset access log, consulting the dataset access log to identify a latest access time of the first dataset;
establishing one or more training times in an interval from the trigger time to the latest access time;
for each training time, storing selected training input including characteristics of the first dataset in a training record, the characteristics having been exhibited by the first dataset at the training time, and also storing training output including a representation of value provided by having the first dataset present in the cache to satisfy future requests for access to the first dataset; and
according to a predetermined schedule, providing the training input from the training record as input to the single output back propagation neural network yielding a neural network output, and training the neural network according to any difference between the training output and the neural network output.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33.
34. A data storage system, comprising:
a data storage;
a cache storing datasets of the storage according to a use-related criterion;
a dataset access log maintaining entries representing accesses of datasets contained in the cache, where contents of the log are removed according to a predefined expiration criterion; and
a cache management engine linked to the data storage and the cache, the cache management engine including a single output back propagation neural network, the cache management engine being programmed to perform a cache management method comprising:
in response to occurrence of an event trigger occurring at a trigger time, the event trigger comprising access of a cached dataset or expiration of the dataset's dataset access log entry, preparing training data by:
determining characteristics of the dataset at various times between the trigger time and a time of last access;
providing the determined characteristics and the corresponding times as input to train the single output back propagation neural network to provide as desired output scores each representing a desirability of maintaining the dataset in cache as of a different one of the various times; and
according to a predefined schedule, operating the trained neural network to generate scores for cached datasets; and
according to a predefined schedule, reviewing the scores, identifying one or more datasets with lowest scores, and purging the identified datasets from the cache.
35. A data storage system, comprising:
first means for storing machine readable digital data;
second means for caching data of the first means;
a dataset access log maintaining entries representing accesses of cached datasets; and
third means linked to the first and second means, for training a single output back propagation neural network to evaluate datasets contained in the second means by:
designating multiple predetermined event triggers, each trigger comprising a predetermined event occurring in association with any one of the datasets contained in the second means;
in response to the occurrence of an event trigger, the event trigger occurring at a trigger time and in association with a first dataset represented in the dataset access log, consulting the dataset access log to identify a latest access time of the first dataset;
establishing one or more training times in an interval from the trigger time to the latest access time;
for each training time, storing selected training input including characteristics of the first dataset in a training record, the characteristics having been exhibited by the first dataset at the training time, and also storing training output including a representation of value provided by having the first dataset present in the second means to satisfy future requests for access to the first dataset; and
according to a predetermined schedule, providing the training input from the training record as input to the single output back propagation neural network yielding a neural network output, and training the neural network according to any difference between the training output and the neural network output.
36. A data storage system, comprising:
first means for storing machine readable digital data;
second means for caching datasets of the first means according to a use-related criterion;
a dataset access log maintaining entries representing accesses of datasets contained in the second means, where contents of the log are removed according to a predefined expiration criterion; and
third means linked to the first and second means, for managing the second means by:
in response to occurrence of an event trigger occurring at a trigger time, the event trigger comprising access of a dataset contained in the second means or expiration of the dataset's dataset access log entry, preparing training data by:
determining characteristics of the dataset at various times between the trigger time and a time of last access;
providing the determined characteristics and the corresponding times as input to train a single output back propagation neural network to provide as desired output scores each representing a desirability of maintaining the dataset in the second means as of a different one of the various times; and
according to a predefined schedule, operating the trained neural network to generate scores for datasets contained in the second means; and
according to a predefined schedule, reviewing the scores, identifying one or more datasets with lowest scores, and purging the identified datasets from the second means.
37. A method of grooming datasets contained in a cache, comprising:
logging accesses to cached datasets as entries in a cache access log, the entries expiring with prescribed age;
detecting occurrence of prescribed events of the following types: access of a cached dataset, and expiration of a most recent cache access log entry corresponding to the dataset;
responsive to occurrence of each event, selecting various past times, and for each past time quantifying the desirability of maintaining the cached dataset in cache at that past time in view of the occurrence of the event, and training a neural network utilizing prescribed characteristics of the cached dataset as input and the quantification as a target output for each past time;
according to a predetermined schedule, evaluating cached datasets by utilizing the neural network to quantify desirability of maintaining the cached datasets in the cache; and
grooming the cache by purging datasets with the least desirability of maintaining them in the cache.
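Claim 37 does not specify how the desirability at each past time is quantified. One assumed scheme, sketched below, gives an access event a target that decays exponentially with the gap between the past time and the access, and gives an expiration event a target of 0.0 because the dataset was never reused while its log entry lived. The function name `desirability_targets` and the time constant `tau` are hypothetical.

```python
import math
from typing import List, Tuple


def desirability_targets(event: str, event_time: float, past_times: List[float],
                         tau: float = 60.0) -> List[Tuple[float, float]]:
    """Return (past_time, target) pairs for one triggering event.

    Assumed scheme: an access at event_time makes holding the dataset at an
    earlier time t worth exp(-(event_time - t) / tau), so past times closer to
    the access earn targets nearer 1.0; an expiration event earns 0.0.
    """
    if event not in ("access", "expiration"):
        raise ValueError(f"unknown event type: {event}")
    out = []
    for t in past_times:
        if t > event_time:
            raise ValueError("past times must not follow the event")
        value = math.exp(-(event_time - t) / tau) if event == "access" else 0.0
        out.append((t, value))
    return out
```

For an access at 100.0 with `tau=60.0`, a past time of 100.0 gets a target of 1.0 and a past time of 40.0 gets exp(-1), roughly 0.37. Each pair would then serve as the target output for the characteristics the dataset exhibited at that past time.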
Specification