Dynamic scheduling of distributed storage management tasks using predicted system characteristics
Abstract
Systems and methods for scheduling storage management tasks over predicted user tasks in a distributed storage system. A method commences upon receiving a set of historical stimulus records that characterize management tasks that are run in the storage system. A corresponding set of historical response records comprising system metrics associated with execution of the system tasks is also received. A learning model is formed from the stimulus records and the response records and formatted to be used as a predictor. A set of forecasted user tasks is input as new stimulus records to the predictor to determine a set of forecasted system metrics that would result from running the forecasted user tasks. Management tasks are selected so as not to impact the forecasted user tasks. Management tasks can be selected based on non-contending resource usage between historical management task resource usage and predictions of resource usage by the user tasks.
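The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the single stimulus feature (`task_iops`), the single response metric (`cpu_percent`), the least-squares line used as the learning model, and the helper names `fit_predictor` and `select_management_tasks` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One historical stimulus/response pair (both fields are assumed examples)."""
    task_iops: float    # stimulus: I/O rate requested by the task
    cpu_percent: float  # response: CPU utilization measured while it ran


def fit_predictor(history):
    """Fit cpu_percent = slope * task_iops + intercept by ordinary least squares."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two stimulus/response pairs")
    mean_x = sum(r.task_iops for r in history) / n
    mean_y = sum(r.cpu_percent for r in history) / n
    sxx = sum((r.task_iops - mean_x) ** 2 for r in history)
    if sxx == 0:
        raise ValueError("stimulus values must not all be identical")
    sxy = sum((r.task_iops - mean_x) * (r.cpu_percent - mean_y) for r in history)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda iops: slope * iops + intercept


def select_management_tasks(predict, forecast_iops, candidates, cpu_limit=100.0):
    """Pick management tasks whose CPU cost fits in the forecasted headroom.

    candidates maps a task name to its expected CPU cost in percent.
    Cheapest tasks are considered first (a simple greedy choice).
    """
    headroom = cpu_limit - predict(forecast_iops)
    selected = []
    for name, cost in sorted(candidates.items(), key=lambda kv: kv[1]):
        if cost <= headroom:
            selected.append(name)
            headroom -= cost
    return selected


# Historical records: stimulus and measured response.
history = [Record(1000, 20.0), Record(2000, 40.0), Record(3000, 60.0)]
predict = fit_predictor(history)

# Forecasted user load for the next window, applied as a new stimulus.
chosen = select_management_tasks(
    predict,
    forecast_iops=3500,
    candidates={"snapshot": 10.0, "rebalance": 25.0, "scrub": 15.0},
)
print(chosen)  # ['snapshot', 'scrub']
```

With this made-up history the fitted line forecasts 70% CPU for the user load, which leaves 30% of headroom: enough for the snapshot and scrub tasks but not for the rebalance. A real system would presumably use many metrics and a richer model.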
20 Claims
1. A method for scheduling storage management tasks over predicted user tasks in a distributed storage system, the method comprising:
receiving a set of historical stimulus records, comprising system task data records that characterize one or more system tasks that have been executed on the distributed storage system that comprises at least a first node and a second node, wherein the distributed storage system comprises a plurality of storage devices of a cluster, wherein any node distributed across the cluster of nodes that has a controller virtual machine utilizes its respective controller virtual machine to read and write to content on the plurality of storage devices in a storage pool;
receiving a set of historical response records comprising one or more system metrics associated with execution of the system tasks on the first node of the distributed storage system of the cluster, wherein a user task executed at the first node is observable at different nodes within the cluster, the one or more system metrics comprising a first portion that corresponds to measured metrics at the first node and a second portion that corresponds to results measured at a second node that are produced by executing the user task on the first node;
generating a prediction model for the distributed storage system of the cluster based on a learning model formed from at least two stimulus records of the set of historical stimulus records and at least two response records of the set of historical response records;
generating a set of forecasted user tasks predicted to be executed on the distributed storage system of the cluster;
applying the set of forecasted user tasks as new stimulus records to the prediction model to determine a set of forecasted system metrics for the distributed storage system of the cluster, the set of forecasted system metrics being predicted to result from running the set of forecasted user tasks on the distributed storage system of the cluster; and
selecting one or more distributed storage management tasks to be scheduled for execution on certain nodes of the distributed storage system of the cluster based at least in part on a comparison between management task parameters and the set of forecasted system metrics, wherein the certain nodes of the distributed storage system are identified as being relevant to the one or more distributed storage management tasks.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer readable medium, embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by a processor causes the processor to perform a set of acts for scheduling storage management tasks over predicted user tasks in a distributed storage system, the set of acts comprising:
receiving a set of historical stimulus records, comprising system task data records that characterize one or more system tasks that have been executed on the distributed storage system that comprises at least a first node and a second node, wherein the distributed storage system comprises a plurality of storage devices of a cluster, wherein any node distributed across the cluster of nodes that has a controller virtual machine utilizes its respective controller virtual machine to read and write to content on the plurality of storage devices in a storage pool;
receiving a set of historical response records comprising one or more system metrics associated with execution of the system tasks on the first node of the distributed storage system of the cluster, wherein a user task executed at the first node is observable at different nodes within the cluster, the one or more system metrics comprising a first portion that corresponds to measured metrics at the first node and a second portion that corresponds to results measured at a second node that are produced by executing the user task on the first node;
generating a prediction model for the distributed storage system of the cluster based on a learning model formed from at least two stimulus records of the set of historical stimulus records and at least two response records of the set of historical response records;
generating a set of forecasted user tasks predicted to be executed on the distributed storage system of the cluster;
applying the set of forecasted user tasks as new stimulus records to the prediction model to determine a set of forecasted system metrics for the distributed storage system of the cluster, the set of forecasted system metrics being predicted to result from running the set of forecasted user tasks on the distributed storage system of the cluster; and
selecting one or more distributed storage management tasks to be scheduled for execution on certain nodes of the distributed storage system of the cluster based at least in part on a comparison between management task parameters and the set of forecasted system metrics, wherein the certain nodes of the distributed storage system are identified as being relevant to the one or more distributed storage management tasks.
Dependent claims: 12, 13, 14, 15, 16, 17
18. A system for scheduling storage management tasks over predicted user tasks in a distributed storage system comprising:
a storage medium having stored thereon a sequence of instructions; and
a processor or processors that execute the sequence of instructions to cause the processor or processors to perform a set of acts, the set of acts comprising:
receiving a set of historical stimulus records, comprising system task data records that characterize one or more system tasks that have been executed on the distributed storage system that comprises at least a first node and a second node, wherein the distributed storage system comprises a plurality of storage devices of a cluster, wherein any node distributed across the cluster of nodes that has a controller virtual machine utilizes its respective controller virtual machine to read and write to content on the plurality of storage devices in a storage pool;
receiving a set of historical response records comprising one or more system metrics associated with execution of the system tasks on the first node of the distributed storage system of the cluster, wherein a user task executed at the first node is observable at different nodes within the cluster, the one or more system metrics comprising a first portion that corresponds to measured metrics at the first node and a second portion that corresponds to results measured at a second node that are produced by executing the user task on the first node;
generating a prediction model for the distributed storage system of the cluster based on a learning model formed from at least two stimulus records of the set of historical stimulus records and at least two response records of the set of historical response records;
generating a set of forecasted user tasks predicted to be executed on the distributed storage system of the cluster;
applying the set of forecasted user tasks as new stimulus records to the prediction model to determine a set of forecasted system metrics for the distributed storage system of the cluster, the set of forecasted system metrics being predicted to result from running the set of forecasted user tasks on the distributed storage system of the cluster; and
selecting one or more distributed storage management tasks to be scheduled for execution on certain nodes of the distributed storage system of the cluster based at least in part on a comparison between management task parameters and the set of forecasted system metrics, wherein the certain nodes of the distributed storage system are identified as being relevant to the one or more distributed storage management tasks.
Dependent claims: 19, 20
Specification