Prioritized repair of data storage failures
Abstract
Embodiments are directed towards managing data storage that may experience a data failure. If a repair event is associated with a data storage failure, a new repair task may be generated and added to a task list. A priority value for each repair task in the task list may be determined based in part on the mean-time-to-data-loss (MTTDL) value associated with that repair task, such that a lower MTTDL indicates a higher priority value than a higher MTTDL. One or more repair tasks may be promoted to become active repair tasks based on the priority values of the repair tasks, such that the promoted repair tasks have a higher priority than other repair tasks in the task list, if any. Each active repair task may be executed to repair one or more of the associated storage failures.
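The MTTDL-based ordering described in the abstract can be sketched with a min-heap keyed on MTTDL, so that the task with the lowest MTTDL (the highest priority) is promoted first. This is only an illustrative sketch: the `RepairTask` class, the `promote` helper, the `active_slots` limit, and the sample MTTDL values are hypothetical names and numbers, not taken from the patent.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class RepairTask:
    # Hypothetical task record; only MTTDL takes part in ordering.
    mttdl_hours: float                 # lower MTTDL sorts first (higher priority)
    name: str = field(compare=False)


def promote(task_list, active_slots):
    """Pop up to `active_slots` tasks with the lowest MTTDL from the heap."""
    count = min(active_slots, len(task_list))
    return [heapq.heappop(task_list) for _ in range(count)]


# Build the task list as a heap (sample values are made up).
task_list = []
for task in (
    RepairTask(5000.0, "disk-a"),
    RepairTask(12.0, "disk-b"),
    RepairTask(300.0, "disk-c"),
):
    heapq.heappush(task_list, task)

active = promote(task_list, active_slots=2)
print([t.name for t in active])  # ['disk-b', 'disk-c']
```

A heap keeps insertion and promotion at logarithmic cost; a sorted list would serve equally well for small task lists.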
26 Claims
1. A method for managing data storage over a network using a network computer that executes instructions that perform actions, comprising:
when one or more repair events are associated with one or more new storage failures on a storage unit or a repair symbol unit, generating one or more new repair tasks that are associated with the one or more new storage failures;
determining a resource budget based on a network bandwidth capacity for one or more different portions of the network, wherein the resource budget includes separate values for each of the different portions of the network;
determining a protection level that represents a maximum number of storage unit or repair symbol unit failures before particular data is irrevocably lost on the data storage;
determining a risk of irrevocable loss for each repair task;
promoting the one or more repair tasks to be one or more new active repair tasks when a priority value for the one or more repair tasks is higher than a priority value for one or more active repair tasks and enough of the resource budget is available to execute the one or more new active repair tasks when each active repair task is executing; and
executing each active repair task to repair the one or more storage failures that are associated with the one or more active repair tasks.
Dependent claims: 2, 3, 4, 5, 6, 7
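The promotion step of claim 1 combines two conditions: the candidate must outrank at least one active task, and the per-portion resource budget must still cover the candidate while every active task keeps executing. The following is a minimal sketch under stated assumptions: the `Task` class, the `try_promote` function, the portion names (`rack-1`, `core`), and the bandwidth figures are all hypothetical, and allowing promotion when no task is active is an assumption the claim does not address.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record for illustration only.
    name: str
    priority: float   # higher value means more urgent
    cost: dict        # bandwidth needed per network portion, e.g. {"rack-1": 40}


def try_promote(candidate, active, budget):
    """Promote `candidate` if it outranks some active task and its cost fits
    the budget left over while every active task keeps running."""
    # Assumption: with no active tasks, any candidate may be promoted.
    if active and candidate.priority <= min(t.priority for t in active):
        return False
    # Check each network portion separately, as the budget holds one value per portion.
    for portion, needed in candidate.cost.items():
        used = sum(t.cost.get(portion, 0) for t in active)
        if used + needed > budget.get(portion, 0):
            return False
    active.append(candidate)
    return True


budget = {"rack-1": 100, "core": 50}   # made-up bandwidth units per portion
active = [Task("t1", 1.0, {"rack-1": 60, "core": 20})]

fits = try_promote(Task("t2", 5.0, {"rack-1": 30, "core": 20}), active, budget)
too_big = try_promote(Task("t3", 9.0, {"rack-1": 50}), active, budget)
too_low = try_promote(Task("t4", 0.5, {"core": 1}), active, budget)
print(fits, too_big, too_low)  # True False False
```

Here `t3` is rejected despite its high priority because `rack-1` would be oversubscribed; a fuller scheduler might instead pause a lower-priority active task, which this sketch does not attempt.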
8. A system that is arranged for managing data storage over a network, comprising:
a network computer, comprising:
a transceiver that is operative to communicate over the network;
a memory that is operative to store at least instructions; and
a processor device that is operative to execute instructions that enable actions, including:
when one or more repair events are associated with one or more new storage failures on a storage unit or a repair symbol unit, generating one or more new repair tasks that are associated with the one or more new storage failures;
determining a resource budget based on a network bandwidth capacity for one or more different portions of the network, wherein the resource budget includes separate values for each of the different portions of the network;
determining a protection level that represents a maximum number of storage unit or repair symbol unit failures before particular data is irrevocably lost on the data storage;
determining a risk of irrevocable loss for each repair task;
promoting the one or more repair tasks to be one or more new active repair tasks when a priority value for the one or more repair tasks is higher than a priority value for one or more active repair tasks and enough of the resource budget is available to execute the one or more new active repair tasks when each active repair task is executing; and
executing each active repair task to repair the one or more storage failures that are associated with the one or more active repair tasks; and
a client computer, comprising:
a transceiver that is operative to communicate over the network;
a memory that is operative to store at least instructions; and
a processor device that is operative to execute instructions that enable actions, including:
providing configuration information to the network computer.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A processor readable non-transitory storage media that includes instructions for managing data storage over a network, wherein execution of the instructions by a processor device enables actions, comprising:
when one or more repair events are associated with one or more new storage failures on a storage unit or a repair symbol unit, generating one or more new repair tasks that are associated with the one or more new storage failures;
determining a resource budget based on a network bandwidth capacity for one or more different portions of the network, wherein the resource budget includes separate values for each of the different portions of the network;
determining a protection level that represents a maximum number of storage unit or repair symbol unit failures before particular data is irrevocably lost on the data storage;
determining a risk of irrevocable loss for each repair task;
promoting the one or more repair tasks to be one or more new active repair tasks when a priority value for the one or more repair tasks is higher than a priority value for one or more active repair tasks and enough of the resource budget is available to execute the one or more new active repair tasks when each active repair task is executing; and
executing each active repair task to repair the one or more storage failures that are associated with the one or more active repair tasks.
Dependent claims: 16, 17, 18, 19, 20
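The protection-level and risk steps recited in the claims can be made concrete with a small worked example. The claims do not state a formula for risk, so the score below is purely an assumed heuristic: it treats a protection level of P as tolerating P failed units, with loss on failure P + 1, and reports the fraction of that margin already consumed. The function names `remaining_tolerance` and `loss_risk` are hypothetical.

```python
def remaining_tolerance(protection_level: int, failed_units: int) -> int:
    """How many more unit failures can occur before data is irrevocably lost."""
    return max(protection_level - failed_units, 0)


def loss_risk(protection_level: int, failed_units: int) -> float:
    """Assumed heuristic, not from the claims: share of the failure margin
    already used, where failure number protection_level + 1 means data loss."""
    return min(failed_units / (protection_level + 1), 1.0)


# Example: data protected against two unit failures (protection level 2).
for failed in (0, 1, 2, 3):
    print(failed, remaining_tolerance(2, failed), round(loss_risk(2, failed), 2))
```

A score of this kind could feed the priority value used for promotion, so that tasks whose data has the least remaining tolerance are repaired first.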
21. A network computer that is operative for managing data storage over a network, comprising:
a transceiver that is operative to communicate over a network;
a memory that is operative to store at least instructions; and
a processor device that is operative to execute instructions that enable actions, including:
when one or more repair events are associated with one or more new storage failures on a storage unit or a repair symbol unit, generating one or more new repair tasks that are associated with the one or more new storage failures;
determining a resource budget based on a network bandwidth capacity for one or more different portions of the network, wherein the resource budget includes separate values for each of the different portions of the network;
determining a protection level that represents a maximum number of storage unit or repair symbol unit failures before particular data is irrevocably lost on the data storage;
determining a risk of irrevocable loss for each repair task;
promoting the one or more repair tasks to be one or more new active repair tasks when a priority value for the one or more repair tasks is higher than a priority value for one or more active repair tasks and enough of the resource budget is available to execute the one or more new active repair tasks when each active repair task is executing; and
executing each active repair task to repair the one or more storage failures that are associated with the one or more active repair tasks.
Dependent claims: 22, 23, 24, 25, 26
Specification