Efficient failure recovery in a distributed data storage system
Abstract
A method is provided for efficiently recovering information in a distributed storage system where a list of values that should be stored on a storage device is maintained. A first convergence round is scheduled to be performed on the list of values to bring each value to an At Maximum Redundancy (AMR) state. A second convergence round is scheduled to be performed on the list by selecting a wait time interval from a predefined range of wait time intervals between starts of convergence rounds.
20 Claims
1. A method of recovering information in a distributed storage system, comprising:
maintaining a list of entries corresponding to values for storing on a first storage device in the distributed storage system;
defining a range of wait time intervals between convergence rounds on the list of entries, wherein each convergence round attempts to converge values corresponding to the entries to an At Maximum Redundancy (AMR) state;
performing a first convergence round by a first processing device on the list of entries; and
scheduling a second convergence round for the first processing device to perform on the list of entries by selecting a wait time interval from the defined range of wait time intervals.
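Claim 1 does not say how a wait interval is chosen from the defined range. The Python sketch below assumes a uniform random draw. The class name `ConvergenceScheduler`, the `converge` callback (which reports whether an entry reached AMR), and the 30 to 90 second bounds are hypothetical illustrations, not taken from the patent.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConvergenceScheduler:
    """Hypothetical per-device scheduler for convergence rounds (illustration only)."""

    entries: List[str]        # keys of values that should be stored on this device
    min_wait: float           # lower bound of the wait-interval range, in seconds
    max_wait: float           # upper bound of the wait-interval range, in seconds
    rng: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        if not 0 <= self.min_wait <= self.max_wait:
            raise ValueError("expected 0 <= min_wait <= max_wait")

    def run_round(self, converge: Callable[[str], bool]) -> List[str]:
        """Attempt to bring every entry to AMR; return the entries still short of AMR."""
        return [key for key in self.entries if not converge(key)]

    def next_wait(self) -> float:
        """Pick the wait before the next round from [min_wait, max_wait] (uniform: an assumption)."""
        return self.rng.uniform(self.min_wait, self.max_wait)


# Usage: "k2" pretends to be missing a fragment, so it stays short of AMR.
sched = ConvergenceScheduler(["k1", "k2", "k3"], min_wait=30.0, max_wait=90.0, rng=random.Random(7))
pending = sched.run_round(lambda key: key != "k2")
wait = sched.next_wait()
print(pending, 30.0 <= wait <= 90.0)  # ['k2'] True
```

Drawing the interval at random rather than using a fixed period is a common way to keep many devices from starting rounds in lockstep; this section of the patent does not state whether that is the motivation here.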
8. A distributed storage system, comprising:
a local storage domain; and
a remote storage domain in communication with the local storage domain via a network, each of the local and remote storage domains comprising:
a fragment server having a plurality of storage devices to store encoded fragments corresponding to a plurality of values inserted into the distributed storage system, the fragment server configured to:
perform a first round of convergence on values having fragments for storing on the fragment server's storage devices; and
schedule a second round of convergence on values that did not achieve an At Maximum Redundancy (AMR) state in the first round of convergence, wherein the fragment server is configured to schedule the second round of convergence to start after a selected time interval from start of the first round of convergence.
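Claim 8 narrows the second round to values that did not reach AMR and measures the interval from the start of the first round. A minimal Python sketch under those terms follows. `plan_second_round`, `RoundPlan`, the `converge` callback, and the uniform choice of interval are hypothetical, and fragment placement across the local and remote storage domains is left out.

```python
import random
from typing import Callable, List, NamedTuple, Tuple


class RoundPlan(NamedTuple):
    values: List[str]   # values that did not reach AMR in the first round
    start_at: float     # scheduled start time of the second round


def plan_second_round(
    values: List[str],
    converge: Callable[[str], bool],
    first_round_start: float,
    wait_range: Tuple[float, float],
    rng: random.Random,
) -> RoundPlan:
    """Run a first convergence round and plan the second one.

    The interval is counted from the START of the first round, so the spacing
    between round starts is independent of round duration (provided a round
    finishes within the interval).
    """
    lo, hi = wait_range
    not_amr = [v for v in values if not converge(v)]
    interval = rng.uniform(lo, hi)  # "selected time interval"; uniform choice is an assumption
    return RoundPlan(values=not_amr, start_at=first_round_start + interval)


plan = plan_second_round(
    ["a", "b", "c"],
    converge=lambda v: v == "a",   # only "a" reaches AMR in this toy run
    first_round_start=100.0,
    wait_range=(60.0, 60.0),       # degenerate range so the result is deterministic
    rng=random.Random(0),
)
print(plan)  # RoundPlan(values=['b', 'c'], start_at=160.0)
```

If a round can run longer than the selected interval, `start_at` may already be in the past when the round ends; a real scheduler would need a policy for that case (for example, starting immediately), which this section does not specify.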
15. A distributed storage system, comprising:
a first fragment server having a plurality of first storage devices to store encoded fragments corresponding to values inserted into the distributed storage system; and
a second fragment server having a plurality of second storage devices to store encoded fragments corresponding to values inserted into the distributed storage system, the second fragment server in communication with the first fragment server via a network,
wherein each of the first fragment server and the second fragment server is configured to execute respective convergence steps on a first value having encoded fragments assigned for storage on the first storage devices and the second storage devices, and
wherein the second fragment server is configured to delay execution of its respective convergence step on the first value in response to notification that the first fragment server is executing its respective convergence step on the first value.
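Claim 15 describes peer fragment servers that delay a convergence step when notified that another server is already converging the same value. The Python sketch below models notifications as direct method calls between in-process objects; the `FragmentServer` class, its `notify`/`begin_step`/`end_step`/`retry_deferred` methods, and the retry policy are hypothetical. It is single-threaded and does not address two servers starting on the same value at the same moment, which a networked implementation would have to resolve.

```python
from typing import Dict, List, Set


class FragmentServer:
    """Hypothetical fragment server that delays values a peer is converging."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["FragmentServer"] = []
        self.busy: Dict[str, Set[str]] = {}  # value key -> names of peers converging it
        self.deferred: List[str] = []        # keys whose convergence step was delayed

    def notify(self, peer: str, key: str, started: bool) -> None:
        """Record a peer's start or finish notification for a value."""
        holders = self.busy.setdefault(key, set())
        if started:
            holders.add(peer)
        else:
            holders.discard(peer)

    def begin_step(self, key: str) -> bool:
        """Start converging key, or delay it if a peer reported it is already doing so."""
        if self.busy.get(key):
            self.deferred.append(key)
            return False
        for p in self.peers:
            p.notify(self.name, key, started=True)
        return True

    def end_step(self, key: str) -> None:
        """Tell peers this server has finished its step on key."""
        for p in self.peers:
            p.notify(self.name, key, started=False)

    def retry_deferred(self) -> List[str]:
        """Retry delayed keys; return those that could start now (others stay deferred)."""
        pending, self.deferred = self.deferred, []
        return [k for k in pending if self.begin_step(k)]


fs1, fs2 = FragmentServer("fs1"), FragmentServer("fs2")
fs1.peers, fs2.peers = [fs2], [fs1]
print(fs1.begin_step("v1"))   # True: fs1 starts and notifies fs2
print(fs2.begin_step("v1"))   # False: fs2 delays its step on v1
fs1.end_step("v1")
print(fs2.retry_deferred())   # ['v1']: fs1 has finished, so fs2 proceeds
```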
Specification