Managing distributed system performance using accelerated data retrieval operations
First Claim
1. A method for managing performance of a distributed storage system, the method comprising:
- storing, in a plurality of storage devices of the distributed storage system, a plurality of stripes associated with a data item, the plurality of stripes generated according to a coding scheme, wherein the coding scheme generates a number of stripes associated with the data item that is more than a minimum number of stripes needed to reconstruct the data item, and wherein the plurality of stripes includes redundancy information for the data item;
- performing a distributed process including a task that requires retrieval of the data item from the distributed storage system; and
- responsive to determining that a processing speed associated with the task does not meet a threshold, performing an accelerated data retrieval operation by:
  - requesting more than the minimum number of stripes needed to reconstruct the data item from at least two of the plurality of storage devices of the distributed storage system;
  - determining whether at least the minimum number of stripes required to reconstruct the data item has been received; and
  - responsive to a determination that at least the minimum number of stripes required to reconstruct the data item has been received, reconstructing the data item using redundancy information.
Abstract
A distributed system is adapted to manage the performance of distributed processes. In one aspect, multiple stripes associated with a data item are stored in a distributed storage. The stored stripes include one or more stripes of redundancy information for the data item. A distributed process including at least one task is performed. During performance of the distributed process, a determination is made as to whether to perform an accelerated data retrieval operation. Responsive to a determination to perform an accelerated data retrieval operation, at least one of the one or more stripes of redundancy information for the data item is requested from the distributed storage. Other stripes associated with the data item may also be requested from the distributed storage. After a sufficient subset of stripes associated with the data item is received, the data item is reconstructed using the subset.
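To make the idea of a "sufficient subset of stripes" concrete, here is a minimal sketch of one possible coding scheme: `k` data stripes plus a single XOR parity stripe, so that any `k` of the `k + 1` stripes suffice. The source does not name a specific code. The single-parity scheme and the function names `encode` and `reconstruct` are assumptions chosen for brevity; a production system would more likely use a code that tolerates several missing stripes.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-sized data stripes and append one XOR parity stripe."""
    size = -(-len(data) // k)               # ceiling division
    padded = data.ljust(size * k, b"\0")    # zero-pad so all stripes have equal length
    stripes = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, stripes)     # redundancy information
    return stripes + [parity]


def reconstruct(received: dict, k: int, length: int) -> bytes:
    """Rebuild the data item from any k of the k + 1 stripes.

    received maps stripe index (0..k, where k is the parity stripe) to its bytes;
    length is the original, unpadded size of the data item.
    """
    if len(received) < k:
        raise ValueError("not enough stripes to reconstruct the data item")
    stripes = dict(received)
    missing = [i for i in range(k) if i not in stripes]
    if missing:
        # At most one data stripe can be missing; it equals the XOR of all
        # received stripes (the other data stripes and the parity stripe).
        (m,) = missing
        stripes[m] = reduce(xor_bytes, received.values())
    return b"".join(stripes[i] for i in range(k))[:length]
```

For example, with `k = 3` the item is stored as four stripes, and a reader that has received any three of them can call `reconstruct` without waiting for the fourth.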
20 Claims
1. A method for managing performance of a distributed storage system, the method comprising:
- storing, in a plurality of storage devices of the distributed storage system, a plurality of stripes associated with a data item, the plurality of stripes generated according to a coding scheme, wherein the coding scheme generates a number of stripes associated with the data item that is more than a minimum number of stripes needed to reconstruct the data item, and wherein the plurality of stripes includes redundancy information for the data item;
- performing a distributed process including a task that requires retrieval of the data item from the distributed storage system; and
- responsive to determining that a processing speed associated with the task does not meet a threshold, performing an accelerated data retrieval operation by:
  - requesting more than the minimum number of stripes needed to reconstruct the data item from at least two of the plurality of storage devices of the distributed storage system;
  - determining whether at least the minimum number of stripes required to reconstruct the data item has been received; and
  - responsive to a determination that at least the minimum number of stripes required to reconstruct the data item has been received, reconstructing the data item using redundancy information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer readable storage medium executing computer program instructions for managing performance of a distributed storage system, the computer program instructions comprising instructions for:
- storing, in a plurality of storage devices of the distributed storage system, a plurality of stripes associated with a data item, the plurality of stripes generated according to a coding scheme, wherein the coding scheme generates a number of stripes associated with the data item that is more than a minimum number of stripes needed to reconstruct the data item, and wherein the plurality of stripes includes redundancy information for the data item;
- performing a distributed process including a task that requires retrieval of the data item from the distributed storage system; and
- responsive to determining that a processing speed associated with the task does not meet a threshold, performing an accelerated data retrieval operation by:
  - requesting more than the minimum number of stripes needed to reconstruct the data item from at least two of the plurality of storage devices of the distributed storage system;
  - determining whether at least the minimum number of stripes required to reconstruct the data item has been received; and
  - responsive to a determination that at least the minimum number of stripes required to reconstruct the data item has been received, reconstructing the data item using redundancy information.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A system comprising:
- a computer readable storage medium storing processor-executable computer program instructions for managing performance of a distributed storage system, the instructions comprising instructions for:
  - storing, in a plurality of storage devices of the distributed storage system, a plurality of stripes associated with a data item, the plurality of stripes generated according to a coding scheme, wherein the coding scheme generates a number of stripes associated with the data item that is more than a minimum number of stripes needed to reconstruct the data item, and wherein the plurality of stripes includes redundancy information for the data item;
  - performing a distributed process including a task that requires retrieval of the data item from the distributed storage system; and
  - responsive to determining that a processing speed associated with the task does not meet a threshold, performing an accelerated data retrieval operation by:
    - requesting more than the minimum number of stripes needed to reconstruct the data item from at least two of the plurality of storage devices of the distributed storage system;
    - determining whether at least the minimum number of stripes required to reconstruct the data item has been received; and
    - responsive to a determination that at least the minimum number of stripes required to reconstruct the data item has been received, reconstructing the data item using redundancy information.
Dependent claims: 20
Specification