Systems and methods for reliably storing data using liquid distributed storage
First Claim
1. A method for accessing source data of a source object stored as multiple fragments distributed across multiple storage nodes of a storage system, wherein one or more fragments of the multiple fragments includes redundant data for the source object, the method comprising:
receiving the source data as a stream of data;
erasure encoding the stream of data to generate a stream of encoded data as the stream of data is arriving, wherein the redundant data is generated from the source data using an (n; k; r) erasure code;
producing the multiple fragments as a plurality of output fragment streams from the stream of encoded data as the stream of encoded data is being generated;
writing, by an access server having a processor and memory, the multiple fragments to the storage nodes as each of the plurality of output fragment streams as the output fragment streams are being produced, wherein a first portion of each of the output fragment streams corresponds to a first portion of the source object and are written to the storage nodes before a second portion of the source object has been received, and wherein data of at least one fragment of the multiple fragments is stored by the multiple storage nodes using a data organization that concatenates symbols of multiple source blocks from the source object for inclusion of a symbol of each source block in each of two or more fragments of the multiple fragments of the source object;
reading, by the access server, data of a plurality of fragments of the multiple fragments from a plurality of storage nodes of the multiple storage nodes to access a requested portion of the source data, wherein the reading data of a plurality of fragments comprises reading each of at least k fragments of the plurality of output fragments written to storage nodes as an input fragment stream; and
erasure decoding, by the access server, the portion of the source data from the data of the plurality of fragments read from the plurality of storage nodes, wherein the erasure decoding the portion of source data comprises erasure decoding the input fragment stream to generate a stream of source data for the source object as the input fragment stream is being read.
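The streaming encode/write and read/decode steps of claim 1 can be illustrated with a small sketch. It assumes r = n - k and, purely for brevity, substitutes a systematic Reed-Solomon-style code built by polynomial interpolation over the prime field GF(257) for whatever (n; k; r) code an implementation would actually use (the abstract points to a large erasure code). All function names are hypothetical. Each k-byte source block yields one symbol per fragment, so fragment i ends up as the concatenation of symbol i of every source block, and encoded symbols become available as soon as each block's bytes have arrived.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def lagrange_eval(points: Sequence[Tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode_block(block: Sequence[int], n: int) -> List[int]:
    """Systematic code: source symbols sit at x = 0..k-1, repair symbols at x = k..n-1."""
    points = list(enumerate(block))
    return list(block) + [lagrange_eval(points, x) for x in range(len(block), n)]


def decode_block(received: Sequence[Tuple[int, int]], k: int) -> List[int]:
    """Recover the k source symbols from any k (fragment index, symbol) pairs."""
    points = list(received[:k])
    return [lagrange_eval(points, x) for x in range(k)]


def stream_encode(chunks: Iterable[bytes], n: int, k: int) -> Iterator[List[int]]:
    """Encode bytes as they arrive, yielding one symbol per fragment for each k-byte source block."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= k:
            block, buf = buf[:k], buf[k:]
            yield encode_block(block, n)
    if buf:  # zero-pad the final partial source block
        yield encode_block(bytes(buf) + bytes(k - len(buf)), n)


def stream_decode(streams: Dict[int, Iterable[int]], k: int, length: int) -> Iterator[bytes]:
    """Decode source bytes while reading symbols in lockstep from any k fragment streams."""
    idx = sorted(streams)[:k]
    remaining = length
    for column in zip(*(iter(streams[i]) for i in idx)):
        block = decode_block(list(zip(idx, column)), k)
        out = bytes(block[: min(k, remaining)])
        remaining -= len(out)
        yield out
        if remaining == 0:
            break


# Usage: build n output fragment streams while the source "arrives" in 5-byte chunks.
data = b"liquid distributed storage"
n, k = 6, 4
fragments: List[List[int]] = [[] for _ in range(n)]
for symbols in stream_encode((data[i:i + 5] for i in range(0, len(data), 5)), n, k):
    for i, s in enumerate(symbols):
        fragments[i].append(s)  # symbol i of this block goes to output fragment stream i

# Lose fragments 0 and 3 (r = 2), then decode from the other four while "reading" them.
survivors = {i: fragments[i] for i in (1, 2, 4, 5)}
print(b"".join(stream_decode(survivors, k, len(data))))  # → b'liquid distributed storage'
```

Naive interpolation costs O(k²) field operations per output symbol, so this is only a readable stand-in; the point is the shape of the pipeline, in which the first symbols of every fragment stream exist before the rest of the source has been received, and decoding proceeds column by column as the k input streams are read.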
Abstract
Embodiments provide methodologies for reliably storing data within a storage system using liquid distributed storage control. Such liquid distributed storage control operates to compress repair bandwidth utilized within a storage system for data repair processing to the point of operating in a liquid regime. Liquid distributed storage control logic of embodiments may employ a lazy repair policy, repair bandwidth control, a large erasure code, and/or a repair queue. Embodiments of liquid distributed storage control logic may additionally or alternatively implement a data organization adapted to allow the repair policy to avoid handling large objects, instead streaming data into the storage nodes at a very fine granularity.
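The abstract names a lazy repair policy, repair bandwidth control, and a repair queue without detailing them at this point. The following hypothetical sketch shows one way those three ideas can fit together; the class name, the assumption that every object keeps one fragment on each of n nodes, and the "fewest available fragments first" ordering are illustrative choices, not the design specified by the patent. Each tick may spend at most a fixed number of bytes on repair reads, and an object chosen for repair has all of its missing fragments regenerated from k fragments at once.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LazyRepairQueue:
    """Hypothetical lazy repair queue with a per-tick repair-bandwidth budget."""

    n: int                # fragments stored per object
    k: int                # fragments needed to decode an object
    fragment_size: int    # bytes per fragment
    budget_per_tick: int  # repair bandwidth cap: bytes that may be read per tick
    available: Dict[str, int] = field(default_factory=dict)
    bytes_read: int = 0

    def add_object(self, name: str) -> None:
        self.available[name] = self.n

    def node_failed(self) -> None:
        # Assumption: each object keeps exactly one fragment on each of n nodes.
        for name in self.available:
            self.available[name] -= 1

    def tick(self) -> List[str]:
        """Repair the most depleted objects until this tick's read budget is used up."""
        heap = [(avail, name) for name, avail in self.available.items() if avail < self.n]
        heapq.heapify(heap)
        cost = self.k * self.fragment_size  # read k fragments, regenerate all missing ones
        spent, repaired = 0, []
        while heap and spent + cost <= self.budget_per_tick:
            avail, name = heapq.heappop(heap)
            if avail < self.k:
                raise RuntimeError(f"object {name!r} lost too many fragments to decode")
            self.available[name] = self.n
            spent += cost
            repaired.append(name)
        self.bytes_read += spent
        return repaired


q = LazyRepairQueue(n=10, k=6, fragment_size=100, budget_per_tick=600)
for name in ("a", "b"):
    q.add_object(name)
for _ in range(3):
    q.node_failed()        # both objects drop to 7 of 10 fragments
print(q.tick(), q.tick())  # → ['a'] ['b']
```

In this model, repairing an object that has lost three fragments reads k × 100 = 600 bytes, or 200 bytes per regenerated fragment, whereas repairing each loss as soon as it occurred would read 600 bytes per fragment. Deferring repair is what lets the read cost be amortized, while the per-tick budget keeps repair traffic at a bounded rate.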
56 Claims
1. A method for accessing source data of a source object, as recited in full under First Claim above.
Dependent claims: 2-23.
24. An apparatus for accessing source data of a source object stored as multiple fragments distributed across multiple storage nodes of a storage system, wherein one or more fragments of the multiple fragments includes redundant data for the source object, the apparatus comprising:
one or more data processors; and
one or more non-transitory computer-readable storage media containing program code configured to cause the one or more data processors to perform operations including:
receiving the source data as a stream of data;
erasure encoding the stream of data to generate a stream of encoded data as the stream of data is arriving, wherein the redundant data is generated from the source data using an (n; k; r) erasure code;
producing the multiple fragments as a plurality of output fragment streams from the stream of encoded data as the stream of encoded data is being generated;
writing the multiple fragments to the storage nodes as each of the plurality of output fragment streams as the output fragment streams are being produced, wherein a first portion of each of the output fragment streams corresponds to a first portion of the source object and are written to the storage nodes before a second portion of the source object has been received, wherein data of at least one fragment of the multiple fragments is stored by the multiple storage nodes using a data organization that concatenates symbols of multiple source blocks from the source object for inclusion of a symbol of each source block in each of two or more fragments of the multiple fragments of the source object;
reading data of a plurality of fragments of the multiple fragments from a plurality of storage nodes of the multiple storage nodes to access a requested portion of the source data, wherein the reading data of a plurality of fragments comprises reading each of at least k fragments of the plurality of output fragments written to storage nodes as an input fragment stream; and
erasure decoding the portion of the source data from the data of the plurality of fragments read from the plurality of storage nodes, wherein the erasure decoding the portion of source data comprises erasure decoding the input fragment stream to generate a stream of source data for the source object as the input fragment stream is being read.
Dependent claims: 25-44.
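Because the claimed data organization places one symbol of every source block in each fragment, a requested portion of the source object maps to the same contiguous byte range within every fragment. A minimal sketch of that mapping follows, assuming (hypothetically) source blocks of k symbols of S bytes each, laid out so that the symbol for source block b sits at byte offset b·S of every fragment; the function name is illustrative.

```python
from typing import Tuple


def fragment_range(offset: int, length: int, k: int, symbol_size: int) -> Tuple[int, int]:
    """Return the [start, end) byte range to read from each fragment for a source byte range.

    Hypothetical layout: source block b covers source bytes [b*k*S, (b+1)*k*S), and
    every fragment stores its symbol for block b at bytes [b*S, (b+1)*S).
    """
    if offset < 0 or length <= 0:
        raise ValueError("need offset >= 0 and length > 0")
    block_bytes = k * symbol_size
    first_block = offset // block_bytes
    last_block = (offset + length - 1) // block_bytes
    return first_block * symbol_size, (last_block + 1) * symbol_size


# Request 5000 bytes at offset 10000 with k = 4 and 1 KiB symbols (4 KiB source blocks).
start, end = fragment_range(10_000, 5_000, k=4, symbol_size=1024)
print(start, end, 4 * (end - start))  # → 2048 4096 8192
```

Here the reader fetches bytes 2048 to 4095 from each of k = 4 fragments, 8192 bytes in total, which decode to source blocks 2 and 3 (source bytes 8192 to 16383), a superset of the 5000 bytes requested. In this model the overshoot shrinks as `symbol_size` decreases.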
45. An apparatus for accessing source data of a source object stored as multiple fragments distributed across multiple storage nodes of a storage system, wherein one or more fragments of the multiple fragments includes redundant data for the source object, the apparatus comprising:
means for receiving the source data as a stream of data;
means for erasure encoding the stream of data to generate a stream of encoded data as the stream of data is arriving, wherein the redundant data is generated from the source data using an (n; k; r) erasure code;
means for producing the multiple fragments as a plurality of output fragment streams from the stream of encoded data as the stream of encoded data is being generated;
means for writing the multiple fragments to the storage nodes as each of the plurality of output fragment streams as the output fragment streams are being produced, wherein a first portion of each of the output fragment streams corresponds to a first portion of the source object and are written to the storage nodes before a second portion of the source object has been received, wherein data of at least one fragment of the multiple fragments is stored by the multiple storage nodes using a data organization that concatenates symbols of multiple source blocks from the source object for inclusion of a symbol of each source block in each of two or more fragments of the multiple fragments of the source object;
means for reading, by an access server having a processor and memory, data of a plurality of fragments of the multiple fragments from a plurality of storage nodes of the multiple storage nodes to access a requested portion of the source data, wherein the reading data of a plurality of fragments comprises reading each of at least k fragments of the plurality of output fragments written to storage nodes as an input fragment stream; and
means for erasure decoding, by the access server, the portion of the source data from the data of the plurality of fragments read from the plurality of storage nodes, wherein the erasure decoding the portion of source data comprises erasure decoding the input fragment stream to generate a stream of source data for the source object as the input fragment stream is being read.
Dependent claims: 46-50.
51. A non-transitory computer-readable medium comprising codes for accessing source data of a source object stored as multiple fragments distributed across multiple storage nodes of a storage system, wherein one or more fragments of the multiple fragments includes redundant data for the source object, the codes causing a computer to:
receive the source data as a stream of data;
erasure encode the stream of data to generate a stream of encoded data as the stream of data is arriving, wherein the redundant data is generated from the source data using an (n; k; r) erasure code;
produce the multiple fragments as a plurality of output fragment streams from the stream of encoded data as the stream of encoded data is being generated;
write the multiple fragments to the storage nodes as each of the plurality of output fragment streams as the output fragment streams are being produced, wherein a first portion of each of the output fragment streams corresponds to a first portion of the source object and are written to the storage nodes before a second portion of the source object has been received, and wherein data of at least one fragment of the multiple fragments is stored by the multiple storage nodes using a data organization that concatenates symbols of multiple source blocks from the source object for inclusion of a symbol of each source block in each of two or more fragments of the multiple fragments of the source object;
read data of a plurality of fragments of the multiple fragments from a plurality of storage nodes of the multiple storage nodes to access a requested portion of the source data, wherein reading data of a plurality of fragments comprises reading each of at least k fragments of the plurality of output fragments written to storage nodes as an input fragment stream; and
decode the portion of the source data from the data of the plurality of fragments read from the plurality of storage nodes, wherein erasure decoding the portion of source data comprises erasure decoding the input fragment stream to generate a stream of source data for the source object as the input fragment stream is being read.
Dependent claims: 52-56.
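Each independent claim decodes from at least k of the fragments. Under the conventional reading of an (n; k; r) erasure code, which the claims themselves do not spell out (k source fragments, r repair fragments, n = k + r, and any k fragments sufficient to decode), the basic quantities for an illustrative (9; 6; 3) code work out as follows.

```latex
\begin{aligned}
n &= k + r = 6 + 3 = 9 \\
\text{fragments that may be lost while still decodable} &= n - k = r = 3 \\
\text{stored bytes per source byte} &= \frac{n}{k} = \frac{9}{6} = 1.5 \\
\text{fraction of stored data that is redundant} &= \frac{r}{n} = \frac{3}{9} = \frac{1}{3}
\end{aligned}
```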
Specification