Erasure coding immutable data
Abstract
Embodiments of the present invention relate to systems, methods and computer storage media for erasure coding data in a distributed computing environment. A sealed extent is identified that is comprised of two or more data blocks and two or more index blocks. The sealed extent is optimized for erasure coding by grouping the two or more data blocks within the optimized sealed extent together and grouping the two or more index blocks within the optimized sealed extent together. The optimized extent may also be erasure coded, which includes creating data fragments and coding fragments. The data fragments and the coding fragments may also be stored in the distributed computing environment. Additional embodiments include monitoring statistical information to determine if replication, erasure coding or a hybrid storage plan should be utilized.
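The steps summarized in the abstract (group the blocks, divide the data into fragments, derive coding fragments) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `Block` type, the function names, and the choice of a single XOR parity fragment are all assumptions made here for brevity. A single XOR parity can rebuild only one lost fragment; a system that needs a larger predefined number of coding fragments would use a stronger code such as Reed-Solomon.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical block record; `kind` is either "data" or "index"."""
    kind: str
    payload: bytes


def optimize_extent(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Group data blocks together and index blocks together, keeping relative order."""
    data = [b for b in blocks if b.kind == "data"]
    index = [b for b in blocks if b.kind == "index"]
    return data, index


def make_data_fragments(data_blocks: List[Block], k: int) -> List[bytes]:
    """Divide the concatenated data payload into k equal-size fragments (zero-padded)."""
    if k <= 0:
        raise ValueError("k must be positive")
    payload = b"".join(b.payload for b in data_blocks)
    size = max(1, -(-len(payload) // k))  # ceiling division, at least one byte
    padded = payload.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(fragments: List[bytes]) -> bytes:
    """Assumed coding scheme: one coding fragment, the bytewise XOR of the inputs."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def recover_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild exactly one lost data fragment from the survivors plus the parity."""
    return xor_parity(surviving + [parity])
```

With four data fragments and the parity fragment stored separately, any one lost data fragment can be rebuilt by passing the three survivors and the parity to `recover_missing`.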
20 Claims
1. A method in a distributed computing environment for erasure coding data, the method comprising:
identifying a data stream stored in the distributed computing environment comprised of at least one sealed read-only extent (sealed extent), wherein the sealed extent is comprised of two or more data blocks and two or more index blocks;
optimizing, with a processor in the distributed computing environment, the sealed extent, wherein optimizing is comprised of:
(1) grouping the two or more data blocks within the optimized sealed extent together and
(2) grouping the two or more index blocks within the optimized sealed extent together;
erasure coding the optimized sealed extent, wherein erasure coding is comprised of:
(1) creating a predefined number of data fragments, wherein the predefined number of data fragments are created by dividing data represented by the two or more data blocks of the optimized sealed extent into the predefined number of data fragments and
(2) creating a predefined number of coding fragments, wherein the coding fragments are created based, at least in part, on the data fragments; and
storing the predefined number of data fragments and the predefined number of coding fragments in the distributed computing environment.
10. One or more computer storage media having computer-executable instructions embodied thereon, that when executed by a distributed computing environment having a processor and memory, cause the distributed computing environment to perform a method, the method comprising:
monitoring a demand for a data stream stored in the distributed computing environment, wherein the data stream is comprised of at least one sealed extent, the sealed extent is comprised of two or more data blocks and two or more index blocks;
determining the demand for the data stream is above or below a predefined demand threshold;
as a result of the demand for the data stream being above or below the predefined demand threshold, erasure coding the sealed extent such that one or more data fragments are created and one or more coding fragments are created; and
storing each of the one or more data fragments and each of the one or more coding fragments in one or more nodes of the distributed computing environment.
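The demand test in claim 10 can be illustrated with a small decision function. The claim covers demand being either above or below the threshold, so the direction used here (erasure code data whose demand falls below the threshold, keep the rest replicated) is an assumption, as are the `StoragePlan` and `choose_plan` names and the unit of demand.

```python
from enum import Enum


class StoragePlan(Enum):
    """Hypothetical labels for the storage plan applied to a sealed extent."""
    REPLICATE = "replicate"
    ERASURE_CODE = "erasure_code"


def choose_plan(observed_demand: float, demand_threshold: float) -> StoragePlan:
    """Pick a plan by comparing monitored demand with a predefined threshold.

    Assumption: demand below the threshold triggers erasure coding, while
    demand at or above it keeps the extent replicated.
    """
    if observed_demand < demand_threshold:
        return StoragePlan.ERASURE_CODE
    return StoragePlan.REPLICATE
```

For example, with a threshold of 10 reads per hour, an extent read 3 times per hour would be erasure coded under this assumed policy, and one read 50 times per hour would stay replicated.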
20. A system for erasure coding an immutable sealed extent in a distributed computing environment, the system comprising:
a cluster manager, the cluster manager performs a method, the method comprising:
(1) identifying, based on statistical information interpreted by the cluster manager, a read-only sealed extent to which erasure coding is to be applied, wherein the read-only sealed extent is comprised of two or more data blocks and two or more index blocks;
(2) optimizing the read-only sealed extent to result in an optimized extent, wherein optimizing is comprised of:
a) grouping the two or more data blocks within the optimized extent together and
b) grouping the two or more index blocks within the optimized extent together; and
(3) erasure coding the optimized extent, wherein erasure coding is comprised of:
a) creating a predefined number of data fragments, wherein the predefined number of data fragments are created by apportioning data represented by the two or more data blocks of the optimized extent into the predefined number of data fragments and
b) creating a predefined number of coding fragments, wherein the coding fragments are created based, at least in part, on the data fragments; and
a plurality of storage nodes that are utilized to store the predefined number of data fragments and the predefined number of coding fragments, wherein the number of the plurality of storage nodes is equal to or greater than the predefined number of data fragments.
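The storage-node condition at the end of claim 20 (at least as many nodes as data fragments) can be sketched as a placement check. The round-robin policy, the `place_fragments` name, and the fragment labels are assumptions for illustration; the claim itself does not prescribe how fragments are mapped to nodes.

```python
from typing import Dict, List


def place_fragments(
    data_fragments: List[bytes],
    coding_fragments: List[bytes],
    nodes: List[str],
) -> Dict[str, List[str]]:
    """Assign fragment labels to nodes round-robin (an assumed policy).

    Enforces the claimed condition that the number of storage nodes is equal
    to or greater than the number of data fragments, so every data fragment
    lands on a distinct node.
    """
    if len(nodes) < len(data_fragments):
        raise ValueError("need at least as many nodes as data fragments")
    labels = [f"data-{i}" for i in range(len(data_fragments))]
    labels += [f"coding-{j}" for j in range(len(coding_fragments))]
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, label in enumerate(labels):
        placement[nodes[i % len(nodes)]].append(label)
    return placement
```

When there are fewer nodes than total fragments, this sketch lets coding fragments share nodes with data fragments; a real deployment would more likely spread all fragments across separate failure domains.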
Specification