Techniques for extending grids in data storage systems
First Claim
1. A computer-implemented method, comprising:
- generating, from a first data set using an erasure code, a grid of shards representing the first data set, wherein the grid of shards comprises a set of data shards, a set of null shards, a set of virtual shards, and a set of derived shards, the set of derived shards comprising a first set of columns and a second set of columns, the first set of columns including a set of horizontally derived shards and a set of vertically derived shards, each derived shard of the set of derived shards being respectively reproducible from a subset of the horizontally derived shards in a corresponding row and a subset of the vertically derived shards in a corresponding column, the second set of columns including the set of virtual shards and the set of null shards;
allocating the grid of shards at a set of datacenter locations by at least, on a respective device at each datacenter location of the set of datacenter locations, storing corresponding first shards of the first set of columns and corresponding second shards of the second set of columns;
processing a second data set by at least;
generating, from a first null shard, a second null shard, and the second data set, a respective first converted data shard and a respective second converted data shard, each including at least a portion of the second data set;
generating, by applying the erasure code to the first converted data shard and the second converted data shard, a third converted derived shard; and
generating, by respectively applying the erasure code to the first converted data shard, the second converted data shard, and the third converted derived shard, a respective new first derived shard, a respective new second derived shard, and a respective new third derived shard; and
causing the set of datacenter locations to provide access to both the first data set and the second data set while retaining at least one storage characteristic associated with the first data set, by at least;
adding, to the corresponding first shards stored at the set of datacenter locations, the new first derived shard, the new second derived shard, and the new third derived shard; and
replacing some of the corresponding second shards stored at the set of data center locations with the first converted data shard, the second converted data shard, and the third converted data shard.
1 Assignment
0 Petitions
Accused Products
Abstract
One or more grids of redundancy coded shards, such as those stored or otherwise represented on grid encoded storage systems, are configured to be extensible. For example, a grid of shards may include data shards, derived shards (derived from the data shards), and null shard, indexed by, e.g., row and column. A grid of shards so configured may include data shards and derived shards in one set of columns of the grid, and the null shards in another set of columns of the grid. As additional data is added to the grid, the grid may be extended by converting some of the null shards into data or derived shards, on a row-by-row basis, and regenerating or re-deriving additional shards as necessary.
-
Citations
20 Claims
-
1. A computer-implemented method, comprising:
-
generating, from a first data set using an erasure code, a grid of shards representing the first data set, wherein the grid of shards comprises a set of data shards, a set of null shards, a set of virtual shards, and a set of derived shards, the set of derived shards comprising a first set of columns and a second set of columns, the first set of columns including a set of horizontally derived shards and a set of vertically derived shards, each derived shard of the set of derived shards being respectively reproducible from a subset of the horizontally derived shards in a corresponding row and a subset of the vertically derived shards in a corresponding column, the second set of columns including the set of virtual shards and the set of null shards; allocating the grid of shards at a set of datacenter locations by at least, on a respective device at each datacenter location of the set of datacenter locations, storing corresponding first shards of the first set of columns and corresponding second shards of the second set of columns; processing a second data set by at least; generating, from a first null shard, a second null shard, and the second data set, a respective first converted data shard and a respective second converted data shard, each including at least a portion of the second data set; generating, by applying the erasure code to the first converted data shard and the second converted data shard, a third converted derived shard; and generating, by respectively applying the erasure code to the first converted data shard, the second converted data shard, and the third converted derived shard, a respective new first derived shard, a respective new second derived shard, and a respective new third derived shard; and causing the set of datacenter locations to provide access to both the first data set and the second data set while retaining at least one storage characteristic associated with the first data set, by at least; adding, to the corresponding first shards stored at the set of datacenter locations, the new first derived shard, the new second derived shard, and the new third derived shard; and replacing some of the corresponding second shards stored at the set of data center locations with the first converted data shard, the second converted data shard, and the third converted data shard. - View Dependent Claims (2, 3, 4)
-
-
5. A system, comprising at least one computing device that implements one or more services, wherein the one or more services:
-
generate, by applying a first redundancy code to a first set of data, a grid of shards comprising a first set of columns including a set of data shards and a set of virtual shards, and a second set of columns including a set of null shards and a set of derived shards, wherein each shard of the grid of shards has a corresponding first index and a corresponding second index, each shard other than a null shard or a virtual shard configured such that the shard is reproducible from other shards associated with the first index and the shard is reproducible from other shards associated with the second index; storing, at each datacenter location of a set of datacenter locations associated with the system, first shards of the first set of columns and second shards of the second set of columns; and as a result of a request to store a second set of data, processing the second set of data by at least; generating, from a first null shard, a second null shard, and the second set of data, a respective first converted data shard and a respective second converted data shard, each including at least a portion of the second set of data; generating a third converted derived shard by applying the first redundancy code to the first converted data shard and the second converted data shard; generating, by applying the first redundancy code to the first converted data shard, the second converted data shard, and the third converted derived shard, a respective first derived shard, second derived shard, and third derived shard; and updating the first shards and the second shards to include the first converted data shard, the second converted data shard, the third converted data shard, the first derived shard, the second derived shard, and the third derived shard to enable the set of datacenter locations to provide access to both the first data set and the second data set while retaining at least one storage characteristic associated with the first data set. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
-
-
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of execution by one or more processors of a computer system, cause the computer system to at least:
-
process a first data object using a redundancy code to generate a grid of shards comprising a set of null shards, a set of virtual shards, and a set of derived shards, each shard of the grid of shards having a corresponding first index and a corresponding second index, each shard being independently reproducible from other shards associated with the first index and other shards associated with the second index; and enable the computer system to retain at least one storage characteristic associated with a first data set while providing access to a second data object via the grid of shards by at least; generating, by applying the redundancy code to the second data object and a first null shard and a second null shard of the set of null shards having the same first index, a respective first data shard and a second data shard, each including at least a portion of the second data object, to replace the first null shard and the second null shard; generating, by applying the redundancy code to the first data shard and a second data shard, a third derived shard to replace a virtual shard having the same first index; and generating a new set of derived shards to replace a subset of the set of derived shards, each shard of the subset corresponding to respective second indices associated with the first data shard, the second data shard, and the third derived shard. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
-
Specification