Incremental out-of-place updates for datasets in data stores
First Claim
1. A system, comprising:
- one or more block-based persistent storage devices, configured to maintain a plurality of data chunks that together comprise a dataset as part of a relational data store, wherein individual ones of the plurality of data chunks correspond to a respective one or more storage locations at the one or more block-based persistent storage devices, wherein the respective storage locations for the plurality of data chunks are linked together according to an ordering schema that specifies a sort order according to a data value of the dataset such that data of the dataset for the relational data store is stored across the respective linked storage locations according to the sort order of the ordering schema for servicing queries;
at least one processor;
a memory, comprising program instructions that when executed cause the at least one processor to implement a storage manager;
the storage manager, configured to;
for individual ones of the plurality of data chunks;
generate an updated version of the data chunk in one or more new storage locations at the block-based persistent storage device that are not included among the respective storage locations for the plurality of data chunks, wherein the generation of the updated version of the data chunk applies at least one data value insertion or deletion to the data chunk such that data of the updated version of the data chunk is stored within the one or more new storage locations according to the sort order of the ordering schema, wherein the plurality of data chunks including the data chunk are available to service queries during the generation of the updated version of the data chunk;
in response to the generation of the updated version of the data chunk;
replace the respective one or more storage locations corresponding to the data chunk with the one or more new storage locations storing the updated version of the data chunk in order to link the one or more new storage locations together in the sort order with remaining ones of the respective storage locations for the plurality of data chunks, wherein subsequent queries are serviced from the plurality of data chunks including the updated version of the data chunk; and
reclaim the respective one or more storage locations for the data chunk to store other data.
1 Assignment
0 Petitions
Accused Products
Abstract
A data store may implement incremental out-of-place updates to a dataset. A dataset may maintain data across different storage locations linked together according to an ordering schema for servicing queries. As updates to the dataset are received, the updates may be persisted but not maintained in-place. In order to update the data store and maintain the ordering schema, incremental updates to the dataset may be performed without blocking queries directed toward the dataset. The dataset may be divided into multiple data chunks that correspond to different storage locations and an updated version of the data chunk may be generated in new storage locations. The new storage locations may then replace the storage locations of the prior version of the data chunk in order to link the new storage locations to the other linked storage locations in the dataset for servicing queries.
34 Citations
20 Claims
-
1. A system, comprising:
-
one or more block-based persistent storage devices, configured to maintain a plurality of data chunks that together comprise a dataset as part of a relational data store, wherein individual ones of the plurality of data chunks correspond to a respective one or more storage locations at the one or more block-based persistent storage devices, wherein the respective storage locations for the plurality of data chunks are linked together according to an ordering schema that specifies a sort order according to a data value of the dataset such that data of the dataset for the relational data store is stored across the respective linked storage locations according to the sort order of the ordering schema for servicing queries; at least one processor; a memory, comprising program instructions that when executed cause the at least one processor to implement a storage manager; the storage manager, configured to; for individual ones of the plurality of data chunks; generate an updated version of the data chunk in one or more new storage locations at the block-based persistent storage device that are not included among the respective storage locations for the plurality of data chunks, wherein the generation of the updated version of the data chunk applies at least one data value insertion or deletion to the data chunk such that data of the updated version of the data chunk is stored within the one or more new storage locations according to the sort order of the ordering schema, wherein the plurality of data chunks including the data chunk are available to service queries during the generation of the updated version of the data chunk; in response to the generation of the updated version of the data chunk; replace the respective one or more storage locations corresponding to the data chunk with the one or more new storage locations storing the updated version of the data chunk in order to link the one or more new storage locations together in the sort order with remaining ones of the respective storage locations for the plurality of data chunks, wherein subsequent queries are serviced from the plurality of data chunks including the updated version of the data chunk; and reclaim the respective one or more storage locations for the data chunk to store other data. - View Dependent Claims (2, 3, 4)
-
-
5. A method, comprising:
-
performing, by one or more computing devices; selecting a data chunk of a plurality of data chunks that together comprise a dataset maintained as part of a data store, wherein individual ones of the plurality of data chunks correspond to a respective one or more storage locations at a block-based persistent storage device, wherein the respective storage locations for the plurality of data chunks are linked together according to an ordering schema that specifies a sort order according to a data value of the dataset such that data of the dataset for the data store is stored across the respective linked storage locations according to the sort order of the ordering schema for servicing queries; generating an updated version of the data chunk in one or more new storage locations at the block-based persistent storage device that are not included among the respective storage locations for the plurality of data chunks, wherein generating the updated version of the data chunk comprises applying at least one data value insertion or deletion to the data chunk such that data of the updated version of the data chunk is stored within the one or more new storage locations according to the sort order of the ordering schema, wherein the plurality of data chunks including the selected data chunk are available to service queries during the generation of the updated version of the data chunk; in response to generating the updated version of the data chunk; replacing the respective one or more storage locations corresponding to the selected data chunk with the one or more new storage locations storing the updated version of the data chunk in order to link the one or more new storage locations together in the sort order with remaining ones of the respective storage locations for the plurality of data chunks, wherein subsequent queries are serviced from the plurality of data chunks including the updated version of the data chunk; and reclaiming the respective one or more storage locations for the selected data chunk to store other data. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices cause the one or more computing devices to implement:
-
maintaining a plurality of data chunks that together comprise a dataset as part of a data store, wherein individual ones of the plurality of data chunks correspond to a respective one or more storage locations at a block-based persistent storage device, wherein the respective storage locations for the plurality of data chunks are linked together according to an ordering schema that specifies a sort order according to a data value of the dataset such that data of the dataset for the data store is stored across the respective linked storage locations according to the sort order of the ordering schema for servicing queries; for individual ones of the plurality of data chunks; generating an updated version of the data chunk in one or more new storage locations at the block-based persistent storage device that are not included among the respective storage locations for the plurality of data chunks, wherein generating the updated version of the data chunk comprises applying at least one data value insertion or deletion to the data chunk such that data of the updated version of the data chunk is stored within the one or more new storage locations according to the sort order of the ordering schema, wherein the plurality of data chunks including the data chunk are available to service queries during the generation of the updated version of the data chunk; in response to generating the updated version of the data chunk; replacing the respective one or more storage locations corresponding to the data chunk with the one or more new storage locations storing the updated version of the data chunk in order to link the one or more new storage locations together in the sort order with remaining ones of the respective storage locations for the plurality of data chunks, wherein subsequent queries are serviced from the plurality of data chunks including the updated version of the data chunk; and reclaiming the respective one or more storage locations for the data chunk to store other data. - View Dependent Claims (15, 16, 17, 18, 19, 20)
-
Specification