STREAMING RESTORE OF A DATABASE FROM A BACKUP SYSTEM
First Claim
1. A method, comprising:
performing, by one or more computers:
storing columnar data of a database table in a plurality of physical data blocks in a distributed data storage system on behalf of one or more clients, wherein the distributed data storage system comprises a cluster of one or more nodes, each of which comprises one or more disks on which physical data blocks are stored, and wherein each of the plurality of physical data blocks is associated with a respective unique identifier;
storing a copy of each of the plurality of physical data blocks in a remote key-value durable backup storage system, wherein for each of the plurality of physical data blocks, the respective unique identifier serves as a key to access the data block in the remote key-value durable backup storage system;
detecting a failure in the distributed data storage system affecting at least one of the plurality of physical data blocks in which the columnar data was stored;
in response to said detecting, automatically initiating a restore of the columnar data that was stored in the at least one of the plurality of physical data blocks from the remote key-value durable backup storage system; and
prior to restoring all of the columnar data that was stored in the at least one of the plurality of physical data blocks:
receiving one or more query requests directed to the columnar data of the database table; and
accepting and servicing the one or more query requests, wherein said servicing comprises obtaining at least some of the columnar data of the database table to which the one or more query requests are directed from the remote key-value durable backup storage system using the respective unique identifiers as keys to access data blocks in the remote key-value durable backup storage system comprising the at least some of the columnar data.
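The steps of claim 1 (detect lost blocks, start a restore automatically, and answer queries before the restore finishes by fetching blocks from backup by their unique identifiers) can be illustrated with a minimal single-process sketch. It assumes a Python `dict` stands in for the remote key-value backup store and for a node's local disks; `Node`, `detect_lost_blocks`, and `StreamingRestore` are hypothetical names, not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node: block unique identifier -> block bytes on local disk."""
    blocks: dict[str, bytes] = field(default_factory=dict)


def detect_lost_blocks(node: Node, expected_ids: list[str]) -> list[str]:
    """Failure-detection stand-in: blocks the node should hold but no longer does."""
    return [bid for bid in expected_ids if bid not in node.blocks]


class StreamingRestore:
    """Restores lost blocks from a key-value backup while still answering reads."""

    def __init__(self, node: Node, backup: dict[str, bytes], lost_ids: list[str]) -> None:
        self.node = node
        self.backup = backup            # stand-in for the remote key-value backup store
        self.pending = list(lost_ids)   # blocks the background restore has not reached yet
        self.fetched_on_demand = 0      # blocks pulled early because a query needed them

    def restore_step(self) -> bool:
        """Restore the next missing block; return False once nothing is left to do."""
        while self.pending:
            block_id = self.pending.pop(0)
            if block_id not in self.node.blocks:  # a query may already have fetched it
                self.node.blocks[block_id] = self.backup[block_id]
                return True
        return False

    def read_block(self, block_id: str) -> bytes:
        """Query path: use the local copy if present, else fetch by key from backup."""
        data = self.node.blocks.get(block_id)
        if data is None:
            data = self.backup[block_id]          # the unique identifier is the lookup key
            self.node.blocks[block_id] = data     # keep it so the restore skips it later
            self.fetched_on_demand += 1
        return data


# Usage: all three blocks are lost; a query for "b3" is answered mid-restore.
backup = {"b1": b"col-a", "b2": b"col-b", "b3": b"col-c"}
node = Node()
restore = StreamingRestore(node, backup, detect_lost_blocks(node, list(backup)))
restore.restore_step()                  # background restore copies "b1"
print(restore.read_block("b3"))         # prints b'col-c' before the restore finishes
while restore.restore_step():
    pass
```

Storing an on-demand fetch locally means the background pass skips that block instead of copying it a second time.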
1 Assignment
0 Petitions
Abstract
A distributed data warehouse system may maintain data blocks on behalf of clients in multiple clusters in a data store. Each cluster may include a single leader node and multiple compute nodes, each including multiple disks storing data. The warehouse system may store primary and secondary copies of each data block on different disks or nodes in a cluster. Each node may include a data structure that maintains metadata about each data block stored on the node, including its unique identifier. The warehouse system may back up data blocks in a remote key-value backup storage system with high durability. A streaming restore operation may be used to retrieve data blocks from backup storage using their unique identifiers as keys. The warehouse system may service incoming queries (and may satisfy some queries by retrieving data from backup storage on an as-needed basis) prior to completion of the restore operation.
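The placement idea in the abstract (primary and secondary copies of each block kept on different nodes, with each node tracking metadata about its blocks, including their unique identifiers) can be sketched as follows. This is illustrative only: round-robin placement, `uuid4` identifiers, and the names `BlockEntry`, `ComputeNode`, and `place_block` are assumptions, not the policy or data structure the patent describes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class BlockEntry:
    """Per-node metadata for one stored copy of a block."""
    block_id: str
    disk: int
    role: str  # "primary" or "secondary"


@dataclass
class ComputeNode:
    name: str
    num_disks: int
    metadata: dict[str, BlockEntry] = field(default_factory=dict)


def place_block(nodes: list[ComputeNode], seq: int) -> str:
    """Assign a new unique id and put the block's two copies on two different nodes.

    Round-robin placement is an illustrative assumption, not the patented policy.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep copies apart")
    block_id = str(uuid.uuid4())
    primary = nodes[seq % len(nodes)]
    secondary = nodes[(seq + 1) % len(nodes)]  # always a different node than primary
    for node, role in ((primary, "primary"), (secondary, "secondary")):
        disk = seq % node.num_disks
        node.metadata[block_id] = BlockEntry(block_id, disk, role)
    return block_id


nodes = [ComputeNode(f"node-{i}", num_disks=2) for i in range(3)]
ids = [place_block(nodes, seq) for seq in range(6)]
```

Keeping the two copies on separate nodes means a single node failure leaves one copy readable within the cluster.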
106 Citations
22 Claims
6. A method, comprising:
performing, by one or more computers:
maintaining data in one or more physical data blocks of a data storage system on behalf of one or more clients, wherein each physical data block is associated with a unique identifier;
performing a backup operation to store a respective copy of data stored in a given physical data block in a key-value storage system that is distinct from the data storage system;
subsequent to storing the respective copy of the data stored in the given physical data block, restoring the data stored in the given physical data block from the key-value storage system to the data storage system while accepting and servicing queries directed to the data maintained on behalf of the one or more clients, wherein said restoring comprises accessing the respective copy of data in the key-value storage system using the unique identifier associated with the given physical data block as a key in the key-value storage system.
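Claim 6 recites restoring *while* accepting and servicing queries. One way to picture that concurrency is a background restore thread and query reads sharing a lock-protected local store. This is a sketch under stated assumptions: a `dict` stands in for the key-value storage system, and `BlockStore` and `background_restore` are hypothetical names.

```python
import threading


class BlockStore:
    """Local block store in front of a key-value backup (a dict used as a stand-in)."""

    def __init__(self, backup: dict[str, bytes]) -> None:
        self.backup = backup
        self.local: dict[str, bytes] = {}
        self.lock = threading.Lock()

    def read(self, block_id: str) -> bytes:
        """Return a block, copying it from backup (keyed by unique id) if missing."""
        with self.lock:
            if block_id in self.local:
                return self.local[block_id]
        data = self.backup[block_id]  # slow remote fetch done outside the lock
        with self.lock:
            # setdefault keeps whichever copy landed first if two threads raced
            return self.local.setdefault(block_id, data)


def background_restore(store: BlockStore, block_ids: list[str]) -> None:
    """Restore every block; each read populates the local store as a side effect."""
    for block_id in block_ids:
        store.read(block_id)


backup = {f"blk-{i}": f"rows-{i}".encode() for i in range(100)}
store = BlockStore(backup)
worker = threading.Thread(target=background_restore, args=(store, list(backup)))
worker.start()
answers = [store.read("blk-99"), store.read("blk-42")]  # queries while restore runs
worker.join()
```

Doing the backup fetch outside the lock keeps one slow remote read from blocking queries for blocks that are already local.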
14. A non-transitory computer-readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to perform:
storing one or more data blocks in a data warehouse system;
creating a respective entry for each of the one or more data blocks in a data structure that stores information about data blocks stored in the data warehouse system, wherein each of the respective entries for the one or more data blocks comprises a unique identifier for the data block and an indication that the data block has not yet been backed up;
performing a backup operation for a plurality of data blocks stored in the data warehouse system, including the one or more data blocks, wherein said performing comprises:
storing a backup copy of the data structure in a remote key-value storage system;
storing, for each data block stored in the data warehouse system for which a corresponding entry in the data structure indicates that it has not yet been backed up, a backup copy of the data block in the remote key-value storage system; and
updating the entries in the data structure corresponding to each data block that was backed up by the backup operation to indicate that the data block has been backed up.
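The incremental backup in claim 14 follows three steps: back up the data structure, copy only blocks whose entries say "not yet backed up", then flip those entries. A minimal sketch, assuming a `dict` stands in for the remote key-value storage system and the data structure is serialized as JSON; `Entry`, `Warehouse`, `run_backup`, and the manifest key names are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Entry:
    block_id: str
    backed_up: bool = False  # new entries start as "not yet backed up"


@dataclass
class Warehouse:
    blocks: dict[str, bytes] = field(default_factory=dict)
    entries: dict[str, Entry] = field(default_factory=dict)

    def store_block(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data
        self.entries[block_id] = Entry(block_id)


def run_backup(wh: Warehouse, remote: dict[str, bytes], manifest_key: str) -> list[str]:
    """One backup pass, performing the steps in the order the claim lists them."""
    # 1. Back up the data structure itself (here: the entries serialized as JSON).
    remote[manifest_key] = json.dumps(
        [{"id": e.block_id, "backed_up": e.backed_up} for e in wh.entries.values()]
    ).encode()
    # 2. Copy only the blocks whose entries say they have not been backed up yet.
    copied = [bid for bid, e in wh.entries.items() if not e.backed_up]
    for bid in copied:
        remote[bid] = wh.blocks[bid]
    # 3. Mark the copied blocks as backed up.
    for bid in copied:
        wh.entries[bid].backed_up = True
    return copied


wh, remote = Warehouse(), {}
wh.store_block("b1", b"a")
wh.store_block("b2", b"b")
first = run_backup(wh, remote, "backup-1/manifest")
wh.store_block("b3", b"c")
second = run_backup(wh, remote, "backup-2/manifest")
```

The second pass copies only `b3`, while its manifest still lists all three blocks, so a restore from that backup can locate every block by key.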
18. A computing system, comprising:
one or more computing nodes, each of which comprises at least one processor and a memory, wherein the one or more computing nodes are configured to collectively implement a database service; and
an interface to a remote key-value storage system;
wherein the database service is configured to maintain data on behalf of one or more subscribers to the database service;
wherein the one or more computing nodes are configured to store the data maintained on behalf of the one or more subscribers in a plurality of physical data blocks on one or more storage devices, wherein each of the plurality of physical data blocks is associated with a unique identifier;
wherein the database service is configured to perform a backup operation for the data maintained on behalf of the one or more subscribers, wherein to perform the backup operation, the database service is configured to send to the remote key-value storage system, via the interface, a copy of each of the plurality of physical data blocks for storage in the remote key-value storage system, and the unique identifiers associated with each of the plurality of physical data blocks to be used as access keys for copies of the plurality of physical data blocks in the remote key-value storage system.
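The "interface to a remote key-value storage system" in claim 18 can be modeled as a small structural type that the database service depends on, with each block copy sent under its unique identifier as the access key. This is a sketch only: `KeyValueStore`, `InMemoryKeyValueStore`, and `back_up_blocks` are hypothetical names, and the in-memory class is a test double, not a real storage client.

```python
from typing import Protocol


class KeyValueStore(Protocol):
    """The operations the database service is assumed to need from the remote store."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryKeyValueStore:
    """Test double; a deployment would wrap a real storage client instead."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


def back_up_blocks(blocks: dict[str, bytes], store: KeyValueStore) -> int:
    """Send each block copy with its unique identifier as the access key."""
    for block_id, data in blocks.items():
        store.put(block_id, data)
    return len(blocks)


subscriber_blocks = {"7f3a": b"orders:2023", "9c1e": b"orders:2024"}
kv = InMemoryKeyValueStore()
sent = back_up_blocks(subscriber_blocks, kv)
```

Typing the dependency as a `Protocol` lets the backup logic run unchanged against any store exposing `put` and `get`, which keeps the service decoupled from a specific backend.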
Specification