System and method for building a point-in-time snapshot of an eventually-consistent data store
Abstract
A method and system for building a point-in-time snapshot of an eventually-consistent data store. The data store includes key-value pairs stored on a plurality of storage nodes. In one embodiment, the data store is implemented as an Apache® Cassandra database running in the “cloud.” The data store includes a journaling mechanism that stores journals (i.e., inconsistent snapshots) of the data store on each node at various intervals. In Cassandra, these snapshots are sorted string tables that may be copied to a back-up storage location. A cluster of processing nodes may retrieve and resolve the inconsistent snapshots to generate a point-in-time snapshot of the data store corresponding to a lagging consistency point. In addition, the point-in-time snapshot may be updated as any new inconsistent snapshots are generated by the data store such that the lagging consistency point associated with the updated point-in-time snapshot is more recent.
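The resolution and update steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how inconsistent values are resolved, so this example assumes each row carries a write timestamp and that the most recent write wins, which is how Cassandra reconciles columns. The row shape and the names `Row`, `resolve`, and `update` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (key, value, write_timestamp).
Row = Tuple[str, str, int]
Snapshot = Dict[str, Tuple[str, int]]


def resolve(rows: Iterable[Row]) -> Snapshot:
    """Keep, for each key, the value with the highest timestamp (assumed rule)."""
    resolved: Snapshot = {}
    for key, value, ts in rows:
        current = resolved.get(key)
        if current is None or ts > current[1]:
            resolved[key] = (value, ts)
    return resolved


def update(snapshot: Snapshot, new_rows: Iterable[Row]) -> Snapshot:
    """Fold a newly generated inconsistent snapshot into an existing result."""
    existing = [(key, value, ts) for key, (value, ts) in snapshot.items()]
    return resolve(existing + list(new_rows))


# Two nodes disagree about "user:1"; a later snapshot then updates "user:2".
node_a = [("user:1", "alice", 10), ("user:2", "bob", 12)]
node_b = [("user:1", "alicia", 15)]
pit = resolve(node_a + node_b)
pit = update(pit, [("user:2", "robert", 20)])
```

Because `update` reuses `resolve`, folding in a new snapshot can only move each key forward in time, which mirrors the idea of the lagging consistency point becoming more recent.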
17 Claims
1. A computer-implemented method for building a point-in-time snapshot of an eventually-consistent data store distributed among a plurality of nodes connected by a network, the method comprising:
receiving a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a particular node of the plurality of nodes; and
generating the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises:
dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs;
distributing each processing task to one of a plurality of processing nodes configured to perform a reduce operation;
receiving a number of results from the plurality of processing nodes corresponding to a number of distributed processing tasks; and
combining the number of results to generate the point-in-time snapshot.
Dependent claims: 2, 3, 4, 5, 6, 7
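The divide, distribute, reduce, and combine steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: rows are hypothetical `(key, value, timestamp)` tuples, tasks are formed by hashing the key so that all rows for a key land in the same task, the newest timestamp wins, and a local thread pool stands in for the plurality of processing nodes. All function names are invented for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical row shape: (key, value, write_timestamp).
Row = Tuple[str, str, int]


def divide(rows: List[Row], num_tasks: int) -> List[List[Row]]:
    """Split rows into tasks by key hash so each key appears in exactly one task."""
    tasks: List[List[Row]] = [[] for _ in range(num_tasks)]
    for row in rows:
        index = zlib.crc32(row[0].encode("utf-8")) % num_tasks
        tasks[index].append(row)
    return tasks


def reduce_task(task: List[Row]) -> Dict[str, Row]:
    """Reduce one task: keep the newest row per key (assumed resolution rule)."""
    result: Dict[str, Row] = {}
    for key, value, ts in task:
        if key not in result or ts > result[key][2]:
            result[key] = (key, value, ts)
    return result


def build_snapshot(snapshots: List[List[Row]], num_tasks: int = 4) -> Dict[str, Row]:
    rows = [row for snapshot in snapshots for row in snapshot]
    tasks = divide(rows, num_tasks)
    # The thread pool is a stand-in for remote processing nodes.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = list(pool.map(reduce_task, tasks))  # one result per task
    combined: Dict[str, Row] = {}
    for partial in results:
        combined.update(partial)  # key sets are disjoint by construction
    return combined


snapshots = [
    [("a", "1", 1), ("b", "x", 5)],
    [("a", "2", 3)],
    [("b", "y", 2), ("c", "z", 1)],
]
point_in_time = build_snapshot(snapshots)
```

Partitioning by key is a design choice of this sketch: it makes the final combine step a plain union, since no key can be resolved differently by two tasks.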
8. A system for building a point-in-time snapshot of an eventually-consistent data store, comprising:
a plurality of slave processors connected by a network and storing the data store; and
a master processor connected to the data store via the network, and, when executing a first software application stored in a memory, the master processor is configured to:
receive a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a first slave processor included in the plurality of slave processors, and
generate the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises:
dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs;
distributing each processing task to a slave processor included in the plurality of slave processors configured to perform a reduce operation;
receiving a number of results from the plurality of slave processors corresponding to a number of distributed processing tasks; and
combining the number of results to generate the point-in-time snapshot.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to perform an operation for building a point-in-time snapshot of an eventually-consistent data store, the operation comprising:
receiving a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a particular node of the plurality of nodes;
generating a sorted table including each row of key-value pairs from the plurality of inconsistent snapshots; and
generating the point-in-time snapshot by resolving the one or more rows of the key-value pairs in the sorted table to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the rows of key-value pairs included in the sorted table such that each unique key is associated with a single row that is selected from all rows in the sorted table associated with that particular key, wherein generating the point-in-time snapshot comprises:
dividing the sorted table into one or more processing tasks, wherein each processing task includes a different portion of the one or more rows of the key-value pairs within the sorted table;
distributing each processing task to one of a plurality of processing nodes configured to perform a map operation;
receiving a number of results from the plurality of processing nodes corresponding to a number of distributed processing tasks; and
performing a reduce operation on the number of results to generate the point-in-time snapshot.
Dependent claims: 16, 17
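The sorted-table variant in claim 15 can be sketched in the same spirit. The assumptions here are that each inconsistent snapshot is already sorted by key (as a sorted string table would be), that rows are hypothetical `(key, value, timestamp)` tuples, that the single row selected per key is the one with the newest timestamp, and that the table is divided into fixed-size contiguous chunks. The function names are invented for the example, and the map step runs locally rather than on separate processing nodes.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

# Hypothetical row shape: (key, value, write_timestamp).
Row = Tuple[str, str, int]


def sorted_table(snapshots: List[List[Row]]) -> List[Row]:
    """Merge snapshots that are each already sorted by key into one sorted table."""
    return list(heapq.merge(*snapshots, key=itemgetter(0)))


def divide(table: List[Row], size: int) -> List[List[Row]]:
    """Cut the sorted table into contiguous chunks, one per processing task."""
    return [table[i:i + size] for i in range(0, len(table), size)]


def map_chunk(chunk: List[Row]) -> List[Row]:
    """Map step: within one chunk, keep the newest row for each key."""
    return [max(group, key=itemgetter(2))
            for _, group in groupby(chunk, key=itemgetter(0))]


def reduce_results(results: List[List[Row]]) -> List[Row]:
    """Reduce step: a key may straddle two chunks, so resolve again across results."""
    final: Dict[str, Row] = {}
    for partial in results:
        for key, value, ts in partial:
            if key not in final or ts > final[key][2]:
                final[key] = (key, value, ts)
    return [final[key] for key in sorted(final)]


snapshots = [
    [("a", "1", 1), ("b", "x", 5)],
    [("a", "2", 3), ("c", "z", 1)],
    [("b", "y", 7)],
]
table = sorted_table(snapshots)
results = [map_chunk(chunk) for chunk in divide(table, 3)]
point_in_time = reduce_results(results)
```

With a chunk size of 3, the two rows for key `b` fall into different chunks, which is why the reduce step has to compare timestamps again instead of simply concatenating the mapped results.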
Specification