Method and system for efficiently replicating data in non-relational databases

US 8,554,724 B2
Filed: 08/17/2012
Issued: 10/08/2013
Est. Priority Date: 02/09/2010
Status: Active Grant

First Claim

Patent Images

1. A method of compacting a distributed database having a plurality of instances, wherein each instance stores data on one or more server computers and each server computer has memory and one or more processors, the method comprising:

identifying a first instance of the distributed database from the plurality of instances;

selecting a set of one or more row identifiers that identify rows in the distributed database, wherein each respective row in the distributed database has a respective base value and a respective set of zero or more respective deltas, and wherein each respective delta;

specifies a change to the respective base value;

includes a respective sequence identifier that specifies an order in which the respective delta is applied to the respective base value to compute a current value for the respective row; and

specifies a respective instance where the respective delta was created;

identifying a plurality of other instances of the distributed database, wherein the plurality of other instances are selected from the plurality of instances and each of the other instances is distinct from the first instance;

selecting a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier, and wherein the compaction horizon satisfies;

all deltas that(i) were created at the first instance,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and

all deltas that(i) were created at instances in the plurality of other instances,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been received at the first instance; and

for each respective row identifier in at least a subset of the selected one or more row identifiers;

identifying a non-empty set of deltas for the respective row identifier, wherein each of the deltas in the identified non-empty set has a sequence identifier less than or equal to the compaction horizon;

applying, in sequence, each of the deltas in the identified non-empty set, to the respective base value for the respective row identifier, thereby updating the respective base value with the changes specified by the deltas in the identified set; and

deleting the deltas in the identified non-empty set.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method replicates data between instances of a distributed database. The method identifies at least two instances of the database at distinct geographic locations. The method tracks changes to the database by storing deltas. Each delta has a row identifier that identifies the piece of data modified, a sequence identifier that specifies the order in which the deltas are applied to the data, and an instance identifier that specifies where the delta was created. The method determines which deltas to send using an egress map that specifies which combinations of row identifier and sequence identifier have been acknowledged as received at other instances. The method builds a transmission matrix that identifies deltas that have not yet been acknowledged as received. The method then transmits deltas identified in the transmission matrix. After receiving acknowledgement that transmitted deltas have been incorporated into databases at other instances, the method updates the egress map.

Citations

15 Claims

1. A method of compacting a distributed database having a plurality of instances, wherein each instance stores data on one or more server computers and each server computer has memory and one or more processors, the method comprising:
- identifying a first instance of the distributed database from the plurality of instances;
  
  selecting a set of one or more row identifiers that identify rows in the distributed database, wherein each respective row in the distributed database has a respective base value and a respective set of zero or more respective deltas, and wherein each respective delta;
  
  specifies a change to the respective base value;
  
  includes a respective sequence identifier that specifies an order in which the respective delta is applied to the respective base value to compute a current value for the respective row; and
  
  specifies a respective instance where the respective delta was created;
  
  identifying a plurality of other instances of the distributed database, wherein the plurality of other instances are selected from the plurality of instances and each of the other instances is distinct from the first instance;
  
  selecting a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier, and wherein the compaction horizon satisfies;
  
  all deltas that(i) were created at the first instance,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
  
  all deltas that(i) were created at instances in the plurality of other instances,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been received at the first instance; and
  
  for each respective row identifier in at least a subset of the selected one or more row identifiers;
  
  identifying a non-empty set of deltas for the respective row identifier, wherein each of the deltas in the identified non-empty set has a sequence identifier less than or equal to the compaction horizon;
  
  applying, in sequence, each of the deltas in the identified non-empty set, to the respective base value for the respective row identifier, thereby updating the respective base value with the changes specified by the deltas in the identified set; and
  
  deleting the deltas in the identified non-empty set.
- View Dependent Claims (2, 3, 4, 5)
- - 2. The method of claim 1, wherein each sequence identifier comprises a timestamp and a unique tie breaker value that is assigned based on hardware and/or software at each instance of the distributed database.
  - 3. The method of claim 1, wherein the plurality of instances comprises instances at two or more distinct geographic locations.
  - 4. The method of claim 1, wherein the selected set of one or more row identifiers includes a first row identifier whose respective set of zero or more respective deltas contains at least one delta.
  - 5. The method of claim 4, wherein the selected set of one or more row identifiers includes a second row identifier whose respective set of zero or more respective deltas is empty.

6. A distributed database system having a plurality of instances, wherein each instance comprises a plurality of servers and each server has:
- one or more processors;
  
  memory; and
  
  one or more programs stored in the memory for execution by the one or more processors, the one or more programs comprising instructions for;
  
  identifying a first instance of a distributed database from the plurality of instances; and
  
  at the first instance;
  
  selecting a set of one or more row identifiers that identify rows in the distributed database, wherein each respective row in the distributed database has a respective base value and a respective set of zero or more respective deltas, and wherein each respective delta;
  
  specifies a change to the respective base value;
  
  includes a respective sequence identifier that specifies an order in which the respective delta is applied to the respective base value to compute a current value for the respective row; and
  
  specifies a respective instance where the respective delta was created;
  
  identifying a plurality of other instances of the distributed database, wherein the plurality of other instances are selected from the plurality of instances and each of the other instances is distinct from the first instance;
  
  selecting a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier, and wherein the compaction horizon satisfies;
  
  all deltas that(i) were created at the first instance,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
  
  all deltas that(i) were created at instances in the plurality of other instances,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been received at the first instance; and
  
  for each respective row identifier in at least a subset of the selected one or more row identifiers;
  
  identifying a non-empty set of deltas for the respective row identifier, wherein each of the deltas in the identified non-empty set has a sequence identifier less than or equal to the compaction horizon;
  
  applying, in sequence, each of the deltas in the identified non-empty set, to the respective base value for the respective row identifier, thereby updating the respective base value with the changes specified by the deltas in the identified set; and
  
  deleting the deltas in the identified non-empty set.
- View Dependent Claims (7, 8, 9, 10)
- - 7. The distributed database system of claim 6, wherein each sequence identifier comprises a timestamp and a unique tie breaker value that is assigned based on hardware and/or software at each instance of the distributed database.
  - 8. The distributed database system of claim 6, wherein the plurality of instances comprises instances at two or more distinct geographic locations.
  - 9. The distributed database system of claim 6, wherein the selected set of one or more row identifiers includes a first row identifier whose respective set of zero or more respective deltas contains at least one delta.
  - 10. The distributed database system of claim 9, wherein the selected set of one or more row identifiers includes a second row identifier whose respective set of zero or more respective deltas is empty.

11. A computer readable storage medium storing one or more programs configured for execution by a distributed database system having a plurality of instances, wherein each instance comprises one or more server computers and each server computer has one or more processors and memory, the one or more programs including instructions to:
- identify a first instance of a distributed database from the plurality of instances;
  
  select a set of one or more row identifiers that identify rows in the distributed database, wherein each respective row in the distributed database has a respective base value and a respective set of zero or more respective deltas, and wherein each respective delta;
  
  specifies a change to the respective base value;
  
  includes a respective sequence identifier that specifies an order in which the respective delta is applied to the respective base value to compute a current value for the respective row; and
  
  specifies a respective instance where the respective delta was created;
  
  identify a plurality of other instances of the distributed database, wherein the plurality of other instances are selected from the plurality of instances and each of the other instances is distinct from the first instance;
  
  select a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier, and wherein the compaction horizon satisfies;
  
  all deltas that(i) were created at the first instance,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
  
  all deltas that(i) were created at instances in the plurality of other instances,(ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and(iii) have sequence identifiers less than or equal to the compaction horizon,have been received at the first instance; and
  
  for each respective row identifier in at least a subset of the selected one or more row identifiers;
  
  identify a non-empty set of deltas for the respective row identifier, wherein each of the deltas in the identified non-empty set has a sequence identifier less than or equal to the compaction horizon;
  
  apply, in sequence, each of the deltas in the identified non-empty set, to the respective base value for the respective row identifier, thereby updating the respective base value with the changes specified by the deltas in the identified set; and
  
  delete the deltas in the identified non-empty set.
- View Dependent Claims (12, 13, 14, 15)
- - 12. The computer readable storage medium of claim 11, wherein each sequence identifier comprises a timestamp and a unique tie breaker value that is assigned based on hardware and/or software at each instance of the distributed database.
  - 13. The computer readable storage medium of claim 11, wherein the plurality of instances comprises instances at two or more distinct geographic locations.
  - 14. The computer readable storage medium of claim 11, wherein the selected set of one or more row identifiers includes a first row identifier whose respective set of zero or more respective deltas contains at least one delta.
  - 15. The computer readable storage medium of claim 14, wherein the selected set of one or more row identifiers includes a second row identifier whose respective set of zero or more respective deltas is empty.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Zunger, Yonatan
Primary Examiner(s)
Pham, Hung Q

Application Number

US13/588,993
Publication Number

US 20120310903A1
Time in Patent Office

417 Days
Field of Search

None
US Class Current

707/609
CPC Class Codes

G06F 16/2329   using versioning

G06F 16/27   Replication, distribution o...

G06F 16/275   Synchronous replication

Method and system for efficiently replicating data in non-relational databases

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

15 Claims

Specification

Solutions

Use Cases

Quick Links

Method and system for efficiently replicating data in non-relational databases

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

15 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links