METHOD AND SYSTEM FOR EFFICIENTLY REPLICATING DATA IN NON-RELATIONAL DATABASES

US 20110196827A1
Filed: 02/09/2010
Published: 08/11/2011
Est. Priority Date: 02/09/2010
Status: Active Grant

Abstract

A method replicates data between instances of a distributed database. The method identifies at least two instances of the database at distinct geographic locations. The method tracks changes to the database by storing deltas. Each delta has a row identifier that identifies the piece of data modified, a sequence identifier that specifies the order in which the deltas are applied to the data, and an instance identifier that specifies where the delta was created. The method determines which deltas to send using an egress map that specifies which combinations of row identifier and sequence identifier have been acknowledged as received at other instances. The method builds a transmission matrix that identifies deltas that have not yet been acknowledged as received. The method then transmits deltas identified in the transmission matrix. After receiving acknowledgement that transmitted deltas have been incorporated into databases at other instances, the method updates the egress map.
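
The cycle described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `Delta` and `Replicator` names, the integer sequence identifiers, the egress map stored as a set of acknowledged `(row_id, seq_id)` pairs, and the transmission matrix stored as a plain list are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    row_id: str        # identifies the piece of data modified
    seq_id: int        # order in which deltas are applied (assumed integer)
    instance_id: str   # instance where the delta was created


class Replicator:
    """Hypothetical helper: one egress map per remote instance."""

    def __init__(self, remotes):
        self.deltas = []                                 # local delta log
        self.egress = {r: set() for r in remotes}        # acked (row_id, seq_id)

    def track(self, delta):
        """Record a change made at this instance."""
        self.deltas.append(delta)

    def transmission_matrix(self, remote):
        """Deltas not yet acknowledged as received at `remote`."""
        acked = self.egress[remote]
        return [d for d in self.deltas if (d.row_id, d.seq_id) not in acked]

    def acknowledge(self, remote, delivered):
        """Update the egress map after `remote` confirms incorporation."""
        self.egress[remote].update((d.row_id, d.seq_id) for d in delivered)


rep = Replicator(remotes=["B"])
rep.track(Delta("row1", 1, "A"))
rep.track(Delta("row1", 2, "A"))
pending = rep.transmission_matrix("B")    # both deltas are still pending
rep.acknowledge("B", pending[:1])         # B confirms only the first delta
remaining = rep.transmission_matrix("B")  # only the second delta remains
```

Because the transmission matrix is recomputed from the egress map, a delta whose acknowledgement is lost simply appears again in the next matrix, so retransmission needs no separate bookkeeping in this sketch.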

23 Claims

1. A method of replicating data for a distributed database between a plurality of instances, each instance comprising one or more server computers with memory and one or more processors, the method comprising:
- identifying a first instance of the distributed database at a first geographic location;
  
  identifying a second instance of the distributed database at a second geographic location;
  
  tracking changes to the distributed database at the first instance by storing deltas, each delta having a row identifier that identifies a piece of data modified, a sequence identifier that specifies an order in which the deltas are applied to the data, and an instance identifier that specifies an instance where the delta was created;
  
  determining which deltas are to be sent to the second instance using a second egress map at the first instance, wherein the second egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the second instance;
  
  building a second transmission matrix for the second instance that identifies deltas that have not yet been acknowledged as received at the second instance;
  
  transmitting deltas identified in the second transmission matrix to the second instance;
  
  receiving acknowledgement that transmitted deltas have been incorporated in the second instance; and
  
  updating the second egress map to indicate acknowledged deltas.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, further comprising:
    - identifying a third instance of the distributed database at a third geographic location distinct from the first and second geographic locations;
      
      determining which deltas are to be sent to the third instance using a third egress map at the first instance, wherein the third egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the third instance;
      
      building a third transmission matrix for the third instance that identifies deltas that have not yet been acknowledged as received at the third instance;
      
      modifying the transmission matrices for the second and third instances to form one or more revised transmission matrices, wherein deltas identified in each revised transmission matrix are transmitted to a respective location to update the instance at the respective location, and deltas identified in at least one of the revised transmission matrices are transmitted to the second location for subsequent transmission from the second location to the third location;
      
      receiving acknowledgement that deltas transmitted to the third instance, either directly or indirectly via the second instance, have been incorporated in the third instance; and
      
      updating the third egress map to indicate acknowledged deltas.
  - 3. The method of claim 2, including assigning a cost for transmissions between pairs of geographic locations, and wherein modifying the transmission matrices includes an analysis of the total cost for transmitting the deltas to the second and third geographic locations.
  - 4. The method of claim 2, wherein modifying the transmission matrices includes determining bandwidth availability between geographic locations of the instances.
  - 5. The method of claim 2, wherein the transmission matrices for the second and third instances are the same, there is only one revised transmission matrix, the one revised transmission matrix is the same as the transmission matrices, and deltas identified in the revised transmission matrix are transmitted to the second geographic location for subsequent transmission to the third geographic location.
  - 6. The method of claim 1, wherein each sequence identifier comprises a timestamp and a unique tie breaker value that is assigned based on hardware and/or software at each instance of the distributed database.
  - 7. The method of claim 1, wherein the second geographic location is distinct from the first geographic location.
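
As one possible reading of claims 2 to 5, the sketch below revises two transmission matrices so that deltas needed at both remote locations are sent once to the second instance and forwarded from there to the third. The instance labels `"A"`, `"B"`, `"C"`, the per-link `cost` table, and the rule "relay when the B to C link is cheaper than the A to C link" are assumptions chosen for illustration; the claims do not prescribe a particular cost model.

```python
def revise_matrices(matrix_b, matrix_c, cost):
    """Split the work into three lists of (row_id, seq_id) deltas.

    Returns (send_to_b, forward_from_b_to_c, send_to_c_directly).
    `cost` maps a (source, destination) pair to a hypothetical link cost.
    """
    relay_is_cheaper = cost[("B", "C")] < cost[("A", "C")]
    needed_by_b = set(matrix_b)
    forward, direct = [], []
    for delta in matrix_c:
        # Relay only deltas that B receives anyway, and only if it is cheaper.
        if relay_is_cheaper and delta in needed_by_b:
            forward.append(delta)
        else:
            direct.append(delta)
    return list(matrix_b), forward, direct


cost = {("A", "B"): 1, ("A", "C"): 5, ("B", "C"): 1}
matrix_b = [("r1", 1), ("r2", 1)]
matrix_c = [("r1", 1), ("r2", 1), ("r3", 1)]
to_b, via_b, to_c = revise_matrices(matrix_b, matrix_c, cost)
```

When the two input matrices are identical and relaying is cheaper, every delta ends up in the forwarded list and nothing is sent directly, which corresponds to the single revised matrix case of claim 5.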

8. A method of compacting a distributed database having a plurality of instances, wherein each instance stores data on one or more server computers and each server computer has memory and one or more processors, the method comprising:
- identifying a first instance of the distributed database;
  
  selecting a set of one or more row identifiers that identify rows of data in the distributed database, wherein each row in the distributed database has a base value and a set of zero or more deltas, and wherein each delta specifies a change to the base value, includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value, and specifies an instance where the delta was created;
  
  selecting a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier;
  
  applying, in sequence, all deltas for the selected set of one or more row identifiers that have sequence identifiers less than or equal to the compaction horizon, to the base value for the corresponding row identifier; and
  
  deleting the deltas that have been applied to the base value for the corresponding row identifier.
- Dependent claims (9, 10)
- - 9. The method of claim 8, further comprising:
    - identifying a plurality of other instances of the distributed database;
      
      wherein the selected compaction horizon for the selected set of one or more row identifiers satisfies:
      
      all deltas that (i) were created at the first instance, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
      
      all deltas that (i) were created at instances in the plurality of other instances, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been received at the first instance.
  - 10. The method of claim 8, wherein each sequence identifier comprises a timestamp and a unique tie breaker value that is assigned based on hardware and/or software at each instance of the distributed database.
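
The compaction steps of claim 8 can be illustrated with a short Python sketch. It assumes, purely for the example, that a base value is a dictionary of column values, that each delta is a dictionary of column changes, and that a sequence identifier is a `(timestamp, tie_breaker)` tuple in the spirit of claim 10; the function name `compact` is hypothetical.

```python
def compact(base, deltas, horizon):
    """Fold deltas at or below `horizon` into the base value.

    `deltas` is a list of (seq_id, changes) pairs, where seq_id is a
    (timestamp, tie_breaker) tuple and changes is a dict of column updates.
    Returns (new_base, remaining_deltas); applied deltas are dropped.
    """
    # Tuples compare by timestamp first, then by tie breaker.
    ordered = sorted(deltas, key=lambda d: d[0])
    new_base = dict(base)
    remaining = []
    for seq_id, changes in ordered:
        if seq_id <= horizon:
            new_base.update(changes)             # apply in sequence
        else:
            remaining.append((seq_id, changes))  # keep for a later pass
    return new_base, remaining


base = {"name": "a"}
deltas = [
    ((200, 1), {"name": "c"}),
    ((100, 2), {"name": "b"}),
    ((100, 1), {"size": 3}),
    ((300, 1), {"name": "d"}),
]
new_base, remaining = compact(base, deltas, horizon=(200, 1))
```

The tie breaker matters for the two deltas that share timestamp 100: without it their relative order would be undefined, and different instances could compact the same row to different base values.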

11. A method of reading a data item from a distributed database with a plurality of data items, each data item comprising a base value and zero or more deltas that specify modifications to the base value, the method performed by one or more server computers having memory and one or more processors, the method comprising:
- receiving a request from a client for a specified data item, the request including a row identifier that identifies the data item;
  
  reading the base value for the specified data item from the distributed database and storing the base value in memory;
  
  reading the deltas for the specified data item, if any, from the distributed database, wherein each delta includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value;
  
  applying the deltas to the base value in memory, in sequence, resulting in a current base value stored in memory; and
  
  returning the current base value stored in memory to the client.
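
A read under claim 11 differs from compaction in that nothing is written back: the deltas are applied to an in-memory copy only. The sketch below assumes a hypothetical `store` dictionary mapping a row identifier to a `(base, deltas)` pair, with dictionary-valued bases and deltas and integer sequence identifiers.

```python
def read_item(store, row_id):
    """Return the current value of a row without modifying the store."""
    base, deltas = store[row_id]
    current = dict(base)  # in-memory copy of the base value
    for _, changes in sorted(deltas, key=lambda d: d[0]):
        current.update(changes)  # apply deltas in sequence order
    return current


store = {
    "row1": (
        {"count": 1},
        [(2, {"count": 5}), (1, {"count": 3, "tag": "x"})],
    )
}
value = read_item(store, "row1")
```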

12. A server system, comprising a plurality of servers, each server having:
- one or more processors;
  
  memory; and
  
  one or more programs stored in the memory for execution by the one or more processors, the one or more programs comprising instructions for:
  
  identifying a first instance of the distributed database at a first geographic location;
  
  identifying a second instance of the distributed database at a second geographic location;
  
  tracking changes to the distributed database at the first instance by storing deltas, each delta having a row identifier that identifies a piece of data modified, a sequence identifier that specifies an order in which the deltas are applied to the data, and an instance identifier that specifies an instance where the delta was created;
  
  determining which deltas are to be sent to the second instance using a second egress map at the first instance, wherein the second egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the second instance;
  
  building a second transmission matrix for the second instance that identifies deltas that have not yet been acknowledged as received at the second instance;
  
  transmitting deltas identified in the second transmission matrix to the second instance;
  
  receiving acknowledgement that transmitted deltas have been incorporated in the second instance; and
  
  updating the second egress map to indicate acknowledged deltas.
- Dependent claims (13, 14)
- - 13. The server system of claim 12, further comprising instructions for:
    - identifying a third instance of the distributed database at a third geographic location distinct from the first and second geographic locations;
      
      determining which deltas are to be sent to the third instance using a third egress map at the first instance, wherein the third egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the third instance;
      
      building a third transmission matrix for the third instance that identifies deltas that have not yet been acknowledged as received at the third instance;
      
      modifying the transmission matrices for the second and third instances to form one or more revised transmission matrices, wherein deltas identified in each revised transmission matrix are transmitted to a respective location to update the instance at the respective location, and deltas identified in at least one of the revised transmission matrices are transmitted to the second location for subsequent transmission from the second location to the third location;
      
      receiving acknowledgement that deltas transmitted to the third instance, either directly or indirectly via the second instance, have been incorporated in the third instance; and
      
      updating the third egress map to indicate acknowledged deltas.
  - 14. The server system of claim 13, wherein the transmission matrices for the second and third instances are the same, there is only one revised transmission matrix, the one revised transmission matrix is the same as the transmission matrices, and deltas identified in the revised transmission matrix are transmitted to the second geographic location for subsequent transmission to the third geographic location.

15. A server system, comprising a plurality of servers, each server having:
- one or more processors;
  
  memory; and
  
  one or more programs stored in the memory for execution by the one or more processors, the one or more programs comprising instructions for:
  
  identifying a first instance of a distributed database;
  
  selecting a set of one or more row identifiers that identify rows of data in the distributed database, wherein each row in the distributed database has a base value and a set of zero or more deltas, and wherein each delta specifies a change to the base value, includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value, and specifies an instance where the delta was created;
  
  selecting a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier;
  
  applying, in sequence, all deltas for the selected set of one or more row identifiers that have sequence identifiers less than or equal to the compaction horizon, to the base value for the corresponding row identifier; and
  
  deleting the deltas that have been applied to the base value for the corresponding row identifier.
- Dependent claims (16)
- - 16. The server system of claim 15, further comprising instructions for:
    - identifying a plurality of other instances of the distributed database;
      
      wherein the selected compaction horizon for the selected set of one or more row identifiers satisfies:
      
      all deltas that (i) were created at the first instance, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
      
      all deltas that (i) were created at instances in the plurality of other instances, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been received at the first instance.
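
The two conditions on the compaction horizon can be read as a minimum over per-instance watermarks. The sketch below assumes, as a simplification that is not stated in the claims, that the first instance keeps one "acknowledged up to" and one "received up to" sequence identifier for each other instance; the function name `safe_horizon` and both dictionaries are hypothetical.

```python
def safe_horizon(acked_up_to, received_up_to):
    """Largest sequence identifier that satisfies both conditions.

    acked_up_to[i]:    every locally created delta with a sequence
                       identifier <= this value was acknowledged by i.
    received_up_to[i]: every delta created at i with a sequence
                       identifier <= this value has been received here.
    Returns None when there are no other instances to consult.
    """
    watermarks = list(acked_up_to.values()) + list(received_up_to.values())
    return min(watermarks, default=None)


horizon = safe_horizon(
    acked_up_to={"B": 40, "C": 25},
    received_up_to={"B": 30, "C": 50},
)
```

Taking the minimum is conservative: a single slow or unreachable instance holds the horizon back, which delays compaction but never discards a delta that another instance may still need.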

17. A server system, comprising a plurality of servers, each server having:
- one or more processors;
  
  memory; and
  
  one or more programs stored in the memory for execution by the one or more processors, the one or more programs comprising instructions for:
  
  receiving a request from a client for a specified data item from a distributed database with a plurality of data items, each data item comprising a base value and zero or more deltas that specify modifications to the base value, wherein the request includes a row identifier that identifies the data item;
  
  reading the base value for the specified data item from the distributed database and storing the base value in memory;
  
  reading the deltas for the specified data item, if any, from the distributed database, wherein each delta includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value;
  
  applying the deltas to the base value in memory, in sequence, resulting in a current base value stored in memory; and
  
  returning the current base value stored in memory to the client.

18. A computer readable storage medium storing one or more programs configured for execution by a server computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions to:
- identify a first instance of the distributed database at a first geographic location;
  
  identify a second instance of the distributed database at a second geographic location;
  
  track changes to the distributed database at the first instance by storing deltas, each delta having a row identifier that identifies a piece of data modified, a sequence identifier that specifies an order in which the deltas are applied to the data, and an instance identifier that specifies an instance where the delta was created;
  
  determine which deltas are to be sent to the second instance using a second egress map at the first instance, wherein the second egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the second instance;
  
  build a second transmission matrix for the second instance that identifies deltas that have not yet been acknowledged as received at the second instance;
  
  transmit deltas identified in the second transmission matrix to the second instance;
  
  receive acknowledgement that transmitted deltas have been incorporated in the second instance; and
  
  update the second egress map to indicate acknowledged deltas.
- Dependent claims (19, 20)
- - 19. The computer readable storage medium of claim 18, further comprising instructions to:
    - identify a third instance of the distributed database at a third geographic location distinct from the first and second geographic locations;
      
      determine which deltas are to be sent to the third instance using a third egress map at the first instance, wherein the third egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the third instance;
      
      build a third transmission matrix for the third instance that identifies deltas that have not yet been acknowledged as received at the third instance;
      
      modify the transmission matrices for the second and third instances to form one or more revised transmission matrices, wherein deltas identified in each revised transmission matrix are transmitted to a respective location to update the instance at the respective location, and deltas identified in at least one of the revised transmission matrices are transmitted to the second location for subsequent transmission from the second location to the third location;
      
      receive acknowledgement that deltas transmitted to the third instance, either directly or indirectly via the second instance, have been incorporated in the third instance; and
      
      update the third egress map to indicate acknowledged deltas.
  - 20. The computer readable storage medium of claim 19, wherein the transmission matrices for the second and third instances are the same, there is only one revised transmission matrix, the one revised transmission matrix is the same as the transmission matrices, and deltas identified in the revised transmission matrix are transmitted to the second geographic location for subsequent transmission to the third geographic location.

21. A computer readable storage medium storing one or more programs configured for execution by a server computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions to:
- identify a first instance of a distributed database;
  
  select a set of one or more row identifiers that identify rows of data in the distributed database, wherein each row in the distributed database has a base value and a set of zero or more deltas, and wherein each delta specifies a change to the base value, includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value, and specifies an instance where the delta was created;
  
  select a compaction horizon for the selected set of one or more row identifiers, wherein the compaction horizon is a sequence identifier;
  
  apply, in sequence, all deltas for the selected set of one or more row identifiers that have sequence identifiers less than or equal to the compaction horizon, to the base value for the corresponding row identifier; and
  
  delete the deltas that have been applied to the base value for the corresponding row identifier.
- Dependent claims (22)
- - 22. The computer readable storage medium of claim 21, further comprising instructions to:
    - identify a plurality of other instances of the distributed database;
      
      wherein the selected compaction horizon for the selected set of one or more row identifiers satisfies:
      
      all deltas that (i) were created at the first instance, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been transmitted to and acknowledged by all of the other instances that maintain data for the corresponding row identifiers; and
      
      all deltas that (i) were created at instances in the plurality of other instances, (ii) are for rows corresponding to row identifiers in the selected set of one or more row identifiers, and (iii) have sequence identifiers less than or equal to the compaction horizon, have been received at the first instance.

23. A computer readable storage medium storing one or more programs configured for execution by a server computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions to:
- receive a request from a client for a specified data item from a distributed database with a plurality of data items, each data item comprising a base value and zero or more deltas that specify modifications to the base value, wherein the request includes a row identifier that identifies the data item;
  
  read the base value for the specified data item from the distributed database and store the base value in memory;
  
  read the deltas for the specified data item, if any, from the distributed database, wherein each delta includes a sequence identifier that specifies an order in which the deltas are to be applied to the base value;
  
  apply the deltas to the base value in memory, in sequence, resulting in a current base value stored in memory; and
  
  return the current base value stored in memory to the client.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: ZUNGER, YONATAN
Granted Patent: US 8,380,659 B2
US Class Current: 707/622
CPC Class Codes:
- G06F 16/2329   using versioning
- G06F 16/27   Replication, distribution o...
- G06F 16/275   Synchronous replication
