PARTITIONING AND REPLICATING DATA IN SCALABLE DISTRIBUTED DATA STORES

US 20170262521A1
Filed: 04/11/2016
Published: 09/14/2017
Est. Priority Date: 03/11/2016
Status: Abandoned Application

Abstract

The disclosed embodiments provide a system for processing data. During operation, the system generates a first distribution of a set of partitions comprising a graph database across a first set of storage nodes in a first cluster. Next, the system replicates the graph database by generating a second, different distribution of the set of partitions across a second set of storage nodes in a second cluster. When a query of the graph database is received, the system identifies one or more partitions storing data associated with the query and uses a set of mappings comprising the set of partitions, the first and second sets of storage nodes, and the first and second clusters to select one or more storage nodes containing the one or more partitions. Finally, the system transmits one or more portions of the query to the selected storage nodes.
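
The distribution and mapping steps described in the abstract can be illustrated with a minimal sketch. It is not taken from the specification: the function names (`distribute`, `grouping`, `build_mappings`), the node and cluster names, the partition count, and the shuffle-and-deal placement strategy are all assumptions chosen for illustration. The sketch treats two distributions as "different" when they group partitions onto nodes differently.

```python
import random
from collections import defaultdict

def distribute(partitions, nodes, seed):
    """Shuffle the partitions and deal them out, so each node holds a random subset.

    Assumption: dealing after a shuffle is one simple way to give every node a
    random subset while still placing every partition exactly once per cluster.
    """
    rng = random.Random(seed)
    shuffled = list(partitions)
    rng.shuffle(shuffled)
    node_to_partitions = defaultdict(list)
    for i, partition in enumerate(shuffled):
        node_to_partitions[nodes[i % len(nodes)]].append(partition)
    return dict(node_to_partitions)

def grouping(node_to_partitions):
    """The set of partition groups, ignoring which node name holds each group."""
    return {frozenset(parts) for parts in node_to_partitions.values()}

def build_mappings(clusters):
    """Build cluster->nodes, partition->nodes, and node->partitions lookups."""
    cluster_to_nodes = {c: sorted(n2p) for c, n2p in clusters.items()}
    partition_to_nodes = defaultdict(list)
    node_to_partitions = {}
    for n2p in clusters.values():
        for node, parts in n2p.items():
            node_to_partitions[node] = sorted(parts)
            for partition in parts:
                partition_to_nodes[partition].append(node)
    return cluster_to_nodes, dict(partition_to_nodes), node_to_partitions

partitions = range(8)  # hypothetical partition ids
first = distribute(partitions, ["a1", "a2", "a3", "a4"], seed=1)

# Re-draw the replica's distribution until it groups partitions differently.
seed = 2
second = distribute(partitions, ["b1", "b2", "b3", "b4"], seed=seed)
while grouping(second) == grouping(first):
    seed += 1
    second = distribute(partitions, ["b1", "b2", "b3", "b4"], seed=seed)

clusters = {"cluster-a": first, "cluster-b": second}
cluster_to_nodes, partition_to_nodes, node_to_partitions = build_mappings(clusters)
```

In this sketch every partition ends up on one node in each cluster, so `partition_to_nodes` offers a query router two candidate storage nodes per partition.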

20 Claims

1. A method, comprising:
- generating a first distribution of a set of partitions comprising a graph database across a first set of storage nodes in a first cluster;
  
  replicating the graph database by generating a second distribution of the set of partitions across a second set of storage nodes in a second cluster, wherein the second distribution is different from the first distribution; and
  
  when a query of the graph database is received, processing the query on a computer system by:
  
  identifying one or more partitions storing data associated with the query;
  
  using a set of mappings comprising the set of partitions, the first and second sets of storage nodes, and the first and second clusters to select one or more storage nodes containing the one or more partitions; and
  
  transmitting one or more portions of the query to the selected storage nodes.
  - 2. The method of claim 1, further comprising:
    - for each key associated with the graph database, storing data associated with the key in a single partition of the set of partitions.
  - 3. The method of claim 2, wherein storing the set of data associated with the key in the single partition comprises:
    - identifying the single partition from a hash of the key; and
      
      storing the data in the identified partition.
  - 4. The method of claim 2, wherein the key comprises a node in a graph stored in the graph database and the data comprises a set of edges associated with the node.
  - 5. The method of claim 1, wherein the set of mappings comprises a mapping from a cluster to one or more storage nodes in the cluster.
  - 6. The method of claim 1, wherein the set of mappings comprises a mapping from a partition to one or more storage nodes storing the partition.
  - 7. The method of claim 1, wherein the set of mappings comprises a mapping from a storage node to one or more partitions stored in the storage node.
  - 8. The method of claim 1, wherein generating the first and second distributions of the partitions comprises:
    - for each storage node in a cluster, randomly selecting a subset of the partitions for storing on the storage node; and
      
      updating the set of mappings based on the randomly selected subset of the partitions.
  - 9. The method of claim 1, wherein using the set of mappings to select the one or more storage nodes containing the one or more partitions comprises:
    - using a round-robin technique to select the one or more storage nodes for processing of the query.
  - 10. The method of claim 1, wherein using the set of mappings to select the one or more storage nodes containing the one or more partitions comprises:
    - selecting a fan-out of the query to the one or more storage nodes based on a query type of the query.
  - 11. The method of claim 1, wherein transmitting one or more portions of the query to the selected storage nodes comprises:
    - transmitting, in a single request to a selected storage node, multiple portions of the query associated with the selected storage node.
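
Several of the dependent claims above (hashing a key to a single partition, round-robin selection among storage nodes, and sending multiple query portions to a node in a single request) can be sketched together. This is a hedged illustration only: the `Router` class, `partition_for`, the SHA-256 hash, the shared round-robin counter, and the `partition_to_nodes` table are hypothetical choices, not details confirmed by the claims.

```python
import hashlib
from collections import defaultdict
from itertools import count

NUM_PARTITIONS = 8  # hypothetical partition count

def partition_for(key: str) -> int:
    """Identify the single partition for a key from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

class Router:
    def __init__(self, partition_to_nodes):
        self.partition_to_nodes = partition_to_nodes
        self._turn = count()  # shared counter that drives round-robin selection

    def pick_node(self, partition):
        """Rotate through the storage nodes that hold a copy of the partition."""
        replicas = self.partition_to_nodes[partition]
        return replicas[next(self._turn) % len(replicas)]

    def plan(self, keys):
        """Group query keys by selected node so each node gets one request."""
        chosen = {}                   # partition -> node selected for this query
        requests = defaultdict(list)  # node -> keys carried in its single request
        for key in keys:
            partition = partition_for(key)
            if partition not in chosen:
                chosen[partition] = self.pick_node(partition)
            requests[chosen[partition]].append(key)
        return dict(requests)

# Hypothetical mapping: each partition is stored on one node in each of two clusters.
partition_to_nodes = {p: [f"a{p % 4}", f"b{p // 2}"] for p in range(NUM_PARTITIONS)}

router = Router(partition_to_nodes)
keys = [f"member:{i}" for i in range(20)]
plan = router.plan(keys)
for node, node_keys in sorted(plan.items()):
    print(node, node_keys)
```

Selecting a node once per partition within a query keeps all keys of that partition in the same request, which is what allows multiple portions of the query to travel to a storage node together.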

12. An apparatus, comprising:
- one or more processors; and
  
  memory storing instructions that, when executed by the one or more processors, cause the apparatus to:
  
  generate a first distribution of a set of partitions comprising a graph database across a first set of storage nodes in a first cluster;
  
  replicate the graph database by generating a second distribution of the set of partitions across a second set of storage nodes in a second cluster, wherein the second distribution is different from the first distribution; and
  
  when a query of the graph database is received, process the query by:
  
  identifying one or more partitions storing data associated with the query;
  
  using a set of mappings comprising the set of partitions, the first and second sets of storage nodes, and the first and second clusters to select one or more storage nodes containing the one or more partitions; and
  
  transmitting one or more portions of the query to the selected storage nodes.
  - 13. The apparatus of claim 12, wherein the memory further stores instructions that, when executed by the one or more processors, cause the apparatus to:
    - for each key associated with the graph database, store data associated with the key in a single partition from the set of partitions.
  - 14. The apparatus of claim 13, wherein the key comprises a node in a graph stored in the graph database and the data comprises a set of edges associated with the node.
  - 15. The apparatus of claim 12, wherein the set of mappings comprises:
    - a first mapping from a cluster to one or more storage nodes in the cluster;
      
      a second mapping from a partition to one or more storage nodes storing the partition; and
      
      a third mapping from a storage node to one or more partitions stored in the storage node.
  - 16. The apparatus of claim 12, wherein generating the first and second distributions of the partitions comprises:
    - for each storage node in a cluster, randomly selecting a subset of the partitions for storing on the storage node; and
      
      updating the set of mappings based on the randomly selected subset of the partitions.
  - 17. The apparatus of claim 12, wherein using the set of mappings to select the one or more storage nodes containing the one or more partitions comprises:
    - using a round-robin technique to select the one or more storage nodes for processing of the query.

18. A system, comprising:
- a distribution mechanism comprising a non-transitory computer-readable medium comprising instructions that, when executed, cause the system to:
  
  generate a first distribution of a set of partitions comprising a graph database across a first set of storage nodes in a first cluster; and
  
  replicate the graph database by generating a second distribution of the set of partitions across a second set of storage nodes in a second cluster, wherein the second distribution is different from the first distribution; and
  
  a query processor comprising a non-transitory computer-readable medium comprising instructions that, when executed, cause the system to process a query of the graph database by:
  
  identifying one or more partitions storing data associated with the query;
  
  using a set of mappings comprising the set of partitions, the first and second sets of storage nodes, and the first and second clusters to select one or more storage nodes containing the one or more partitions; and
  
  transmitting one or more portions of the query to the selected storage nodes.
  - 19. The system of claim 18, wherein the set of mappings comprises:
    - a first mapping from a cluster to one or more storage nodes in the cluster;
      
      a second mapping from a partition to one or more storage nodes storing the partition; and
      
      a third mapping from a storage node to one or more partitions stored in the storage node.
  - 20. The system of claim 18, wherein generating the first and second distributions of the partitions comprises:
    - for each storage node in a cluster, randomly selecting a subset of the partitions for storing on the storage node; and
      
      updating the set of mappings based on the randomly selected subset of the partitions.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
LinkedIn Corporation (Microsoft Corporation)
Inventors
Cho, SungJu; Carter, Andrew J.; Ehrlich, Joshua D.; Jan, Jane Alam

Application Number

US15/096,068
Publication Number

US 20170262521A1
CPC Class Codes

G06F 16/2471   Distributed queries

G06F 16/27   Replication, distribution o...

G06F 16/278   Data partitioning, e.g. hor...

G06F 16/285   Clustering or classification

G06F 16/9024   Graphs; Linked lists G06F16...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results

G06F 9/505   considering the load

H04L 45/46   Cluster building
