SCALABLE DISTRIBUTED PROCESSING OF RDF DATA

US 20140108414A1
Filed: 10/12/2012
Published: 04/17/2014
Est. Priority Date: 10/12/2012
Status: Active Grant

Abstract

In general, techniques are described for an RDF (Resource Description Framework) database system which can scale to huge size for realistic data sets of practical interest. In some examples, a database system includes a Resource Description Framework (RDF) database that stores a plurality of data chunks to one or more storage drives, wherein each of the plurality of data chunks includes a plurality of triples of the RDF database. The database system also includes a working memory, a query interface that receives a query for the RDF database, a SPARQL engine that identifies a subset of the data chunks relevant to the query, and an index interface that includes one or more bulk loaders that load the subset of the data chunks to the working memory. The SPARQL engine executes the query only against triples included within the loaded subset of the data chunks to obtain a query result.
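
The flow summarized in the abstract (identify relevant chunks through an index, load only those chunks into working memory, then query only the loaded triples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CHUNKS` dictionary stands in for the storage drives, the index is keyed by triple subject, and a single triple pattern with `None` wildcards stands in for a SPARQL query. It is not the patented implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for storage drives: chunk id -> list of (s, p, o) triples.
CHUNKS = {
    "c1": [("alice", "knows", "bob"), ("alice", "age", "30")],
    "c2": [("bob", "knows", "carol"), ("bob", "age", "25")],
    "c3": [("dave", "worksFor", "acme")],
}

def build_index(chunks):
    """Map each subject to the ids of chunks that mention it (an assumed key choice)."""
    index = defaultdict(set)
    for chunk_id, triples in chunks.items():
        for s, _p, _o in triples:
            index[s].add(chunk_id)
    return index

def load_chunks(chunks, chunk_ids):
    """Bulk-load the selected chunks into a working-memory set of triples."""
    memory = set()
    for chunk_id in chunk_ids:
        memory.update(chunks[chunk_id])
    return memory

def match(memory, pattern):
    """Return the loaded triples that match a pattern; None is a wildcard."""
    return sorted(
        t for t in memory
        if all(want is None or want == got for want, got in zip(pattern, t))
    )

index = build_index(CHUNKS)
relevant = index["alice"] | index["bob"]        # subset of chunks relevant to the query
memory = load_chunks(CHUNKS, relevant)          # only c1 and c2 are loaded
result = match(memory, (None, "knows", None))   # query runs only against loaded triples
```

In this sketch chunk `c3` is never read, and the loaded set still contains triples (the `age` triples) that do not match the query, which mirrors the situation described in claim 5.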

121 Citations

21 Claims

1. A method comprising:
- receiving, with a database system, a query for a Resource Description Framework (RDF) database that stores a plurality of data chunks to one or more storage drives, wherein each of the plurality of data chunks includes a plurality of triples of the RDF database;
  
  accessing an index that indexes one or more of the data chunks to identify a subset of the data chunks relevant to the query;
  
  loading the subset of the data chunks to a main memory associated with the database system; and
  
  executing the query only against triples included within the subset of the data chunks loaded to the main memory to obtain a query result.
- - 2. The method of claim 1, wherein a first one of the plurality of data chunks includes a first plurality of triples of the RDF database, wherein a second one of the plurality of data chunks includes a second plurality of triples of the RDF database, wherein the first plurality of triples comprises a first RDF graph, wherein the second plurality of triples comprises a second RDF graph, wherein loading the subset of the data chunks to a main memory comprises merging the first RDF graph and the second RDF graph to generate a combined RDF graph in the main memory, and wherein executing the query comprises executing the query only against the combined RDF graph.
  - 3. The method of claim 1, wherein the triples included within the subset of the data chunks loaded to the main memory comprise a labeled directed graph.
  - 4. The method of claim 1, wherein the query comprises a graph pattern, and wherein the query result comprises a subgraph of the labeled directed graph that matches the graph pattern.
  - 5. The method of claim 1, wherein at least one of the triples included within the subset of the data chunks loaded to the main memory does not match any part of the query.
  - 6. The method of claim 1, wherein a first one of the plurality of the data chunks and a second one of the plurality of the data chunks include a common one of the plurality of triples.
  - 7. The method of claim 1, wherein the index is distributed across multiple hosts.
  - 8. The method of claim 1, wherein the index comprises one or more key-value pairs that each comprises a key and a value that references a corresponding one of the plurality of data chunks.
  - 9. The method of claim 8, wherein the key for each of the key-value pairs is derived from one or more triples included within the corresponding one of the plurality of data chunks.
  - 10. The method of claim 1, wherein the index comprises a first index, wherein the first index comprises keys defined by a first characteristic of the subset of the data chunks, wherein a second index comprises keys defined by a second characteristic of the subset of the data chunks, wherein identifying a subset of the data chunks relevant to the query comprises accessing the second index, the method further comprising:
    - receiving, with the database system, a second query for the RDF database;
      
      accessing the second index to identify the subset of the data chunks as relevant to the second query; and
      
      executing the second query only against triples included within the subset of the data chunks loaded to the main memory to obtain a query result for the second query.
  - 11. The method of claim 1, wherein a number of triples included in a first one of the plurality of data chunks is different than a number of triples included in a second one of the plurality of data chunks.
  - 12. The method of claim 1, further comprising:
    - selecting fewer than all of the triples included within the subset of the data chunks loaded to the main memory to be cleared;
      
      freeing the main memory of the selected triples.
  - 14. The method of claim 1, wherein the query comprises a script comprising:
    - a gather step for identifying the subset of the data chunks relevant to the query and loading the subset of the data chunks to the main memory;
      
      a sift step for executing the query only against triples included within the subset of the data chunks loaded to the main memory; and
      
      a clear step for freeing the main memory of one or more of the triples included within the subset of the data chunks loaded to the main memory.
  - 15. The method of claim 14, wherein the sift step comprises executing a SPARQL Protocol and RDF Query Language (SPARQL) query included in the script against triples included within the subset of the data chunks loaded to the main memory.
  - 16. The method of claim 15, wherein the SPARQL query comprises a first SPARQL query, wherein the gather step comprises a first gather step and a second gather step, wherein the subset of the data chunks comprises a first subset of the data chunk, the method further comprising:
    - executing a second gather step for identifying a second subset of the data chunks and loading the second subset of the data chunks to the main memory; and
      
      executing a second SPARQL query included in the script against triples included within the second subset of the data chunks.
  - 17. The method of claim 14, further comprising:
    - fragmenting the script into a plurality of script fragments; and
      
      sending one of the plurality of script fragments to a remote instance of the database system for execution, wherein the remote instance stores a data chunk relevant to executing the script fragment sent for execution.
  - 18. The method of claim 17, further comprising:
    - receiving, with the database system, a query result fragment from the remote instance of the database system, wherein executing the query only against triples included within the subset of the data chunks loaded to the main memory to obtain the query result comprises merging the query result fragment with the query result.
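
The gather, sift, and clear steps of claim 14, together with the key-value index of claim 8, the graph merging of claim 2, and the partial clearing of claim 12, can be sketched as a small script interpreter. This is a hedged illustration only: the `ChunkedStore` class, the `run_script` function, the tuple-based script format, and the use of a single triple pattern in place of a SPARQL query are all assumptions introduced here, not anything defined by the claims.

```python
class ChunkedStore:
    """Toy store: a dict of chunks stands in for the storage drives."""

    def __init__(self, chunks, index):
        self.chunks = chunks    # chunk id -> list of (s, p, o) triples
        self.index = index      # key -> set of chunk ids (key-value index)
        self.memory = set()     # working memory; a set merges loaded graphs

    def gather(self, key):
        """Look up relevant chunks in the index and load them into memory."""
        for chunk_id in sorted(self.index.get(key, ())):
            self.memory.update(self.chunks[chunk_id])

    def sift(self, pattern):
        """Match a triple pattern (None = wildcard) against loaded triples only."""
        return sorted(
            t for t in self.memory
            if all(want is None or want == got for want, got in zip(pattern, t))
        )

    def clear(self, pattern):
        """Free only the loaded triples that match the pattern."""
        self.memory -= set(self.sift(pattern))


def run_script(store, script):
    """Run (operation, argument) steps in order; collect each sift result."""
    results = []
    for op, arg in script:
        if op == "gather":
            store.gather(arg)
        elif op == "sift":
            results.append(store.sift(arg))
        elif op == "clear":
            store.clear(arg)
        else:
            raise ValueError(f"unknown step: {op}")
    return results


# Two chunks that share one triple, as claim 6 allows.
chunks = {
    "c1": [("alice", "knows", "bob"), ("bob", "knows", "carol")],
    "c2": [("bob", "knows", "carol"), ("carol", "knows", "dave")],
}
index = {"alice": {"c1"}, "carol": {"c2"}}
script = [
    ("gather", "alice"),
    ("sift", (None, "knows", None)),
    ("gather", "carol"),
    ("sift", (None, "knows", None)),
    ("clear", ("alice", None, None)),
    ("sift", (None, "knows", None)),
]
results = run_script(ChunkedStore(chunks, index), script)
```

Because working memory is a set, the triple shared by both chunks appears once after the second gather, and the clear step removes only the triples whose subject is `alice` while leaving the rest loaded.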

13. (canceled)

19. A database system comprising:
- a Resource Description Framework (RDF) database that stores a plurality of data chunks to one or more storage drives, wherein each of the plurality of data chunks includes a plurality of triples of the RDF database;
  
  a working memory;
  
  a query interface that receives a query for the RDF database;
  
  a query parser/evaluator that accesses an index that indexes one or more of the data chunks to identify a subset of the data chunks relevant to the query;
  
  an index interface that includes one or more bulk loaders that load the subset of the data chunks to the working memory; and
  
  a SPARQL Protocol and RDF Query Language (SPARQL) engine that executes the query only against triples included within the subset of the data chunks loaded to the main memory to obtain a query result.
- - 20. The database system of claim 19, wherein the database system is distributed among a plurality of instances.
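
Claims 17, 18, and 20 describe splitting a script into fragments, sending each fragment to the instance that stores the relevant chunk, and merging the returned result fragments. A rough sketch of that idea follows. The `Instance` class, the `placement` map from chunk id to instance name, and the in-process method call standing in for a network request are hypothetical. The sketch also assumes a single triple pattern, so per-instance results can simply be unioned; a query that joins triples held on different instances would need more than this.

```python
class Instance:
    """Hypothetical database instance that stores some chunks locally."""

    def __init__(self, chunks):
        self.chunks = chunks  # chunk id -> list of (s, p, o) triples

    def execute(self, fragment):
        """Load the fragment's chunks and return the matching triples."""
        chunk_ids, pattern = fragment
        memory = set()
        for chunk_id in chunk_ids:
            memory.update(self.chunks[chunk_id])
        return {
            t for t in memory
            if all(want is None or want == got for want, got in zip(pattern, t))
        }


def fragment_script(chunk_ids, pattern, placement):
    """Group the requested chunk ids by the instance that stores them."""
    by_instance = {}
    for chunk_id in chunk_ids:
        by_instance.setdefault(placement[chunk_id], []).append(chunk_id)
    return {name: (ids, pattern) for name, ids in by_instance.items()}


def run_distributed(instances, placement, chunk_ids, pattern):
    """Send each fragment to its instance and merge the result fragments."""
    merged = set()
    for name, fragment in fragment_script(chunk_ids, pattern, placement).items():
        merged |= instances[name].execute(fragment)  # stands in for a remote call
    return sorted(merged)


instances = {
    "east": Instance({"c1": [("alice", "knows", "bob"), ("alice", "age", "30")]}),
    "west": Instance({"c2": [("bob", "knows", "carol")]}),
}
placement = {"c1": "east", "c2": "west"}
fragments = fragment_script(["c1", "c2"], (None, "knows", None), placement)
answer = run_distributed(instances, placement, ["c1", "c2"], (None, "knows", None))
```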

21. A computer-readable storage device comprising instructions for causing one or more programmable processors to:
- receive, with a database system, a query for a Resource Description Framework (RDF) database that stores a plurality of data chunks to one or more storage drives, wherein each of the plurality of data chunks includes a plurality of triples of the RDF database;
  
  access an index that indexes one or more of the data chunks to identify a subset of the data chunks relevant to the query;
  
  load the subset of the data chunks to a main memory associated with the database system; and
  
  execute the query only against triples included within the subset of the data chunks loaded to the main memory to obtain a query result.

Current Assignee: Architecture Technology Corporation
Original Assignee: Architecture Technology Corporation
Inventors: Stillerman, Matthew A.; Joyce, Robert A.
Granted Patent: US 8,756,237 B2
US Class Current: 707/741
CPC Class Codes:
G06F 16/00   Information retrieval; Data...
G06F 16/22   Indexing; Data structures t...
G06F 16/24552   Database cache management
