Semantic indexing engine

US 9,679,041 B2
Filed: 12/22/2014
Issued: 06/13/2017
Est. Priority Date: 12/22/2014
Status: Active Grant

Abstract

Embodiments are described for a method of distributing n-tuples over a cluster of triple-store machines, by storing each n-tuple as text in a distributed file system using a key value store; providing each machine of the cluster with a resident semantic data lake component accessing one or more persistent RDF triplestores for the n-tuple data stored on each machine; and defining one part of each n-tuple as a partition variable to ensure locality of data within each respective n-tuple. A method includes inserting graphs into a key/value store to determine how the key/value store distributes the data across a plurality of servers, by generating textual triple data, and storing the triple data in key-value stores wherein a fourth element of the triple comprises the key, and a value associated with the key comprises all the triples about a subject; indexing the data in the key-value store in an RDF triplestore using a partition based on the fourth element.
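The key/value layout described in the abstract can be illustrated with a minimal sketch. It assumes an in-memory Python dict in place of a real distributed key-value store, and the one-line-per-triple text format and the sample identifiers (`ex:patient1` and so on) are hypothetical, chosen only for illustration; the patent does not specify them.

```python
from collections import defaultdict


def build_key_value_store(quads):
    """Group (subject, predicate, object, partition) tuples by their fourth element.

    The returned dict stands in for a key-value store: the key is the fourth
    element, and the value is the text of every triple filed under that key.
    """
    grouped = defaultdict(list)
    for subject, predicate, obj, partition in quads:
        # Each triple is kept as one line of text (illustrative format only).
        grouped[partition].append(f"{subject} {predicate} {obj} .")
    return {key: "\n".join(lines) for key, lines in grouped.items()}


# Hypothetical sample data: the fourth element names the subject's partition.
sample_quads = [
    ("ex:patient1", "ex:hasDiagnosis", "ex:asthma", "patient1"),
    ("ex:patient1", "ex:age", '"42"', "patient1"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:flu", "patient2"),
]
kv_store = build_key_value_store(sample_quads)
```

Here `kv_store["patient1"]` holds both triples about the first subject as a single text value, so everything known about that subject can be fetched with one key lookup.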

13 Claims

1. A method of distributing n-tuples over a cluster of triple-store machines, comprising:
- storing each n-tuple as text in a distributed file system using a key value store;
- providing each machine of the cluster with a resident semantic indexing engine accessing one or more persistent Resource Description Framework (RDF) triplestores for the n-tuple data stored on each machine; and
- defining a partition variable for each n-tuple to ensure locality of data within each respective machine, wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein the partition variable is defined as one part of an associated n-tuple.
  - 3. The method of claim 2 wherein the tuple structure is utilized by a web ontology language for processing of semantic web data, and wherein the data comprises a big data application comprising a collection of large complex data sets organized into one or more data libraries.
  - 4. The method of claim 2 wherein the file system is a Hadoop distributed file system (HDFS), the method further comprising applying a machine learning process and predictive analytics processes using a data analytics cluster computing framework built on the HDFS.
  - 5. The method of claim 1 further comprising:
    - indexing the n-tuple data in the HDFS using an RDF triplestore; and
    - maintaining synchronization of the n-tuple data between the HDFS and the RDF triplestore.
  - 6. The method of claim 5 further comprising providing an interface allowing parallel queries wherein each server node of a cluster performs an identical query on separate data sets, and the method further comprises providing a SPARQL language interface to query the key value data through each server node.
  - 7. The method of claim 5 further comprising providing an interface allowing federated queries wherein a query is sent to a server node to access a plurality of connected datasets.
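The UPI encoding recited in claim 1 (each tuple part carrying a tag that indicates its data type) can be sketched as follows. This is only an illustration under stated assumptions: the tag values, the one-byte tag width, the variable-length payload, and the rule for telling literals from resources are all hypothetical, since the claims do not specify a byte layout.

```python
from enum import IntEnum


class Tag(IntEnum):
    # Hypothetical tag values; the claims do not define them.
    RESOURCE = 1
    LITERAL = 2
    INTEGER = 3


def encode_part(value):
    """Encode one tuple part as a tag byte followed by a payload."""
    if isinstance(value, int):
        # Integers get a fixed 8-byte big-endian payload in this sketch.
        return bytes([Tag.INTEGER]) + value.to_bytes(8, "big", signed=True)
    if value.startswith('"'):
        # Quoted strings are treated as literals (an assumption of this sketch).
        return bytes([Tag.LITERAL]) + value.strip('"').encode("utf-8")
    return bytes([Tag.RESOURCE]) + value.encode("utf-8")


def decode_tag(upi):
    """Read the data-type tag back from an encoded part."""
    return Tag(upi[0])


def encode_quad(subject, predicate, obj, partition):
    """Encode all four parts, including the partition variable."""
    return tuple(encode_part(part) for part in (subject, predicate, obj, partition))


encoded = encode_quad("ex:patient1", "ex:age", 42, "patient1")
```

Because the type travels with every part, a reader of `encoded[2]` can tell from the first byte alone that the object is an integer, without consulting a separate schema.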

8. A method for facilitating fast data analytics for big data applications, comprising:
- encoding application data comprising n-tuples into a plurality of triple-stores;
- partitioning the application data for storage onto a plurality of machines using a partition variable associated with each triple-store to ensure locality of data within a respective machine;
- storing the partitioned data in the form of key value stores in respective machines of the plurality of machines based on the partition variable; and
- storing the partitioned data as semantic indexed data in a Resource Description Framework (RDF) triplestore in each respective machine, wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
- Dependent claims (9, 10, 11):
  - 9. The method of claim 8 further comprising performing a parallel query on the semantic indexed data of the RDF triplestore in each respective machine having a separate dataset.
  - 10. The method of claim 9 wherein the key value stores are stored in a Hadoop Distributed File System (HDFS), and the parallel query engine comprises a SPARQL sequential query language based query engine.
  - 11. The method of claim 9 further comprising performing a federated query on the semantic indexed data of the RDF triplestore wherein a query is sent to a machine to access a plurality of connected datasets.
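The partitioning and parallel-query steps of claims 8 and 9 can be sketched in a few lines. The sketch makes several assumptions that are not in the claims: machines are modeled as in-memory lists, the partition variable is mapped to a machine by a stable hash, and the "query" is a simple predicate filter rather than SPARQL. All function names and sample data are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def machine_for(partition, n_machines):
    """Map a partition variable to a machine index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(partition.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_machines


def partition_quads(quads, n_machines):
    """Place every quad on the machine chosen by its fourth part."""
    machines = [[] for _ in range(n_machines)]
    for quad in quads:
        machines[machine_for(quad[3], n_machines)].append(quad)
    return machines


def parallel_query(machines, predicate):
    """Run the same filter on each machine's separate dataset and merge the results."""
    def run_local(local_quads):
        return [q for q in local_quads if q[1] == predicate]

    with ThreadPoolExecutor(max_workers=len(machines)) as pool:
        partial_results = list(pool.map(run_local, machines))
    return [q for part in partial_results for q in part]


# Hypothetical sample data: the fourth part is the partition variable.
cluster_quads = [
    ("ex:patient1", "ex:hasDiagnosis", "ex:asthma", "patient1"),
    ("ex:patient1", "ex:age", "42", "patient1"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:flu", "patient2"),
]
cluster = partition_quads(cluster_quads, 3)
diagnoses = parallel_query(cluster, "ex:hasDiagnosis")
```

Quads that share a partition variable always land on the same machine, which is the locality property the claim relies on: each node can answer its share of the query from local data alone.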

12. A system for distributing n-tuples over a cluster of triple-store machines, comprising:
- a set of clusters stored on machines storing each n-tuple as text in a distributed file system using a key value store; and
- a semantic data lake component having a processor and one or more interfaces accessing one or more persistent Resource Description Framework (RDF) triplestores for the n-tuple data stored in memory accessed by the processor on each machine;
- wherein associated with each n-tuple is a partition variable to ensure locality of data within a respective machine of the cluster of machines, and wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
- Dependent claims (13):
  - 13. The system of claim 12 wherein the file system is a Hadoop distributed file system (HDFS).

Current Assignee: Franz, Inc. (TallyGo LLC), Albert Einstein College of Medicine (Yeshiva University)
Original Assignee: Franz, Inc. (TallyGo LLC)
Inventors: Aasman, Jannes; Hadfield, Marc C; Mirhaji, Parsa
Primary Examiner(s): Nguyen, Cam-Linh
Application Number: US14/579,589
Publication Number: US 20160179979A1
Time in Patent Office: 904 Days
Field of Search: 707603, 707803
US Class Current
CPC Class Codes

G06F 16/24532 of parallel queries

G06F 16/278 Data partitioning, e.g. hor...
