Semantic indexing engine
First Claim
1. A method of distributing n-tuples over a cluster of triple-store machines, comprising:
storing each n-tuple as text in a distributed file system using a key value store;
providing each machine of the cluster with a resident semantic indexing engine accessing one or more persistent Resource Description Framework (RDF) triplestores for the n-tuple data stored on each machine; and
defining a partition variable for each n-tuple to ensure locality of data within each respective machine, wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
Abstract
Embodiments are described for a method of distributing n-tuples over a cluster of triple-store machines, by storing each n-tuple as text in a distributed file system using a key value store; providing each machine of the cluster with a resident semantic data lake component accessing one or more persistent RDF triplestores for the n-tuple data stored on each machine; and defining one part of each n-tuple as a partition variable to ensure locality of data within each respective n-tuple. A method includes inserting graphs into a key/value store to determine how the key/value store distributes the data across a plurality of servers, by generating textual triple data, and storing the triple data in key-value stores wherein a fourth element of the triple comprises the key, and a value associated with the key comprises all the triples about a subject; indexing the data in the key-value store in an RDF triplestore using a partition based on the fourth element.
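The key/value layout described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the sample quads, the function names (`to_text`, `build_key_value_store`, `machine_for`, `distribute`), and the hash-modulo placement rule are all assumptions introduced here. The abstract says only that the fourth element serves as the key and that the key/value store decides how data is spread across servers.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object, partition) tuples.
QUADS = [
    ("ex:patient1", "ex:hasName", '"Alice"', "patient1"),
    ("ex:patient1", "ex:hasVisit", "ex:visit7", "patient1"),
    ("ex:visit7", "ex:diagnosis", "ex:flu", "patient1"),
    ("ex:patient2", "ex:hasName", '"Bob"', "patient2"),
]


def to_text(quad):
    """Render the subject-predicate-object part of a tuple as one text line."""
    subject, predicate, obj, _partition = quad
    return f"{subject} {predicate} {obj} ."


def build_key_value_store(quads):
    """Group textual triples under their fourth element (the partition key)."""
    store = defaultdict(list)
    for quad in quads:
        store[quad[3]].append(to_text(quad))
    return dict(store)


def machine_for(key, num_machines):
    """Assumed placement rule: stable hash of the key modulo the cluster size."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def distribute(quads, num_machines):
    """Map each machine index to the key/value entries it would hold."""
    placement = defaultdict(dict)
    for key, lines in build_key_value_store(quads).items():
        placement[machine_for(key, num_machines)][key] = lines
    return dict(placement)
```

Because every tuple sharing a partition value is stored under one key, all of that data lands on a single machine, which is the locality property the partition variable is meant to provide. A local triplestore on each machine could then index only the entries placed there.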
13 Claims
1. A method of distributing n-tuples over a cluster of triple-store machines, comprising:
storing each n-tuple as text in a distributed file system using a key value store;
providing each machine of the cluster with a resident semantic indexing engine accessing one or more persistent Resource Description Framework (RDF) triplestores for the n-tuple data stored on each machine; and
defining a partition variable for each n-tuple to ensure locality of data within each respective machine, wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for facilitating fast data analytics for big data applications, comprising:
encoding application data comprising n-tuples into a plurality of triple-stores;
partitioning the application data for storage onto a plurality of machines using a partition variable associated with each triple-store to ensure locality of data within a respective machine;
storing the partitioned data in the form of key value stores in respective machines of the plurality of machines based on the partition variable; and
storing the partitioned data as semantic indexed data in a Resource Description Framework (RDF) triplestore in each respective machine, wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
Dependent claims: 9, 10, 11
12. A system for distributing n-tuples over a cluster of triple-store machines, comprising:
a set of clusters stored on machines storing each n-tuple as text in a distributed file system using a key value store; and
a semantic data lake component having a processor and one or more interfaces accessing one or more persistent Resource Description Framework (RDF) triplestores for the n-tuple data stored in memory accessed by the processor on each machine;
wherein associated with each n-tuple is a partition variable to ensure locality of data within a respective machine of the cluster of machines, and wherein each n-tuple is a RDF triple comprising four parts, with three parts comprising a subject-predicate-object expression and a fourth part comprising the partition variable, and wherein each part of the n-tuple is encoded into a unique part identifier (UPI) comprising a tag indicating a data type of the encoded tuple part.
Dependent claims: 13
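The tagged encoding recited in the independent claims, in which each of the four tuple parts becomes a unique part identifier (UPI) carrying a data-type tag, can be illustrated with a small sketch. Everything concrete below is an assumption made for illustration: the claims do not specify the tag values in `PartType`, the 12-byte length, the one-byte tag position, or the use of a truncated SHA-256 hash for non-numeric parts.

```python
import hashlib
from enum import IntEnum


class PartType(IntEnum):
    """Hypothetical type tags; the claims do not enumerate actual values."""
    RESOURCE = 1
    LITERAL = 2
    INTEGER = 3


UPI_LENGTH = 12  # assumed fixed width: 1 tag byte + 11 payload bytes


def encode_part(value, part_type):
    """Encode one tuple part as a fixed-width UPI whose first byte is the type tag."""
    if part_type is PartType.INTEGER:
        # Small numbers are stored inline so they can be decoded without a lookup.
        payload = int(value).to_bytes(UPI_LENGTH - 1, "big", signed=True)
    else:
        # Strings are reduced to a truncated hash (assumed scheme).
        payload = hashlib.sha256(str(value).encode("utf-8")).digest()[: UPI_LENGTH - 1]
    return bytes([part_type]) + payload


def part_type_of(upi):
    """Read the data-type tag back from an encoded UPI."""
    return PartType(upi[0])


def encode_quad(subject, predicate, obj, partition):
    """Encode all four parts; each argument is a (value, PartType) pair."""
    return tuple(encode_part(v, t) for v, t in (subject, predicate, obj, partition))
```

For example, `encode_quad(("ex:patient1", PartType.RESOURCE), ("ex:age", PartType.RESOURCE), (42, PartType.INTEGER), ("patient1", PartType.LITERAL))` yields four 12-byte identifiers, and `part_type_of` recovers the tag of each without consulting the original text.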
Specification