Scalable summarization of data graphs
Abstract
Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets whose structures are unknown or constantly changing. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization gives exploratory keyword searches significant pruning power and leads to much better efficiency than previous work. The summarization returns exact results and can be updated incrementally and efficiently.
20 Claims
1. A system for summarizing resource description framework datasets, the system comprising:
a computer in communication with a network; and
a database in communication with the computer, the database comprising:
  a resource description framework dataset graph comprising entity vertices associated with data accessible across the network, type vertices associated with the entity vertices, keyword vertices associated with the entity vertices and a plurality of predicate edges connecting pairs of entity vertices, type vertices and keyword vertices;
  a plurality of partitions, each partition comprising:
    a portion of the vertices and predicate edges from the resource description framework dataset graph; and
    one or more predicate edge disjoint subgraphs, each subgraph comprising a given condensed vertex and any additional condensed vertices extending out a predetermined number of hops from the given condensed vertex, the condensed vertices linked only by inter entity vertex predicate edges from the resource description framework dataset; and
  a minimum set of common type based structures summarizing the plurality of partitions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for summarizing resource description framework datasets, the system comprising:
a computer in communication with a network; and
a database in communication with the computer, the database comprising:
  a resource description framework dataset graph comprising entity vertices associated with data accessible across the network, type vertices associated with the entity vertices, keyword vertices associated with the entity vertices and a plurality of predicate edges connecting pairs of entity vertices, type vertices and keyword vertices;
  a plurality of partitions, each partition comprising a portion of the vertices and predicate edges from the resource description framework dataset graph; and
  a minimum set of common type based structures summarizing the plurality of partitions; and
  a plurality of auxiliary indexes in combination with the minimum set of common type based structures, the plurality of auxiliary indexes sufficient to recreate the resource description framework dataset graph from the minimum set of common type based structures and the plurality of partitions.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
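As a rough illustration of how a set of type based structures and an auxiliary index might fit together, the Python sketch below replaces entity identifiers in each partition with their types, keeps one copy of each resulting structure, and records which partitions each structure stands for. Everything here is an assumption made for illustration: the names `TYPES`, `PARTITIONS`, `type_structure`, `summarize` and `recreate` are hypothetical, and exact equality of a sorted type-level edge list is a simplification standing in for whatever structural matching the specification actually defines.

```python
from collections import defaultdict

# Hypothetical toy data: an entity -> type map and three partitions,
# each a list of (subject, predicate, object) edges between entities.
TYPES = {"a1": "Mission", "a2": "Rocket", "b1": "Mission", "b2": "Rocket",
         "c1": "Mission", "c2": "Person"}
PARTITIONS = [
    [("a1", "launchedBy", "a2")],
    [("b1", "launchedBy", "b2")],
    [("c1", "crew", "c2")],
]

def type_structure(partition, types):
    """Replace entity ids by their types and sort, giving an
    order-independent type-level description of the partition."""
    return tuple(sorted((types[s], p, types[o]) for s, p, o in partition))

def summarize(partitions, types):
    """Return the distinct type-level structures plus an auxiliary
    index mapping each structure to the partitions it represents."""
    summary = []
    index = defaultdict(list)
    for part in partitions:
        key = type_structure(part, types)
        if key not in index:      # membership test does not create the key
            summary.append(key)
        index[key].append(part)
    return summary, index

def recreate(index):
    """Rebuild the full edge list from the auxiliary index alone."""
    return sorted(edge for parts in index.values()
                  for part in parts for edge in part)

summary, index = summarize(PARTITIONS, TYPES)
```

In this toy data the two `launchedBy` partitions collapse into a single structure, while the index still holds enough detail to recover every original edge, which is the role the claim assigns to the auxiliary indexes.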
20. A system for summarizing resource description framework datasets, the system comprising:
a computer in communication with a network; and
a database in communication with the computer, the database comprising:
  a resource description framework dataset graph comprising entity vertices associated with data accessible across the network, type vertices associated with the entity vertices, keyword vertices associated with the entity vertices and a plurality of predicate edges connecting pairs of entity vertices, type vertices and keyword vertices;
  a plurality of partitions, each partition comprising a portion of the vertices and predicate edges from the resource description framework dataset graph;
  a minimum set of common type based structures summarizing the plurality of partitions;
  a condensed view of the resource description framework dataset graph, the condensed view comprising a plurality of condensed vertices linked only by inter entity vertex predicate edges from the resource description framework dataset, each condensed vertex associated with an entity vertex in the resource description framework dataset graph and comprising only type information from a given type vertex associated with that entity vertex;
  wherein each partition in the plurality of partitions further comprises:
    a portion of the condensed vertices and the inter entity vertex predicate edges from the condensed view of the resource description framework data graph; and
    one or more predicate edge disjoint subgraphs, each subgraph comprising a given condensed vertex and any additional condensed vertices extending out a predetermined number of hops through the condensed view of the resource description framework from the given condensed vertex.
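The condensed view and the hop-bounded, predicate edge disjoint subgraphs recited above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented method: the triples in `TRIPLES`, the function names `condensed_view` and `edge_disjoint_neighborhoods`, the rule that an entity is anything with an `rdf:type` triple, the single type per entity, and the undirected breadth-first traversal are all illustrative choices that do not come from the claims.

```python
from collections import defaultdict, deque

# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("e1", "rdf:type", "Rocket"),
    ("e2", "rdf:type", "Mission"),
    ("e3", "rdf:type", "Person"),
    ("e1", "name", '"Saturn V"'),   # keyword (literal) vertex
    ("e2", "launchedBy", "e1"),
    ("e2", "crew", "e3"),
]

def condensed_view(triples):
    """Label each entity with its type and keep only the
    entity-to-entity predicate edges (assumes one type per entity)."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    edges = [(s, p, o) for s, p, o in triples
             if p != "rdf:type" and s in types and o in types]
    return types, edges

def edge_disjoint_neighborhoods(edges, seeds, hops):
    """Grow a subgraph of at most `hops` hops around each seed,
    assigning every edge to at most one subgraph."""
    adj = defaultdict(list)
    for idx, (s, _, o) in enumerate(edges):
        adj[s].append((idx, o))
        adj[o].append((idx, s))     # follow edges in both directions
    used = set()                    # edge indexes already assigned
    parts = []
    for seed in seeds:
        vertices, part_edges = {seed}, []
        frontier = deque([(seed, 0)])
        while frontier:
            v, depth = frontier.popleft()
            if depth == hops:
                continue            # do not expand past the hop limit
            for idx, w in adj[v]:
                if idx in used:
                    continue
                used.add(idx)
                part_edges.append(edges[idx])
                if w not in vertices:
                    vertices.add(w)
                    frontier.append((w, depth + 1))
        parts.append((seed, vertices, part_edges))
    return parts

types, edges = condensed_view(TRIPLES)
parts = edge_disjoint_neighborhoods(edges, sorted(types), hops=1)
```

Processing seeds in a fixed order and marking edges as used is only one way to obtain edge disjointness; a seed whose edges were all claimed earlier ends up as a single-vertex subgraph.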
Specification