Scalable summarization of data graphs
First Claim
Patent Images
1. A method for summarizing resource description framework datasets, the method comprising:
- creating a condensed view of the resource description framework dataset graph comprising a plurality of entity vertices, type vertices and keyword vertices connected by a plurality of predicate edges by combining entity, keyword and type vertices into a plurality of condensed vertices linked only by inter entity vertex predicate edges from the resource description framework dataset, the condensed view comprising a dataset graph;
removing entity information and keyword information from each condensed vertex and maintaining only type information in each condensed vertex;
grouping the plurality of condensed vertices by common type information;
splitting the condensed view of the resource description framework dataset graph into a plurality of partitions, each partition associated with a unique common type value selected from the type vertices and comprising a plurality of vertices and predicate edges connecting the vertices, splitting the condensed view of the resource description framework dataset graph comprising;
creating a plurality of predicate edge disjoint subgraphs by selecting condensed vertices on which to begin predicate edge disjoint graphs by group and exhausting all condensed vertices in a given group before advancing to a subsequent group, each subgraph beginning at a given condensed vertex and extending out a predetermined number of hops through the condensed view of the resource description framework, each partition comprising all subgraphs beginning at condensed vertices comprising common type information; and
defining a minimum set of common type based structures summarizing the plurality of partitions.
1 Assignment
0 Petitions
Accused Products
Abstract
Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.
10 Citations
17 Claims
-
1. A method for summarizing resource description framework datasets, the method comprising:
-
creating a condensed view of the resource description framework dataset graph comprising a plurality of entity vertices, type vertices and keyword vertices connected by a plurality of predicate edges by combining entity, keyword and type vertices into a plurality of condensed vertices linked only by inter entity vertex predicate edges from the resource description framework dataset, the condensed view comprising a dataset graph; removing entity information and keyword information from each condensed vertex and maintaining only type information in each condensed vertex; grouping the plurality of condensed vertices by common type information; splitting the condensed view of the resource description framework dataset graph into a plurality of partitions, each partition associated with a unique common type value selected from the type vertices and comprising a plurality of vertices and predicate edges connecting the vertices, splitting the condensed view of the resource description framework dataset graph comprising; creating a plurality of predicate edge disjoint subgraphs by selecting condensed vertices on which to begin predicate edge disjoint graphs by group and exhausting all condensed vertices in a given group before advancing to a subsequent group, each subgraph beginning at a given condensed vertex and extending out a predetermined number of hops through the condensed view of the resource description framework, each partition comprising all subgraphs beginning at condensed vertices comprising common type information; and defining a minimum set of common type based structures summarizing the plurality of partitions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A non-transitory computer-readable storage medium containing a computer-readable code that when read by a computer causes the computer to perform a method for summarizing resource description framework datasets, the method comprising:
-
splitting a resource description framework dataset graph comprising a plurality of entity vertices, type vertices and keyword vertices connected by a plurality of predicate edges into a plurality of partitions, each partition associated with a unique common type value selected from the type vertices and comprising a plurality of vertices and predicate edges connecting the vertices; and defining a minimum set of common type based structures summarizing the plurality of partitions, wherein defining the minimum set of common type based structures summarizing the plurality of partitions comprises; creating a plurality of covering trees to represent the plurality of partitions by traversing each partition to create an associated covering tree comprising all distinct paths through the vertices of that partition; extracting a core for each covering tree, the core comprising a minimum number of vertices for the covering tree; using the extracted core to represent the structure of that covering tree; and using homomorphisms among the plurality of covering trees to create the minimum set of common type based structures, wherein using homomorphisms among the plurality of covering trees comprises; sequentially comparing each extracted core to existing structures in the minimum set of common type based structures; removing existing structures from the minimum set of common type based structures that represent a subset of a given extracted core being compared; terminating comparison of a given extracted core upon determination that the given extracted core represents a subset of existing structures in the minimum set of common type based structures; and adding a given extracted core to the minimum set of common type based structures upon completing a comparison of that given extracted core to all existing structures in the minimum set of common type based structures and determining that the given extract core is not a subset of any existing structure. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A method for summarizing resource description framework datasets, the method comprising:
-
creating a condensed view of a resource description framework dataset graph comprising a plurality of entity vertices, type vertices and keyword vertices connected by a plurality of predicate edges by combining entity, keyword and type vertices into a plurality of condensed vertices linked only by inter entity vertex predicate edges from the resource description framework dataset; removing entity information and keyword information from each condensed vertex and maintaining only type information in each condensed vertex; grouping the plurality of condensed vertices by common type information; selecting condensed vertices sequentially by group; creating a plurality of predicate edge disjoint subgraphs, each subgraph beginning at a given condensed vertex within a given group and extending out a predetermined number of hops through the condensed view of the resource description framework; defining a plurality of partitions, each partition comprising all subgraphs beginning at condensed vertices comprising common type information; creating a plurality of covering trees to represent the plurality of partitions by traversing each partition to create an associated covering tree comprising all distinct paths through the vertices of that partition; extracting a core for each covering tree, the core comprising a minimum number of vertices for the covering tree; using homomorphisms among the plurality of covering trees to creating a minimum set of common type based structures summarizing the plurality of partitions; and maintaining a plurality of auxiliary indexes in combination with the minimum set of common type based structures, the plurality of auxiliary indexes suitable to recreate the resource description framework dataset graph from the minimum set of common type based structures and the plurality of partitions.
-
Specification