System and method for query processing and optimization for XML repositories
First Claim
1. A query processing method for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
- extracting structural relationship sets from the schemes;
comparing a query against the structural relationship sets to determine if the query is to be executed on the hierarchical documents; and
optimizing the query processing by filtering documents from the plurality of hierarchical documents that will produce an empty result;
wherein optimizing the query processing comprises maintaining an indexing structure for tree pattern matching;
wherein, for each non-filtered document, the indexing structure comprises three indices;
a value index that corresponds to a document content;
a structure index that corresponds to a tree structure pattern of the non-filtered document; and
a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
wherein optimizing the query processing further comprises, for each non-filtered document;
organizing the three indices as a trie; and
querying the trie using tree pattern matching.
3 Assignments
0 Petitions
Accused Products
Abstract
A computer program product is provided as a system for querying a repository of XML documents. The data in the XML documents are viewed by a query system as a graph that allows queries on content, structure, inter-document links, and intra-document links. The query language uses XML syntax and is based on tree pattern match semantics. The features of the query language allow the query system to compute a DTD for the query language and to use it to validate the user query formulation. Query optimization is done using schema-based optimization and index based optimization. Optimization uses the schema for (a) minimizing the number of documents on which the query need to be executed; (b) eliminating redundant conditions specified in the query; and (c) simplifying expensive query constructs. The query system maintains three sets of indices for each XML document: (a) value indices corresponding to text; (b) structure indices corresponding to tree structure patterns; and (c) link indices corresponding to link relationships.
-
Citations
22 Claims
-
1. A query processing method for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
-
extracting structural relationship sets from the schemes;
comparing a query against the structural relationship sets to determine if the query is to be executed on the hierarchical documents; and
optimizing the query processing by filtering documents from the plurality of hierarchical documents that will produce an empty result;
wherein optimizing the query processing comprises maintaining an indexing structure for tree pattern matching;
wherein, for each non-filtered document, the indexing structure comprises three indices;
a value index that corresponds to a document content;
a structure index that corresponds to a tree structure pattern of the non-filtered document; and
a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
wherein optimizing the query processing further comprises, for each non-filtered document;
organizing the three indices as a trie; and
querying the trie using tree pattern matching. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
precomputing results for tree structure patterns of the non-filtered documents;
converting the tree structure patterns into strings;
storing the strings as part of the trie;
storing the precomputed results corresponding to each string as part of the indexing structure; and
optimizing the query processing by querying the indexing structure.
-
-
7. The query processing method of claim 6, further comprising precomputing results for common tree structure patterns.
-
8. The query processing method of claim 7, wherein precomputing results comprising converting the tree structure patterns into strings, and maintaining a result for each tree structure pattern.
-
9. The query processing method of claim 7, wherein optimizing the query processing the documents comprise XML documents.
-
10. A query system for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
-
a preprocessor comprising a relation set generator that computes structural relationship sets between elements and attributes of a schema, and an indexer that computes values and structure indices; and
a query processor for performing a query and returning a result set, comprising;
a query optimizer that builds an execution plan for a document; and
a query executor that executes the execution plan;
wherein the query processor compares the query against the structural relationship sets to determine if the query is to be executed on the documents; and
wherein the query processor optimizes a query processing by filtering documents that will produce an empty result;
wherein the query processor optimizes by using the indices instead of executing the query on the actual documents;
wherein the query processor maintains an indexing structure for tree pattern matching;
wherein, for each non-filtered document, the indexing structure comprises three indices;
a value index that corresponds to a document content;
a structure index that corresponds to a tree structure pattern of the non-filtered document; and
a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
wherein, for each non-filtered document, the query processor furtherorganizes the three indices as a trie; and
queries the trie using tree pattern matching. - View Dependent Claims (11, 12, 13, 14, 15, 16)
a schema extractor for extracting structural relationship sets from the schemas; and
an indexing system.
-
-
13. The query system of claim 10, wherein the query processor optimizes the query processing by determining if the query to be executed on the hierarchical documents will produce an empty result, and eliminating all the documents conforming to the common schema that will produce an empty result.
-
14. The query system of claim 10, wherein the query processor further optimizes the query processing by removing redundant query constraints, and by simplifying complex constraints, if any.
-
15. The query system of claim 12, wherein the preprocessor further optimizes the query processing by:
-
precomputing results for tree structure patterns;
converting the tree structure patterns into strings;
storing the strings as a trie;
storing the precomputed results corresponding to each string as an index; and
optimizing the query processing by querying the index.
-
-
16. The query system of claim 15, wherein the hierarchical documents comprise XML documents.
-
17. A query system having instruction codes for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
-
a preprocessor comprising a first set of instruction codes for computing structural relationship sets between elements and attributes of a schema, and a second set of instruction codes for computing values and structure indices;
a query processor for performing a query and returning a result set, comprising;
a third set of instruction codes for building an execution plan for a document; and
a fourth set of instruction codes for executing the execution plan;
a fifth set of instruction codes for comparing the query against the structural relationship sets to determine if the query is to be executed on the documents; and
a sixth set of instruction codes for optimizing a query processing by filtering documents that will produce an empty result;
a seventh set of instruction codes for optimizing by using the indices instead of executing the query on the actual documents;
an eight set of instruction codes for maintaining an indexing structure for tree pattern matching;
wherein, for each non-filtered document, the indexing structure comprises three indices;
a value index that corresponds to a document content;
a structure index that corresponds to a tree structure pattern of the non-filtered document; and
a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
wherein, for each non-filtered document, the query processor furtherorganizes the three indices as a trie; and
queries the trie using tree pattern matching. - View Dependent Claims (18, 19, 20, 21, 22)
a schema extractor for extracting structural relationship sets from the schemas; and
an indexing system.
-
-
20. The query system of claim 17, wherein the query processor optimizes the query processing by determining if the query to be executed on the hierarchical documents will produce an empty result, and eliminating all the documents conforming to the common schema that will produce an empty result.
-
21. The query system of claim 17, wherein the query processor further optimizes the query processing by removing redundant query constraints, and by simplifying complex constraints, if any.
-
22. The query system of claim 19, wherein the preprocessor further optimizes the query processing by:
-
precomputing results for tree structure patterns;
converting the tree structure patterns into strings;
storing the strings as a trie;
storing the precomputed results corresponding to each string as an index; and
optimizing the query processing by querying the index.
-
Specification