System and method for query processing and optimization for XML repositories

US 6,654,734 B1
Filed: 08/30/2000
Issued: 11/25/2003
Est. Priority Date: 08/30/2000
Status: Active Grant

First Claim

Patent Images

1. A query processing method for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:

extracting structural relationship sets from the schemes;

comparing a query against the structural relationship sets to determine if the query is to be executed on the hierarchical documents; and

optimizing the query processing by filtering documents from the plurality of hierarchical documents that will produce an empty result;

wherein optimizing the query processing comprises maintaining an indexing structure for tree pattern matching;

wherein, for each non-filtered document, the indexing structure comprises three indices;

a value index that corresponds to a document content;

a structure index that corresponds to a tree structure pattern of the non-filtered document; and

a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and

wherein optimizing the query processing further comprises, for each non-filtered document;

organizing the three indices as a trie; and

querying the trie using tree pattern matching.

View all claims

3 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A computer program product is provided as a system for querying a repository of XML documents. The data in the XML documents are viewed by a query system as a graph that allows queries on content, structure, inter-document links, and intra-document links. The query language uses XML syntax and is based on tree pattern match semantics. The features of the query language allow the query system to compute a DTD for the query language and to use it to validate the user query formulation. Query optimization is done using schema-based optimization and index based optimization. Optimization uses the schema for (a) minimizing the number of documents on which the query need to be executed; (b) eliminating redundant conditions specified in the query; and (c) simplifying expensive query constructs. The query system maintains three sets of indices for each XML document: (a) value indices corresponding to text; (b) structure indices corresponding to tree structure patterns; and (c) link indices corresponding to link relationships.

Citations

22 Claims

1. A query processing method for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
- extracting structural relationship sets from the schemes;
  
  comparing a query against the structural relationship sets to determine if the query is to be executed on the hierarchical documents; and
  
  optimizing the query processing by filtering documents from the plurality of hierarchical documents that will produce an empty result;
  
  wherein optimizing the query processing comprises maintaining an indexing structure for tree pattern matching;
  
  wherein, for each non-filtered document, the indexing structure comprises three indices;
  
  a value index that corresponds to a document content;
  
  a structure index that corresponds to a tree structure pattern of the non-filtered document; and
  
  a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
  
  wherein optimizing the query processing further comprises, for each non-filtered document;
  
  organizing the three indices as a trie; and
  
  querying the trie using tree pattern matching.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The query processing method of claim 1, wherein optimizing the query processing further further comprises eliminating the documents that will produce an empty result.
  - 3. The query processing method of claim 1, wherein optimizing the query processing comprises removing redundant query constraints, and simplifying complex constraints, if any.
  - 4. The query processing method of claim 3, wherein optimizing the query processing comprises determining if a query constraint will always be true, and removing such constraint.
  - 5. The query processing method of claim 3, wherein optimizing the query processing further comprises simplifying complex query constraints into simpler query constraints.
  - 6. The query processing method of claim 5, further comprising:
7. The query processing method of claim 6, further comprising precomputing results for common tree structure patterns.
8. The query processing method of claim 7, wherein precomputing results comprising converting the tree structure patterns into strings, and maintaining a result for each tree structure pattern.
9. The query processing method of claim 7, wherein optimizing the query processing the documents comprise XML documents.

10. A query system for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
- a preprocessor comprising a relation set generator that computes structural relationship sets between elements and attributes of a schema, and an indexer that computes values and structure indices; and
  
  a query processor for performing a query and returning a result set, comprising;
  
  a query optimizer that builds an execution plan for a document; and
  
  a query executor that executes the execution plan;
  
  wherein the query processor compares the query against the structural relationship sets to determine if the query is to be executed on the documents; and
  
  wherein the query processor optimizes a query processing by filtering documents that will produce an empty result;
  
  wherein the query processor optimizes by using the indices instead of executing the query on the actual documents;
  
  wherein the query processor maintains an indexing structure for tree pattern matching;
  
  wherein, for each non-filtered document, the indexing structure comprises three indices;
  
  a value index that corresponds to a document content;
  
  a structure index that corresponds to a tree structure pattern of the non-filtered document; and
  
  a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
  
  wherein, for each non-filtered document, the query processor further organizes the three indices as a trie; and
  
  queries the trie using tree pattern matching.
- View Dependent Claims (11, 12, 13, 14, 15, 16)
- - 11. The query system of claim 10, wherein the query processor further comprises a result consolidator.
  - 12. The query system of claim 11, wherein the query processor further comprises:
13. The query system of claim 10, wherein the query processor optimizes the query processing by determining if the query to be executed on the hierarchical documents will produce an empty result, and eliminating all the documents conforming to the common schema that will produce an empty result.
14. The query system of claim 10, wherein the query processor further optimizes the query processing by removing redundant query constraints, and by simplifying complex constraints, if any.
15. The query system of claim 12, wherein the preprocessor further optimizes the query processing by:
- precomputing results for tree structure patterns;
  
  converting the tree structure patterns into strings;
  
  storing the strings as a trie;
  
  storing the precomputed results corresponding to each string as an index; and
  
  optimizing the query processing by querying the index.
16. The query system of claim 15, wherein the hierarchical documents comprise XML documents.

17. A query system having instruction codes for querying a plurality of hierarchical documents having corresponding schemas and contents, comprising:
- a preprocessor comprising a first set of instruction codes for computing structural relationship sets between elements and attributes of a schema, and a second set of instruction codes for computing values and structure indices;
  
  a query processor for performing a query and returning a result set, comprising;
  
  a third set of instruction codes for building an execution plan for a document; and
  
  a fourth set of instruction codes for executing the execution plan;
  
  a fifth set of instruction codes for comparing the query against the structural relationship sets to determine if the query is to be executed on the documents; and
  
  a sixth set of instruction codes for optimizing a query processing by filtering documents that will produce an empty result;
  
  a seventh set of instruction codes for optimizing by using the indices instead of executing the query on the actual documents;
  
  an eight set of instruction codes for maintaining an indexing structure for tree pattern matching;
  
  wherein, for each non-filtered document, the indexing structure comprises three indices;
  
  a value index that corresponds to a document content;
  
  a structure index that corresponds to a tree structure pattern of the non-filtered document; and
  
  a link index that corresponds to link relationships between the non-filtered document and the plurality of hierarchical documents; and
  
  wherein, for each non-filtered document, the query processor further organizes the three indices as a trie; and
  
  queries the trie using tree pattern matching.
- View Dependent Claims (18, 19, 20, 21, 22)
- - 18. The query system of claim 17, wherein the query processor further comprises a result consolidator.
  - 19. The query system of claim 18, wherein the query processor further comprises:
20. The query system of claim 17, wherein the query processor optimizes the query processing by determining if the query to be executed on the hierarchical documents will produce an empty result, and eliminating all the documents conforming to the common schema that will produce an empty result.
21. The query system of claim 17, wherein the query processor further optimizes the query processing by removing redundant query constraints, and by simplifying complex constraints, if any.
22. The query system of claim 19, wherein the preprocessor further optimizes the query processing by:
- precomputing results for tree structure patterns;
  
  converting the tree structure patterns into strings;
  
  storing the strings as a trie;
  
  storing the precomputed results corresponding to each string as an index; and
  
  optimizing the query processing by querying the index.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
International Business Machines Corporation
Inventors
Mani, Murali, Sundaresan, Neelakantan
Primary Examiner(s)
Homere, Jean R.
Assistant Examiner(s)
Ali, Mohammad

Application Number

US09/652,328
Time in Patent Office

1,182 Days
Field of Search

707/1-6, 707/9-10, 707/100-104, 709/224, 709/203-204, 705/7.14, 705/1, 715/513-514
US Class Current

1/1
CPC Class Codes

G06F 16/95 Retrieval from the web

Y10S 707/99932 Access augmentation or opti...

System and method for query processing and optimization for XML repositories

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

Citations

22 Claims

Specification

Solutions

Use Cases

Quick Links

System and method for query processing and optimization for XML repositories

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

22 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links