Techniques of efficient XML query using combination of XML table index and path/value index
First Claim
1. A computer-implemented method comprising:
- causing execution of an index generation statement identifying a plurality of path expressions and a plurality of columns of a first table for indexing a collection of documents, wherein said plurality of path expressions identify less than all nodes in the collection of documents;
wherein for each column of said plurality of columns, said index generation statement specifies an association between said each column and a respective path expression of said plurality of path expressions;
wherein execution of said index generation statement causes generation of said first table, wherein the first table comprises a first set of entries;
wherein for each column of the plurality of columns of said first table, each entry of the first set of entries contains a node value of a node identified by the respective path expression of said each column, said node value being from a document of said collection of documents;
wherein the collection of documents is also indexed by a second table, wherein the second table comprises a second set of entries, each entry in the second set of entries:
being associated with a given node of a document in the collection of documents, and including location data for locating content in the document, wherein the content is associated with the given node and path data that corresponds to a path to the given node in the document;
intercepting, by a database system, a query for first information from a collection of documents, wherein the query for first information does not reference said first table and said second table;
said database system rewriting the query for first information to generate a rewritten query that references said first table and said second table;
wherein the query comprises one or more predicates;
based on the rewritten query, said database system generating a first query plan using both the first table and the second table, wherein the first query plan, when executed by the database system, causes the database system to perform:
identifying one or more first entries from the first table that contain a node value that satisfies the one or more predicates;
extracting second information from the one or more first entries identified from the first table;
extracting, using the second information, the first information from one or more second entries in the second table;
wherein the first table and the second table are two different tables.
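The flow recited in the claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the sample order documents, the `COLUMN_PATHS` mapping standing in for the index generation statement, the child-index tuple used as location data, and the helper functions. The first table holds node values for only the listed path expressions, the second table holds path and location data for every node, and the plan filters on the first table before fetching content through the second.

```python
import xml.etree.ElementTree as ET

# Hypothetical document collection (not from the patent).
DOCS = {
    1: "<order><customer>Ann</customer><total>120</total><note>rush</note></order>",
    2: "<order><customer>Bob</customer><total>40</total><note>gift</note></order>",
    3: "<order><customer>Cy</customer><total>300</total><note>fragile</note></order>",
}

# Stand-in for the index generation statement: each column of the first
# table is associated with one path expression. The paths cover fewer
# than all nodes (the note element is deliberately left out).
COLUMN_PATHS = {"customer": "/order/customer", "total": "/order/total"}


def build_path_table(docs):
    """Second table: one entry per node, holding path data and location data."""
    entries = []

    def walk(doc_id, elem, path, locator):
        entries.append({"doc_id": doc_id, "path": path, "locator": locator})
        for i, child in enumerate(elem):
            walk(doc_id, child, f"{path}/{child.tag}", locator + (i,))

    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        walk(doc_id, root, f"/{root.tag}", ())
    return entries


def locate(docs, doc_id, locator):
    """Follow the location data (a tuple of child positions) to a node."""
    elem = ET.fromstring(docs[doc_id])
    for i in locator:
        elem = elem[i]
    return elem


def build_value_table(docs, path_table, column_paths):
    """First table: one entry per document, one column per path expression."""
    rows = {doc_id: {"doc_id": doc_id} for doc_id in docs}
    for entry in path_table:
        for column, path in column_paths.items():
            if entry["path"] == path:
                node = locate(docs, entry["doc_id"], entry["locator"])
                rows[entry["doc_id"]][column] = node.text
    return list(rows.values())


def notes_for_large_orders(docs, value_table, path_table, min_total):
    # Step 1: identify first-table entries whose node value satisfies the predicate.
    hits = [row for row in value_table if float(row["total"]) > min_total]
    # Step 2: extract the second information (here, the document id).
    doc_ids = {row["doc_id"] for row in hits}
    # Step 3: use it to pick second-table entries and fetch the requested content.
    return [
        locate(docs, e["doc_id"], e["locator"]).text
        for e in path_table
        if e["path"] == "/order/note" and e["doc_id"] in doc_ids
    ]


path_table = build_path_table(DOCS)
value_table = build_value_table(DOCS, path_table, COLUMN_PATHS)
result = notes_for_large_orders(DOCS, value_table, path_table, 100)
print(result)
```

In this sketch the requested note element is never stored in the first table, so the second table is needed to reach it, while the predicate on the total is answered from the first table alone.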
Abstract
A mechanism is provided for accessing XML data in a database system using a combination of an XML Table Index table and an XML Path Index table. By using a combination of an XML Table Index and an XML Path Index, both selection access and navigational access involved in a query can be optimized. For example, the XML Table Index gives the database system the ability to readily evaluate the predicate expression, thereby improving the selection access. Moreover, in some embodiments, the selection access can be further improved by using secondary indexes on columns contained in the XML Table Index table. In a complementary manner, the XML Path Index table gives the database system the ability to navigate to a specific location given a path expression, thereby improving the navigational access. Thus, by combining both tables, both selection and navigational accesses are improved.
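As a rough illustration of the combination described in the abstract, the following Python sketch uses the standard `sqlite3` module as a stand-in database. The table names `xml_table_index` and `xml_path_index`, their columns, the sample rows, and the secondary index `total_idx` are all hypothetical and chosen only for this example; they are not the schema of any actual product. The query references both tables: the predicate is evaluated on a column of the table index (where the secondary index can help selection), and the path table supplies the content at the requested path (navigation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical XML Table Index table: one column per indexed path.
    CREATE TABLE xml_table_index (doc_id INTEGER, customer TEXT, total REAL);
    -- Hypothetical XML Path Index table: one row per node.
    CREATE TABLE xml_path_index (doc_id INTEGER, path TEXT, locator TEXT, value TEXT);
    -- Secondary index on a table-index column, to speed up selection.
    CREATE INDEX total_idx ON xml_table_index (total);
    """
)

conn.executemany(
    "INSERT INTO xml_table_index VALUES (?, ?, ?)",
    [(1, "Ann", 120.0), (2, "Bob", 40.0), (3, "Cy", 300.0)],
)
conn.executemany(
    "INSERT INTO xml_path_index VALUES (?, ?, ?, ?)",
    [
        (1, "/order/note", "2", "rush"),
        (2, "/order/note", "2", "gift"),
        (3, "/order/note", "2", "fragile"),
        (1, "/order/customer", "0", "Ann"),
    ],
)

# A query written against both tables: filter on the table index,
# then fetch the node content through the path index.
rows = conn.execute(
    """
    SELECT p.doc_id, p.value
    FROM xml_table_index AS t
    JOIN xml_path_index AS p ON p.doc_id = t.doc_id
    WHERE t.total > ? AND p.path = ?
    ORDER BY p.doc_id
    """,
    (100, "/order/note"),
).fetchall()
print(rows)
```

Whether the optimizer actually uses `total_idx` depends on the database engine and data volume; the sketch only shows where such an index would sit.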
28 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
15. A volatile or non-volatile computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause:
executing an index generation statement identifying a plurality of path expressions and a plurality of columns of a first table for indexing a collection of documents, wherein said plurality of path expressions identify less than all nodes in the collection of documents;
wherein for each column of said plurality of columns, said index generation statement specifies an association between said each column and a respective path expression of said plurality of path expressions;
wherein execution of said index generation statement causes generation of said first table, wherein the first table comprises a first set of entries;
wherein for each column of the plurality of columns of said first table, each entry of the first set of entries contains a node value of a node identified by the respective path expression of said each column, said node value being from a document of said collection of documents;
wherein the collection of documents is also indexed by a second table, wherein the second table comprises a second set of entries, each entry in the second set of entries:
being associated with a given node of a document in the collection of documents, and including location data for locating content in the document, wherein the content is associated with the given node and path data that corresponds to a path to the given node in the document; and
intercepting, by a database system, a query for first information from a collection of documents, wherein the query for first information does not reference said first table and said second table;
said database system rewriting the query for first information to generate a rewritten query that references said first table and said second table;
wherein the query comprises one or more predicates;
based on the rewritten query, said database system generating a first query plan using both the first table and the second table, wherein the first query plan, when executed by the database system, causes the database system to perform:
identifying one or more first entries from the first table that contain a node value that satisfies the one or more predicates;
extracting second information from the one or more first entries identified from the first table;
extracting, using the second information, the first information from one or more second entries in the second table;
wherein the first table and the second table are two different tables.
Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
Specification