Using sibling-count in XML indexes to optimize single-path queries
First Claim
1. A method for optimizing a query, comprising performing a machine-executed operation involving instructions, wherein said instructions are instructions which, when executed by one or more processors, cause the one or more processors to perform certain steps comprising:
- receiving a query that specifies a function, wherein the function;
receives a path expression as an argument;
returns a value of an XML node when the path expression refers to a single node; and
does not return any value when the path expression refers to more than one XML node;
determining a quantity of a set of XML nodes to which the path expression refers;
transforming the first query into a second query, wherein the second query;
does not include the function;
includes a condition that a match occurs when said quantity is one; and
returns a second result that is identical to the first result returned by the first query,wherein the machine-executed operation is at least one of (a) sending said instructions over transmission media, (b) receiving said instructions over transmission media, (c) storing said instructions onto a machine-readable storage medium, or (d) executing the instructions.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus are provided for using sibling-counts in XML indices to optimize single-path queries. Using a b-tree XML index with a SQL query logarithmically reduces the number of disk accesses by passing over index entries where it is determined that a match will not be found. However, because certain index entries are passed over, it is impossible to ascertain if a path expression occurs more than once in the XML index, as certain queries sometimes require. This hurdle can be overcome by maintaining a sibling count with each node entry in the XML index. Because the sibling count is stored with the index entry, the index will reveal whether the matching node is single or has other siblings. In additional to re-writing the original query for optimization by use of an XML index, it will be re-written to check for a single-path condition in the index.
101 Citations
20 Claims
-
1. A method for optimizing a query, comprising performing a machine-executed operation involving instructions, wherein said instructions are instructions which, when executed by one or more processors, cause the one or more processors to perform certain steps comprising:
-
receiving a query that specifies a function, wherein the function; receives a path expression as an argument; returns a value of an XML node when the path expression refers to a single node; and does not return any value when the path expression refers to more than one XML node; determining a quantity of a set of XML nodes to which the path expression refers; transforming the first query into a second query, wherein the second query; does not include the function; includes a condition that a match occurs when said quantity is one; and returns a second result that is identical to the first result returned by the first query, wherein the machine-executed operation is at least one of (a) sending said instructions over transmission media, (b) receiving said instructions over transmission media, (c) storing said instructions onto a machine-readable storage medium, or (d) executing the instructions. - View Dependent Claims (2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13)
-
-
7. A method for optimizing single-path queries of one or more XML documents, comprising the machine-implemented steps of:
-
receiving a SQL query containing an extractValue( ) operator taking an XPath argument; for each of the XML documents, determining a quantity of a set of XML nodes to which the XPath expression refers; for each of the XML documents, creating an entry in a first index for each XML node in the set of nodes; for each of the XML documents, including a value of each XML node in the set for each entry in the first index; for each of the XML documents, including the quantity in each entry in the first index, creating a second index on the values, wherein the second query includes operators operating on at least one of the set consisting of the first index and the second index; re-writing the SQL query such that a subquery replaces the extractValue( ) operator to form a second query; re-writing the second query such that the query does not contain any subqueries to form a third query; including a condition in the third query that the quantity is one; and evaluating the third query using the second index. - View Dependent Claims (14)
-
-
15. An apparatus for optimizing a query, comprising:
-
means for receiving a query that specifies a function, wherein the function; receives a path expression as an argument; returns a value of an XML node when the path expression refers to a single node; and does not return any value when the path expression refers to more than one XML node; means for determining a quantity of a set of XML nodes to which the path expression refers; means for transforming the first query into a second query, wherein the second query; does not include the function; includes a condition that a match occurs when said quantity is one; and returns a second result that is identical to the first result returned by the first query - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification