Using sibling-count in XML indexes to optimize single-path queries
First Claim
1. A method for optimizing a query, comprising performing a machine-executed operation involving instructions, wherein said instructions are instructions which, when executed by one or more processors, cause the one or more processors to perform certain steps comprising:
- for each node in a set of nodes, creating an entry in an index;
including, in the entry for each node, stored information that indicates whether there is either one node or more than one node at a path of the each node;
receiving a first query against the set of nodes, wherein the first query specifies a function that receives a specified path as an argument;
wherein the function is configured to;
return a certain value when there is only one node at the specified path; and
not return the certain value when there is more than one node at the specified path;
transforming the first query into a second query, wherein the second query;
does not include the function;
includes a condition that is based on the stored information that indicates whether there is either one node or more than one node at the specified path; and
includes operators operating on the index; and
wherein the second query is configured to return an equivalent result as the first query.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus are provided for using sibling-counts in XML indices to optimize single-path queries. Using a b-tree XML index with a SQL query logarithmically reduces the number of disk accesses by passing over index entries where it is determined that a match will not be found. However, because certain index entries are passed over, it is impossible to ascertain if a path expression occurs more than once in the XML index, as certain queries sometimes require. This hurdle can be overcome by maintaining a sibling count with each node entry in the XML index. Because the sibling count is stored with the index entry, the index will reveal whether the matching node is single or has other siblings. In additional to re-writing the original query for optimization by use of an XML index, it will be re-written to check for a single-path condition in the index.
122 Citations
18 Claims
-
1. A method for optimizing a query, comprising performing a machine-executed operation involving instructions, wherein said instructions are instructions which, when executed by one or more processors, cause the one or more processors to perform certain steps comprising:
-
for each node in a set of nodes, creating an entry in an index; including, in the entry for each node, stored information that indicates whether there is either one node or more than one node at a path of the each node; receiving a first query against the set of nodes, wherein the first query specifies a function that receives a specified path as an argument; wherein the function is configured to; return a certain value when there is only one node at the specified path; and not return the certain value when there is more than one node at the specified path; transforming the first query into a second query, wherein the second query; does not include the function; includes a condition that is based on the stored information that indicates whether there is either one node or more than one node at the specified path; and includes operators operating on the index; and wherein the second query is configured to return an equivalent result as the first query. - View Dependent Claims (2, 3, 4, 5, 16)
-
-
6. A volatile or non-volatile computer-readable storage medium storing one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform:
-
for each node in a set of nodes, creating an entry in an index; including, in the entry for each node, stored information that indicates whether there is either one node or more than one node at a path of the each node; receiving a first query against the set of nodes, wherein the first query specifies a function that receives a specified path as an argument; wherein the function is configured to; return a certain value when there is only one node at the specified path; and not return the certain value when there is more than one node at the specified path; transforming the first query into a second query, wherein the second query; does not include the function; includes a condition that is based on the stored information that indicates whether there is either one node or more than one node at the specified path; and includes operators operating on the index; and wherein the second query is configured to return an equivalent result as the first query. - View Dependent Claims (7, 8, 9, 10, 17)
-
-
11. An apparatus for optimizing a query, comprising:
-
one or more computing devices configured to perform; for each node in a set of nodes, creating an entry in an index; including, in the entry for each node, stored information that indicates whether there is either one node or more than one node at a path of the each node; receiving a first query against the set of nodes, wherein the query specifies a function that receives a specified path as an argument; wherein the function is configured to; return a certain value when there is only one node at the specified path; and not return the certain value when there is more than one node at the specified path; transforming the first query into a second query, wherein the second query; does not include the function; includes a condition that is based on the stored information that indicates whether there is either one node or more than one node at the specified path; and includes operators operating on the index; and wherein the second query is configured to return an equivalent result as the first query. - View Dependent Claims (12, 13, 14, 15, 18)
-
Specification