Indexing mechanism for efficient node-aware full-text search over XML
Abstract
Techniques are provided for searching within a collection of XML documents. A relational table in an XML index stores an entry for each node of a set of nodes in the collection. Each entry of the relational table stores an order key and a path identifier along with the atomized value of the node. An index on the atomized value provides a mechanism to perform a node-aware full-text search. Instead of storing the atomized value in the table, a virtual column may be created to represent, for each node, the atomized value of the node. Alternately, each entry of the relational table stores an order key and a path identifier along with, for simple nodes, the atomized value, and for complex nodes, a null value. For a complex node with a descendant text node, a separate entry is stored for the descendant text node in the relational table.
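The first arrangement described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names (`build_node_table`, `build_full_text_index`, `search`), the dotted order-key format, the use of a literal path string in place of a path identifier, and the choice to join text pieces with spaces are all hypothetical. The sketch stores one row per element with an order key, a path, and an atomized value that includes descendant text, then builds a simple inverted index over the atomized values so a search can be restricted to nodes at a given path.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_node_table(xml_text):
    """Return one row per element: (order_key, path, atomized_value)."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_path, order_key):
        node_path = f"{parent_path}/{elem.tag}"
        # Assumption: the atomized value is the node's own text followed by
        # the text of all descendants, in document order, joined by spaces.
        pieces = (t.strip() for t in elem.itertext())
        atomized = " ".join(p for p in pieces if p)
        rows.append((order_key, node_path, atomized))
        # Assumption: a dotted order key ("1.2.1") encodes document position.
        for i, child in enumerate(elem, start=1):
            visit(child, node_path, f"{order_key}.{i}")

    visit(root, "", "1")
    return rows


def build_full_text_index(rows):
    """Map each lowercase token to the set of row numbers containing it."""
    index = defaultdict(set)
    for row_id, (_, _, atomized) in enumerate(rows):
        for token in re.findall(r"\w+", atomized.lower()):
            index[token].add(row_id)
    return index


def search(rows, index, path, term):
    """Node-aware search: rows at `path` whose atomized value contains `term`."""
    candidates = sorted(index.get(term.lower(), ()))
    return [rows[i] for i in candidates if rows[i][1] == path]


DOC = (
    "<book><title>XML Indexing</title>"
    "<chapter>Intro <para>full text search</para></chapter></book>"
)
rows = build_node_table(DOC)
index = build_full_text_index(rows)
hits = search(rows, index, "/book/chapter", "search")
```

Here the word "search" appears only in the text of the `para` element, yet the `chapter` row also matches because its atomized value includes the text of its descendant. The alternative arrangement in the abstract would instead store a null value for complex nodes such as `chapter` and add a separate row for each descendant text node.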
26 Claims
1. A computer-implemented method comprising:
- storing a table that stores data for a plurality of nodes in one or more XML documents, the table comprising an entry for each node of the plurality of nodes, the entry for each node comprising:
  - path data that specifies a path, through the structure of the one or more XML documents, to the node; and
  - an atomized value of the node;
- wherein the atomized value of at least one node comprises a first text value of the at least one node and a second text value of a descendant node of the at least one node;
- wherein the at least one node comprises a first node name and the first text value, and wherein the descendant node comprises a second node name and the second text value;
- wherein the atomized value of the descendant node comprises the second text value;
- storing a full-text index of the atomized values stored in the entries of the table;
- wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 10, 11
6. A computer-implemented method comprising:
- receiving a query that requests data from one or more XML documents;
- wherein said one or more XML documents comprises a plurality of nodes;
- evaluating the query using an index comprising a table that comprises a plurality of rows, each row of said plurality of rows corresponding to a node of said plurality of nodes;
- wherein each row of said plurality of rows includes:
  - path data that specifies a path, through the structure of the one or more XML documents, to the corresponding node, and
  - an atomized value of the corresponding node, said atomized value represented in a virtual column of said table;
- wherein the atomized value of at least one node comprises a first text value of a first descendant node of the at least one node and a second text value of a second descendant node of the at least one node;
- wherein the first descendant node comprises a first node name and the first text value, and wherein the second descendant node comprises a second node name and the second text value;
- wherein the atomized value of the first descendant node comprises the first text value;
- wherein said certain index of said one or more XML documents comprises a full-text index, said virtual column being indexed by said full-text index;
- wherein the method is performed by one or more computing devices.
Dependent claims: 7, 8, 9, 12, 13
14. One or more non-transitory computer-readable storage media storing instructions which, when executed by one or more computing devices, cause:
- storing a table that stores data for a plurality of nodes in one or more XML documents, the table comprising an entry for each node of the plurality of nodes, the entry for each node comprising:
  - path data that specifies a path, through the structure of the one or more XML documents, to the node; and
  - an atomized value of the node;
- wherein the atomized value of at least one node comprises a first text value of the at least one node and a second text value of a descendant node of the at least one node;
- wherein the at least one node comprises a first node name and the first text value, and wherein the descendant node comprises a second node name and the second text value;
- wherein the atomized value of the descendant node comprises the second text value;
- storing a full-text index of the atomized values stored in the entries of the table.
Dependent claims: 15, 16, 17, 18, 19, 20
21. One or more non-transitory computer-readable storage media storing instructions which, when executed by one or more computing devices, cause:
- receiving a query that requests data from one or more XML documents;
- wherein said one or more XML documents comprises a plurality of nodes;
- evaluating the query using an index comprising a table that comprises a plurality of rows, each row of said plurality of rows corresponding to a node of said plurality of nodes;
- wherein each row of said plurality of rows includes:
  - path data that specifies a path, through the structure of the one or more XML documents, to the corresponding node, and
  - an atomized value of the corresponding node, said atomized value represented in a virtual column of said table;
- wherein the atomized value of at least one node comprises a first text value of a first descendant node of the at least one node and a second text value of a second descendant node of the at least one node;
- wherein the first descendant node comprises a first node name and the first text value, and wherein the second descendant node comprises a second node name and the second text value;
- wherein the atomized value of the first descendant node comprises the first text value;
- wherein said certain index of said one or more XML documents comprises a full-text index, said virtual column being indexed by said full-text index.
Dependent claims: 22, 23, 24, 25, 26
Specification