Index structure for supporting structural XML queries
First Claim
1. A method of generating a virtual suffix tree (ViST) structure for searching XML documents, comprising:
- receiving one or more XML documents;
converting the one or more XML documents into respective structure-encoded sequences;
generating the ViST structure comprising;
generating a D-Ancestor index of node pairs in the respective structure-encoded sequences;
generating an S-Ancestor index of labels in one or more suffix trees corresponding to respective ones of the structure-encoded sequences; and
generating a doc-ID index encoding the D-Ancestor index and the S-Ancestor index for each node of the structure-encoded sequences, wherein the encoding of the doc-ID index contains an answer to a query matching a non-contiguous subsequence in the doc-ID index; and
updating the ViST structure, the updating comprising;
receiving a new XML document;
transforming the new XML document into a respecitve structure-encoded sequence;
inserting each element of the sequence into the D-Ancestor index to update relationships among nodes of the D-Ancestor index wherein the insertion of at least one of the elements results in the creation of a new node;
assigning a new label to the new node based on an estimated number of different elements following the element corresponding to the new node and an occurrence probability of each of the elements following the element corresponding to the new node; and
inserting the new label into the S-Ancestor index.
0 Assignments
0 Petitions
Accused Products
Abstract
The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
16 Citations
10 Claims
-
1. A method of generating a virtual suffix tree (ViST) structure for searching XML documents, comprising:
-
receiving one or more XML documents; converting the one or more XML documents into respective structure-encoded sequences; generating the ViST structure comprising; generating a D-Ancestor index of node pairs in the respective structure-encoded sequences; generating an S-Ancestor index of labels in one or more suffix trees corresponding to respective ones of the structure-encoded sequences; and generating a doc-ID index encoding the D-Ancestor index and the S-Ancestor index for each node of the structure-encoded sequences, wherein the encoding of the doc-ID index contains an answer to a query matching a non-contiguous subsequence in the doc-ID index; and updating the ViST structure, the updating comprising; receiving a new XML document; transforming the new XML document into a respecitve structure-encoded sequence; inserting each element of the sequence into the D-Ancestor index to update relationships among nodes of the D-Ancestor index wherein the insertion of at least one of the elements results in the creation of a new node; assigning a new label to the new node based on an estimated number of different elements following the element corresponding to the new node and an occurrence probability of each of the elements following the element corresponding to the new node; and inserting the new label into the S-Ancestor index. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A machine-readable medium having instructions stored thereon for execution by a processor to perform a method of generating a virtual suffix tree (ViST) structure for searching XML documents, comprising the steps of:
-
receiving one or more XML documents; converting the one or more XML documents into respective structure-encoded sequences; generating the ViST structure comprising; generating a D-Ancestor index of node pairs in the respective structure-encoded sequences; generating an S-Ancestor index of labels in one or more suffix trees corresponding to respective ones of the structure-encoded sequences; and generating a doc-ID index encoding the D-Ancestor index and the S-Ancestor index for each node of the structure-encoded sequences, wherein the encoding of the doc-ID index contains an answer to a query matching a non-contiguous subsequence in the doc-ID index; updating the ViST structure, the updating comprising; receiving a new XML document; transforming the new XML document into a respective structure-encoded sequence; inserting each element of the sequence into the D-Ancestor index to update relationships among nodes of the D-Ancestor index wherein the insertion of at least one of the elements results in the creation of a new node; assigning a new label to the new node based on an estimated number of different elements following the element corresponding to the new node and an occurrence probability of each of the elements following the element corresponding to the new node; and inserting the new label into the S-Ancestor index. - View Dependent Claims (7, 8, 9, 10)
-
Specification