Method for matching XML twigs using index structures and relational query processors
First Claim
1. A data processing method for indexing a tree-structured database, D, storing a plurality of XML documents, D comprising an ordered plurality of nodes, said nodes in D being selected from the group comprising element nodes, attribute nodes and value nodes, each of said nodes in D having a respective node id, di, said nodes in D being selectively connected by edges, a data path in D comprising one or more edges, D including at least one root node, a plurality of leaf nodes, and a plurality of root-to-leaf paths, each data path in D comprising a schema path and, when a data path in D reaches a leaf node, a leaf value, said element nodes having respective tags, said attribute nodes having respective attribute names, said schema path comprising only element tags and attribute names, the method comprising generating a set of representations of data paths that are subpaths of said root-to-leaf paths, each said data path having a representation in said set of representations being associated with the node at which it is rooted, said representations of subpaths included in said set of representations comprising SchemaPath, LeafValue, and IdList information for each respective subpath, where IdList is the list of all node ids along the schema path, except for the HeadId, where HeadId is the node id of the node at the start of the data path, generating an index based on the concatenation of LeafValue and the reverse of SchemaPath, said index returning at least a portion of said IdList for each data path having a representation in said set of representations.
2 Assignments
0 Petitions
Accused Products
Abstract
A framework defining a family of index structures useful in evaluating XML path expressions (i.e., twigs) in XML database is disclosed. Within this framework, two particular index structures with different space-time tradeoffs are presented that prove effective for the evaluation of twigs with value conditions. These index structures can be realized using access methods of an underlying relational database system. Experimental results show that the indices disclosed achieve significant improvement in performance for evaluating twig queries as compared with previously proposed XML path indices.
110 Citations
26 Claims
-
1. A data processing method for indexing a tree-structured database, D, storing a plurality of XML documents, D comprising an ordered plurality of nodes, said nodes in D being selected from the group comprising element nodes, attribute nodes and value nodes, each of said nodes in D having a respective node id, di, said nodes in D being selectively connected by edges, a data path in D comprising one or more edges, D including at least one root node, a plurality of leaf nodes, and a plurality of root-to-leaf paths, each data path in D comprising a schema path and, when a data path in D reaches a leaf node, a leaf value, said element nodes having respective tags, said attribute nodes having respective attribute names, said schema path comprising only element tags and attribute names, the method comprising
generating a set of representations of data paths that are subpaths of said root-to-leaf paths, each said data path having a representation in said set of representations being associated with the node at which it is rooted, said representations of subpaths included in said set of representations comprising SchemaPath, LeafValue, and IdList information for each respective subpath, where IdList is the list of all node ids along the schema path, except for the HeadId, where HeadId is the node id of the node at the start of the data path, generating an index based on the concatenation of LeafValue and the reverse of SchemaPath, said index returning at least a portion of said IdList for each data path having a representation in said set of representations.
-
12. A data processing method for matching at least one query to XML twigs in a tree-structured database, D, storing a plurality of XML documents, D comprising an ordered plurality of nodes, said nodes in D being selected from the group comprising element nodes, attribute nodes and value nodes, each of said nodes in D having a respective node id, di, said nodes in D being selectively connected by edges, a data path in D comprising one or more edges, D including at least one root node, a plurality of leaf nodes, and a plurality of root-to-leaf paths, each data path in D comprising a schema path and, when a data path in D reaches a leaf node, a leaf value, said element nodes having respective tags, said attribute nodes having respective attribute names, said schema path comprising only element tags and attribute names, the method comprising
applying said at least one query to a relational database management system, RDBMS, said RDBMS storing an index based on a set of representations of data paths in D that are subpaths of said root-to-leaf paths, each said data path having a representation in said set of representations being associated with the node at which it is rooted, said representations of subpaths included in said set of representations comprising SchemaPath, LeafValue, and IdList information, where IdList is the list of all node ids along the schema path, except for the HeadId, where HeadId is the node id of the node at the start of the data path, said index being the concatenation of LeafValue and the reverse of SchemaPath said index returning at least a portion of said IdList for each data path having a representation in said set of representations, processing said at least one query in a relational query processor in said RDBMS to identify twigs in D matching said at least one query.
Specification