XPath query processing improvements
First Claim
1. An XPath query processing system for processing one or more inputted queries against one or more inputted XML documents, the system comprising at least one computer system including a plurality of distributed hardware processors (CPU) and memory, at least one distributed hardware processor executing steps that:
- creates a first index that comprises unique root to leaf paths (SUM-Index) where each unique root to leaf path has an associated unique path identifier (PID); and
annotates each unique path in SUM-Index by PID;
partitions SUM-Index by PID; and
creates a second index that comprises tree nodes grouped by PID (PS-Index);
partitions PS-Index by PID; and
creates a third index that comprises values of the tree nodes grouped by PID (PV-Index); and
partitions PV-Index by PID;
wherein the first index, the second index, and the third index each have at least one path identifier and are linked together by at least one path identifier (PID); and
each unique PID forms a nexus of unique path related information across the first index, the second index, and the third index; and
each index attribute of the first index, the second index, and the third index is stored in a column store; and
decomposes one or more XPath queries into one or more partial XPath queries;
obtains a list of path identifiers from the first index to initialize a simple cursor (SC) access method or a multi-predicate-branching-path cursor (MPBPC) method from the one or more partial XPath queries;
selects the second index to generate a result sequence of nodes using a simple cursor or a MPBPC cursor;
selects the third index to lookup values to filter the result sequence of nodes; and
produces one or more outputted XML documents from the result sequence of nodes.
0 Assignments
0 Petitions
Accused Products
Abstract
A method for processing an inputted XPath query against an inputted XML document is provided. The method generates a summary index, an ancestor-descendant path index, and a value index from one or more inputted XML documents. The summary index and the ancestor-descendant path index and the value index have at least one path defined. XPath queries at articulation points are parsed into multiple partial queries. The cursor type index access methods are determined. Partial queries are executed against the SUM-Index to generate a list of path identifiers (PID) satisfies the partial query segments. A set of ancestor-descendant PID identifiers list is generated to provide a result sequence. The result sequence of nodes is filtered producing one or more outputted XML documents from the final result sequence of nodes. A related system and computer medium is also provided.
-
Citations
22 Claims
-
1. An XPath query processing system for processing one or more inputted queries against one or more inputted XML documents, the system comprising at least one computer system including a plurality of distributed hardware processors (CPU) and memory, at least one distributed hardware processor executing steps that:
-
creates a first index that comprises unique root to leaf paths (SUM-Index) where each unique root to leaf path has an associated unique path identifier (PID); and annotates each unique path in SUM-Index by PID; partitions SUM-Index by PID; and creates a second index that comprises tree nodes grouped by PID (PS-Index); partitions PS-Index by PID; and creates a third index that comprises values of the tree nodes grouped by PID (PV-Index); and partitions PV-Index by PID; wherein the first index, the second index, and the third index each have at least one path identifier and are linked together by at least one path identifier (PID); and each unique PID forms a nexus of unique path related information across the first index, the second index, and the third index; and each index attribute of the first index, the second index, and the third index is stored in a column store; and decomposes one or more XPath queries into one or more partial XPath queries; obtains a list of path identifiers from the first index to initialize a simple cursor (SC) access method or a multi-predicate-branching-path cursor (MPBPC) method from the one or more partial XPath queries; selects the second index to generate a result sequence of nodes using a simple cursor or a MPBPC cursor; selects the third index to lookup values to filter the result sequence of nodes; and produces one or more outputted XML documents from the result sequence of nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer implement method for processing one or more inputted XPath queries against one or more inputted XML documents stored in a plurality of computer hardware processors (CPU) and memory, comprising:
-
loading an XML document into computer memory; generating a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from the XML document, wherein the SUM-Index, the PS-Index and the PV-Index each have at least one root to leaf unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index; annotating the SUM-Index with PID for each unique root to leaf path; stores the SUM-Index, PS-Index, and PV-Index on column stores distributed across a plurality of CPUs partitioned by PID; parsing and splitting an XPath query at articulation points into multiple partial queries; determining cursor type index access methods; executing a multiple partial queries against the SUM-Index to generate a list of applicable one or more PID values in the PS-Index and the PV-Index that satisfies partial query segments; generating a set of ancestor-descendant PID identifiers list from an associated SUM-Index tree by extracting annotated PID values to initialize a simple cursor or a multi-predicate branching path cursor (MPBP); searches the PS-Index that is a search index that is portioned on PID; generating a result sequence using the simple cursor or the MPBPC cursor from a PS-Index tree; searches the PV-Index that is a search index that is partitioned on PID; filtering the result sequence of nodes by using a PV-Index tree; and producing one or more outputted XML documents from a final result sequence of nodes. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A non-transitory computer readable medium having a computer product for processing one or more inputted XPath queries against one or more XML documents, which when executed by a computing device having a plurality of hardware processors (CPU) and memory, comprises:
-
program code the execution of which generates a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from the XML document, wherein the SUM-Index and the PS-Index and the PV-Index each have at least one unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index; program code that annotates the SUM-Index with the PID; program code that distributes the SUM-Index, PS-Index, and PV-Index across a plurality of CPUs addressed by PID; program code that executes parsing and splitting of an XPath query at articulation points into multiple partial queries; program code that determines cursor type index access methods; program code that examines the SUM-Index to generate a list of applicable PID in the PS-Index and the PV-Index that satisfies partial query segments; program code that generates a set of ancestor-descendant PID identifiers list from a SUM-Index tree to initialize a simple cursor or a multi-predicate branching path cursor (MPBPC); program code that accesses a distributed SUM-Index, PS-Index and PV-Index across the plurality of CPUs addressed by PID; program code that generates a result sequence using the simple cursor or MPBPC cursor from a PS-Index tree; program code that filters the result sequence of nodes by using an associated PV-Index tree; and program code that produces one or more outputted XML documents from a final result sequences of nodes. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
-
22. A method for deploying a system across a plurality of computer hardware processors (CPUs) and memory for processing one or more inputted XPath queries against one or more XML documents, comprising:
-
providing a computer infrastructure being operable to; generate an a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from an XML document, wherein the SUM-Index and the PS-Index and the PV-Index each have at least one unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index; distribute the XML document across a plurality of distributed CPUs addressed by PID; execute parsing and splitting of an XPath query at articulation points into multiple partial queries; determine cursor type index access methods; execute the partial queries against the SUM-Index to generate a list of applicable PID in the PS-Index and the PV-Index that satisfies partial query segments; generate a set of ancestor-descendant PID identifiers list from a SUM-Index tree to initialize a simple cursor or a multi-predicate branching path cursor (MPBP); access the plurality of distributed CPUs addressed by PID; generate a result sequence using the simple cursor or the MPBPC cursor from a PS-Index tree; filter the result sequence of nodes by using an associated PV-Index tree; and produce one or more outputted XML documents from a final result sequences of nodes.
-
Specification