XPath query processing improvements

US 8,631,028 B1
Filed: 10/29/2010
Issued: 01/14/2014
Est. Priority Date: 10/29/2009
Status: Expired due to Fees

First Claim

Patent Images

1. An XPath query processing system for processing one or more inputted queries against one or more inputted XML documents, the system comprising at least one computer system including a plurality of distributed hardware processors (CPU) and memory, at least one distributed hardware processor executing steps that:

creates a first index that comprises unique root to leaf paths (SUM-Index) where each unique root to leaf path has an associated unique path identifier (PID); and

annotates each unique path in SUM-Index by PID;

partitions SUM-Index by PID; and

creates a second index that comprises tree nodes grouped by PID (PS-Index);

partitions PS-Index by PID; and

creates a third index that comprises values of the tree nodes grouped by PID (PV-Index); and

partitions PV-Index by PID;

wherein the first index, the second index, and the third index each have at least one path identifier and are linked together by at least one path identifier (PID); and

each unique PID forms a nexus of unique path related information across the first index, the second index, and the third index; and

each index attribute of the first index, the second index, and the third index is stored in a column store; and

decomposes one or more XPath queries into one or more partial XPath queries;

obtains a list of path identifiers from the first index to initialize a simple cursor (SC) access method or a multi-predicate-branching-path cursor (MPBPC) method from the one or more partial XPath queries;

selects the second index to generate a result sequence of nodes using a simple cursor or a MPBPC cursor;

selects the third index to lookup values to filter the result sequence of nodes; and

produces one or more outputted XML documents from the result sequence of nodes.

View all claims

0 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method for processing an inputted XPath query against an inputted XML document is provided. The method generates a summary index, an ancestor-descendant path index, and a value index from one or more inputted XML documents. The summary index and the ancestor-descendant path index and the value index have at least one path defined. XPath queries at articulation points are parsed into multiple partial queries. The cursor type index access methods are determined. Partial queries are executed against the SUM-Index to generate a list of path identifiers (PID) satisfies the partial query segments. A set of ancestor-descendant PID identifiers list is generated to provide a result sequence. The result sequence of nodes is filtered producing one or more outputted XML documents from the final result sequence of nodes. A related system and computer medium is also provided.

Citations

22 Claims

1. An XPath query processing system for processing one or more inputted queries against one or more inputted XML documents, the system comprising at least one computer system including a plurality of distributed hardware processors (CPU) and memory, at least one distributed hardware processor executing steps that:
- creates a first index that comprises unique root to leaf paths (SUM-Index) where each unique root to leaf path has an associated unique path identifier (PID); and
  
  annotates each unique path in SUM-Index by PID;
  
  partitions SUM-Index by PID; and
  
  creates a second index that comprises tree nodes grouped by PID (PS-Index);
  
  partitions PS-Index by PID; and
  
  creates a third index that comprises values of the tree nodes grouped by PID (PV-Index); and
  
  partitions PV-Index by PID;
  
  wherein the first index, the second index, and the third index each have at least one path identifier and are linked together by at least one path identifier (PID); and
  
  each unique PID forms a nexus of unique path related information across the first index, the second index, and the third index; and
  
  each index attribute of the first index, the second index, and the third index is stored in a column store; and
  
  decomposes one or more XPath queries into one or more partial XPath queries;
  
  obtains a list of path identifiers from the first index to initialize a simple cursor (SC) access method or a multi-predicate-branching-path cursor (MPBPC) method from the one or more partial XPath queries;
  
  selects the second index to generate a result sequence of nodes using a simple cursor or a MPBPC cursor;
  
  selects the third index to lookup values to filter the result sequence of nodes; and
  
  produces one or more outputted XML documents from the result sequence of nodes.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The XPath query processing system of claim 1, wherein the PS-Index is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 3. The XPath query processing system of claim 1, wherein the PV-Index is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 4. The XPath query processing system of claim 1, wherein the PV-Index is a search index that is partitioned on PID and is multilevel subordinate to the PS-Index linked by PID.
  - 5. The XPath query processing system of claim 1, wherein the SUM-Index, the PS-Index index, and the PV-Index are constructed from schema-defined or schema-less XML documents.
  - 6. The XPath query processing system of claim 1, wherein the SUM-Index, the PS-Index index, and the PV-Index are processed with a simple cursor.
  - 7. The XPath query processing system of claim 1, wherein the SUM-Index, the PS-Index index, and the PV-Index are processed with a multi-predicate branching path cursor, MPBPC, which uses a holistic tree pattern of multilevel nested recursive cursors to simultaneously join PS-index partitions in one pass of the PS Index.

8. A computer implement method for processing one or more inputted XPath queries against one or more inputted XML documents stored in a plurality of computer hardware processors (CPU) and memory, comprising:
- loading an XML document into computer memory;
  
  generating a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from the XML document, wherein the SUM-Index, the PS-Index and the PV-Index each have at least one root to leaf unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index;
  
  annotating the SUM-Index with PID for each unique root to leaf path;
  
  stores the SUM-Index, PS-Index, and PV-Index on column stores distributed across a plurality of CPUs partitioned by PID;
  
  parsing and splitting an XPath query at articulation points into multiple partial queries;
  
  determining cursor type index access methods;
  
  executing a multiple partial queries against the SUM-Index to generate a list of applicable one or more PID values in the PS-Index and the PV-Index that satisfies partial query segments;
  
  generating a set of ancestor-descendant PID identifiers list from an associated SUM-Index tree by extracting annotated PID values to initialize a simple cursor or a multi-predicate branching path cursor (MPBP);
  
  searches the PS-Index that is a search index that is portioned on PID;
  
  generating a result sequence using the simple cursor or the MPBPC cursor from a PS-Index tree;
  
  searches the PV-Index that is a search index that is partitioned on PID;
  
  filtering the result sequence of nodes by using a PV-Index tree; and
  
  producing one or more outputted XML documents from a final result sequence of nodes.
- View Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The method of claim 8, wherein the executing query searches the PS-Index that is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 10. The method of claim 8, wherein the executing query searches the PV-Index that is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 11. The method of claim 8, wherein the executing query searches the PV-Index that is a search index that is partitioned on PID and is multilevel subordinate to the PS-Index linked by PID.
  - 12. The method of claim 8, wherein the executing query can be run against a schema-defined or schema-less XML document.
  - 13. The method of claim 8, wherein the executing query is processed with the simple cursor.
  - 14. The method of claim 8, wherein the executing query is processed with the MPBPC cursor which uses a holistic tree pattern of multilevel nested recursive cursors to simultaneously join PS-Index partitions of the PS-Index.

15. A non-transitory computer readable medium having a computer product for processing one or more inputted XPath queries against one or more XML documents, which when executed by a computing device having a plurality of hardware processors (CPU) and memory, comprises:
- program code the execution of which generates a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from the XML document, wherein the SUM-Index and the PS-Index and the PV-Index each have at least one unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index;
  
  program code that annotates the SUM-Index with the PID;
  
  program code that distributes the SUM-Index, PS-Index, and PV-Index across a plurality of CPUs addressed by PID;
  
  program code that executes parsing and splitting of an XPath query at articulation points into multiple partial queries;
  
  program code that determines cursor type index access methods;
  
  program code that examines the SUM-Index to generate a list of applicable PID in the PS-Index and the PV-Index that satisfies partial query segments;
  
  program code that generates a set of ancestor-descendant PID identifiers list from a SUM-Index tree to initialize a simple cursor or a multi-predicate branching path cursor (MPBPC);
  
  program code that accesses a distributed SUM-Index, PS-Index and PV-Index across the plurality of CPUs addressed by PID;
  
  program code that generates a result sequence using the simple cursor or MPBPC cursor from a PS-Index tree;
  
  program code that filters the result sequence of nodes by using an associated PV-Index tree; and
  
  program code that produces one or more outputted XML documents from a final result sequences of nodes.
- View Dependent Claims (16, 17, 18, 19, 20, 21)
- - 16. The computer readable medium of claim 15, wherein the program code that executes the query searches the PS-Index that is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 17. The computer readable medium of claim 15, wherein the program code that executes the query searches the PV-Index that is a search index that is partitioned on PID, or has a composite key of PID and preorder.
  - 18. The computer readable medium of claim 15, wherein the program code that executes the query searches the PV-Index that is a search index that is partitioned on PID and is multilevel subordinate to the PS-Index linked by PID.
  - 19. The computer readable medium of claim 15, wherein the program code that executes the query is executed against a schema-defined or schema-less XML document.
  - 20. The computer readable medium of claim 15, wherein the program code that executes the query uses the simple cursor.
  - 21. The computer readable medium of claim 15, wherein the program code that executes the query uses the multi-predicate branching path cursor, MPBPC, which uses a holistic tree pattern of multilevel nested recursive cursors to simultaneously join PS-Index partitions of the PS-Index.

22. A method for deploying a system across a plurality of computer hardware processors (CPUs) and memory for processing one or more inputted XPath queries against one or more XML documents, comprising:
- providing a computer infrastructure being operable to;
  
  generate an a first index that comprises unique root to leaf paths (SUM-Index), a second index that comprises tree nodes grouped by unique path identifiers (PS-Index), and a third index that comprises values of the tree nodes grouped by path identifiers (PV-Index) from an XML document, wherein the SUM-Index and the PS-Index and the PV-Index each have at least one unique path identifier (PID) and are linked together by at least one PID originating from the SUM-Index;
  
  distribute the XML document across a plurality of distributed CPUs addressed by PID;
  
  execute parsing and splitting of an XPath query at articulation points into multiple partial queries;
  
  determine cursor type index access methods;
  
  execute the partial queries against the SUM-Index to generate a list of applicable PID in the PS-Index and the PV-Index that satisfies partial query segments;
  
  generate a set of ancestor-descendant PID identifiers list from a SUM-Index tree to initialize a simple cursor or a multi-predicate branching path cursor (MPBP);
  
  access the plurality of distributed CPUs addressed by PID;
  
  generate a result sequence using the simple cursor or the MPBPC cursor from a PS-Index tree;
  
  filter the result sequence of nodes by using an associated PV-Index tree; and
  
  produce one or more outputted XML documents from a final result sequences of nodes.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Primo M. Pettovello
Original Assignee
Primo M. Pettovello
Inventors
Pettovello, Primo M.
Primary Examiner(s)
Beausoliel, Jr., Robert
Assistant Examiner(s)
Khakhar, Nirav K

Application Number

US12/915,791
Time in Patent Office

1,173 Days
Field of Search

None
US Class Current

707/765
CPC Class Codes

G06F 16/8373 Query execution

XPath query processing improvements

First Claim

0 Assignments

0 Petitions

Accused Products

Abstract

Citations

22 Claims

Specification

Solutions

Use Cases

Quick Links

XPath query processing improvements

First Claim

0 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

22 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links