Technique for skipping irrelevant portions of documents during streaming XPath evaluation
First Claim
1. A method for evaluating a query for a hierarchical path of a document, the method comprising:
- determining a name of a particular node of hierarchical markup data in the document, wherein the particular node comprises one or more node values between a start of the particular node and an end of the particular node;
- determining that the particular node does not fit within the hierarchical path;
- accessing a summary of node locations in the document, the summary stored on a volatile or non-volatile computer-readable storage medium, the summary comprising, for each node of one or more nodes, a pair of marks comprising a start mark and another mark, the start mark specifying a first location in the document where the node starts, and the other mark specifying a second location in the document that is at or after where the node ends;
- determining that the start mark of the pair of marks for the particular node is in the summary;
- without evaluating at least part of the one or more node values, skipping to the second location in the document specified by the other mark of the pair of marks for the particular node in the summary;
- wherein the method is performed by one or more computing devices.
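The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a simplified markup with no attributes, comments, or self-closing tags, treats a "location" as a character offset, and represents the summary as a dictionary from start mark to the offset just past the node's end tag. The names `evaluate`, `TOKEN`, `path`, and `summary` are hypothetical and chosen only for illustration.

```python
import re

# Assumption: tags look like <name> or </name>, with no attributes.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def evaluate(doc, path, summary):
    """Stream over doc and collect the text of nodes matching path.

    path    -- list of element names, e.g. ["catalog", "book", "title"]
    summary -- dict mapping a node's start offset to the offset just
               past its end tag (the "pair of marks" of the claim)
    Returns (matched text values, number of characters skipped).
    """
    results, stack, skipped = [], [], 0
    pos = 0
    while True:
        m = TOKEN.search(doc, pos)
        if m is None:
            break
        # Text between tags is a node value; keep it only on a full match.
        if stack == path:
            text = doc[pos:m.start()].strip()
            if text:
                results.append(text)
        closing, name = m.group(1), m.group(2)
        if closing:
            stack.pop()
            pos = m.end()
            continue
        # Does the node, in its current position, fit within the path?
        candidate = stack + [name]
        fits = candidate == path[:len(candidate)]
        if not fits and m.start() in summary:
            # Start mark found: jump to the other mark without reading
            # any of the node's values.
            end = summary[m.start()]
            skipped += end - m.start()
            pos = end
            continue
        stack.append(name)
        pos = m.end()
    return results, skipped

doc = ("<catalog><book><title>XPath</title></book>"
       "<reviews><r>long</r><r>text</r></reviews>"
       "<book><title>XQuery</title></book></catalog>")
start = doc.index("<reviews>")
end = doc.index("</reviews>") + len("</reviews>")
values, skipped = evaluate(doc, ["catalog", "book", "title"], {start: end})
```

With an empty summary the same values are returned, but the `reviews` subtree is parsed tag by tag instead of being skipped, which is the cost the summary is meant to avoid.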
Abstract
A method and apparatus are described for summarizing a document. For each node in the document that satisfies a marking criteria, a start and end mark pair is stored in a summary in document order. The start mark specifies a location in the document where the node starts, and the end mark specifies a location in the document where the node ends. When evaluating a query for a hierarchical path, the document is streamed into memory until the mark of a tag matches a start mark in the summary. If that tag does not fit within the path, then streaming of the document may resume at the end mark, thereby skipping the node during streaming evaluation. Translation information may be used to indicate a logical position relative to the marks in the summary when the document is modified.
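The summarizing step described in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the method as implemented in the patent: the markup is simplified (no attributes, comments, or self-closing tags), locations are character offsets, and the marking criterion is assumed to be "the serialized node is at least `min_size` characters long". The names `build_summary`, `TAG`, and `min_size` are hypothetical.

```python
import re

# Assumption: tags look like <name> or </name>, with no attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def build_summary(doc, min_size):
    """Return (start mark, end mark) pairs in document order.

    A node is marked when its serialized length is at least min_size
    (an assumed marking criterion; the source does not fix one here).
    The end mark is the offset just past the node's end tag.
    """
    open_starts = []   # start offsets of the currently open nodes
    marks = []
    for m in TAG.finditer(doc):
        if m.group(1):                      # end tag closes innermost node
            start = open_starts.pop()
            if m.end() - start >= min_size:
                marks.append((start, m.end()))
        else:                               # start tag opens a new node
            open_starts.append(m.start())
    # Nodes close in post-order; sort by start mark for document order.
    marks.sort()
    return marks

sample = "<a><b>xx</b><c><d>yyyyyyyy</d></c></a>"
pairs = build_summary(sample, 15)
```

Here the small `b` node falls below the threshold and gets no marks, so a streaming evaluator would have to read through it, while the larger `a`, `c`, and `d` nodes each receive a pair of marks.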
14 Claims
1. A method for evaluating a query for a hierarchical path of a document, the method comprising:
- determining a name of a particular node of hierarchical markup data in the document, wherein the particular node comprises one or more node values between a start of the particular node and an end of the particular node;
- determining that the particular node does not fit within the hierarchical path;
- accessing a summary of node locations in the document, the summary stored on a volatile or non-volatile computer-readable storage medium, the summary comprising, for each node of one or more nodes, a pair of marks comprising a start mark and another mark, the start mark specifying a first location in the document where the node starts, and the other mark specifying a second location in the document that is at or after where the node ends;
- determining that the start mark of the pair of marks for the particular node is in the summary;
- without evaluating at least part of the one or more node values, skipping to the second location in the document specified by the other mark of the pair of marks for the particular node in the summary;
- wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable storage medium storing one or more sequences of instruction, wherein execution of the one or more sequences of instruction by one or more processors causes the one or more processors to perform:
- determining a name of a particular node of hierarchical markup data in a document wherein the particular node comprises one or more node values between a start of the particular node and an end of the particular node;
- determining that the particular node does not fit within the hierarchical path;
- accessing a summary of node locations in the document, the summary stored on a volatile or non-volatile computer-readable storage medium, the summary comprising, for each node of one or more nodes, a pair of marks comprising a start mark and another mark, the start mark specifying a first location in the document where the node starts, and the other mark specifying a second location in the document that is at or after where the node ends;
- determining that the start mark of the pair of marks for the particular node is in the summary;
- without evaluating at least part of the one or more node values, skipping to the second location in the document specified by the other mark of the pair of marks for the particular node in the summary.
Dependent claims: 9, 10, 11, 12, 13, 14
Specification