Technique For Skipping Irrelevant Portions Of Documents During Streaming XPath Evaluation
Abstract
A method and apparatus are described for summarizing a document. For each node in the document that satisfies one or more marking criteria, a pair of start and end marks is stored in a summary in document order. The start mark specifies the location in the document where the node starts, and the end mark specifies the location in the document where the node ends. When a query for a hierarchical path is evaluated, the document is streamed into memory until the location of a tag matches a start mark in the summary. If that tag does not fit within the path, streaming of the document may resume at the corresponding end mark, so the node is skipped during streaming evaluation. When the document is modified, translation information may be used to indicate a logical position relative to the marks in the summary.
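The skipping step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the function `stream_match`, the regex-based tag scanner, the use of character offsets as marks, and the restriction to an absolute child-axis path such as `/lib/book/title`. The input is assumed to be well-formed markup without comments, CDATA sections, or `>` characters inside attribute values.

```python
import re
from bisect import bisect_left
from typing import List, Tuple

# Simplified tag scanner (assumption: no comments, CDATA, or '>' in attributes).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def stream_match(doc: str, summary: List[Tuple[int, int]],
                 path: List[str]) -> Tuple[List[int], int]:
    """Scan doc for elements matching an absolute child-axis path.

    summary holds (start, end) character offsets sorted by start.
    Returns the start offsets of matching elements and the number of
    characters that were never scanned because a marked node was skipped.
    """
    starts = [start for start, _ in summary]
    on_path: List[bool] = []   # one flag per open element: does it fit the path?
    matches: List[int] = []
    skipped = 0
    pos = 0
    while True:
        m = TAG.search(doc, pos)
        if m is None:
            break
        pos = m.end()
        closing, name, self_closing = m.groups()
        if closing:
            on_path.pop()
            continue
        depth = len(on_path)
        parent_ok = not on_path or on_path[-1]
        fits = parent_ok and depth < len(path) and path[depth] == name
        if fits and depth == len(path) - 1:
            matches.append(m.start())
        if not fits:
            # If this tag begins at a start mark, resume at its end mark.
            i = bisect_left(starts, m.start())
            if i < len(starts) and starts[i] == m.start():
                end = summary[i][1]
                skipped += end - pos
                pos = end
                continue  # the whole node is skipped; the stack is unchanged
        if not self_closing:
            on_path.append(fits)
    return matches, skipped


doc = ("<lib><junk><book><title>no</title></book></junk>"
       "<book><title>yes</title></book></lib>")
junk_start = doc.index("<junk>")
junk_end = doc.index("</junk>") + len("</junk>")
matches, skipped = stream_match(doc, [(junk_start, junk_end)],
                                ["lib", "book", "title"])
```

In this sketch the `<junk>` node is marked and does not fit the path, so the scanner jumps straight to its end mark. Unmarked nodes that do not fit are still scanned, but their descendants cannot match because the ancestor flag is false.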
15 Claims
1. A method for summarizing a document comprising nodes of hierarchical markup data, the method comprising:
- for each node of the nodes, determining, by one or more computing devices, whether the node satisfies one or more marking criteria;
- for each node that satisfies the one or more marking criteria, generating, by the one or more computing devices, a pair of marks for the node, the pair comprising a start mark and an end mark for the node, the start mark specifying a first location in the document where the node starts, and the end mark specifying a second location in the document that is at or after where the node ends; and
- storing the pairs of marks ordered based at least in part on the start mark.
Dependent claims: 2, 3, 4, 5, 6, 7
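The steps of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function `build_summary`, the regex tag scanner, the use of character offsets as marks, and the size-based marking criterion `min_size` are all hypothetical choices made for the example. The input is assumed to be well-formed markup without comments, CDATA sections, or `>` characters inside attribute values.

```python
import re
from typing import List, Tuple

# Simplified tag scanner (assumption: no comments, CDATA, or '>' in attributes).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def build_summary(doc: str, min_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) offset pairs for nodes of at least min_size chars.

    start is the offset of the node's opening '<'; end is the offset just
    past its closing '>'. The hypothetical marking criterion is node size.
    """
    marks: List[Tuple[int, int]] = []
    open_nodes: List[Tuple[str, int]] = []  # (name, start offset)
    for m in TAG.finditer(doc):
        closing, name, self_closing = m.groups()
        if closing:
            open_name, start = open_nodes.pop()
            if open_name != name:
                raise ValueError(f"mismatched tag </{name}> at {m.start()}")
            if m.end() - start >= min_size:
                marks.append((start, m.end()))
        elif self_closing:
            if m.end() - m.start() >= min_size:
                marks.append((m.start(), m.end()))
        else:
            open_nodes.append((name, m.start()))
    # Nodes close in post-order, so sort to store the pairs by start mark.
    marks.sort()
    return marks


doc = "<a><b>xx</b><c><d/></c></a>"
summary = build_summary(doc, min_size=9)
```

Here `<d/>` is only four characters long and is left out, while `a`, `b`, and `c` each receive a pair of marks. The final sort matters because a parent node closes after its children but starts before them.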
8. A method for storing metadata for a document, the method comprising:
- in a summary, storing marks for nodes of the document that satisfy one or more marking criteria, the marks comprising start marks and end marks, the start marks specifying first locations in the document where nodes start, and the end marks specifying second locations in the document at or after where nodes end;
- determining that one or more modifications have been made to a set of nodes in the document;
- based at least in part on the one or more modifications, storing a logical value relative to at least one of the marks stored in the summary.
Dependent claims: 11, 12, 13
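One way to read the "logical value relative to at least one of the marks" is as a recorded shift that translates the stored marks into positions in the modified document, without rewriting the summary. The sketch below follows that reading, which is an assumption made for illustration; the class `TranslatedSummary`, its methods, and the representation of an edit as an (original offset, delta) pair are hypothetical and not taken from the patent. Edits are assumed to be expressed in offsets of the original document.

```python
from typing import List, Tuple


class TranslatedSummary:
    """Marks stay in original-document offsets; edits are kept as shifts."""

    def __init__(self, marks: List[Tuple[int, int]]) -> None:
        self.marks = sorted(marks)               # (start, end), original offsets
        self.shifts: List[Tuple[int, int]] = []  # (original offset, delta)

    def record_edit(self, original_offset: int, delta: int) -> None:
        """Record that delta chars were inserted (or removed, if negative)
        immediately before the character at original_offset."""
        self.shifts.append((original_offset, delta))

    def current(self, original_offset: int) -> int:
        """Translate an original offset into the modified document."""
        moved = sum(d for p, d in self.shifts if p <= original_offset)
        return original_offset + moved

    def current_marks(self) -> List[Tuple[int, int]]:
        return [(self.current(s), self.current(e)) for s, e in self.marks]


old_doc = "<a><b>xx</b><c/></a>"
summary = TranslatedSummary([(3, 12), (12, 16)])   # marks for <b> and <c/>

# Insert "yyy" at original offset 6, inside the text of <b>.
new_doc = old_doc[:6] + "yyy" + old_doc[6:]
summary.record_edit(6, 3)
translated = summary.current_marks()
```

After the edit, the translated marks still delimit `<b>` and `<c/>` in the modified document. An insertion exactly at an end mark pushes that mark past the node, which remains compatible with an end mark located "at or after" where the node ends.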
14. One or more non-transitory computer-readable storage medium storing one or more sequences of instruction, wherein execution of the one or more sequences of instruction by one or more processors causes the one or more processors to perform:
- reading a document comprising nodes of hierarchical markup data;
- for each node of the nodes, determining, by one or more computing devices, whether the node satisfies one or more marking criteria;
- for each node that satisfies the one or more marking criteria, generating, by the one or more computing devices, a pair of marks for the node, the pair comprising a start mark and an end mark for the node, the start mark specifying a first location in the document where the node starts, and the end mark specifying a second location in the document that is at or after where the node ends; and
- storing the pairs of marks ordered based at least in part on the start mark.
15. One or more non-transitory computer-readable storage medium storing one or more sequences of instruction, wherein execution of the one or more sequences of instruction by one or more processors causes the one or more processors to perform:
- in a summary, storing marks for nodes in a document that satisfy one or more marking criteria, the marks comprising start marks and end marks, the start marks specifying first locations in the document where nodes start, and the end marks specifying second locations in the document at or after where nodes end;
- determining that one or more modifications have been made to a set of nodes in the document;
- based at least in part on the one or more modifications, storing a logical value relative to at least one of the marks stored in the summary.
Dependent claims: 9, 10
Specification