Extracting a portion of a document, such as a web page
3 Assignments
0 Petitions
Abstract
A portion data structure representing a portion extracted from a formatted source document is described. A portion data structure contains a first subtree of nodes that is modeled after a second subtree of a complete hierarchical representation of the formatted source document. Explicit formatting attribute values are specified for nodes of the first subtree only where a value calculated for the formatting attribute in a node of the first subtree differs from a value calculated for the formatting attribute in the corresponding node in the second subtree at a time when the node of the first subtree descends from a reset node specifying standardized formatting attribute values. The contents of the portion data structure are usable to display the portion extracted from the formatted source document in a context other than the formatted source document.
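The portion data structure described in the abstract can be pictured as a mirrored tree. Each node keeps the type of the source node it copies, plus only the formatting values that had to be stated explicitly. The Python sketch below is a minimal illustration under stated assumptions. `PortionNode`, `render`, `render_portion`, and the contents of `RESET_CSS` are hypothetical, and the reset values are placeholders, since the source does not list the standardized set. The sketch serializes a portion as HTML wrapped in a reset container, so it could be shown outside the original page.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape

# Hypothetical reset stylesheet. Inheritable properties are set on the reset
# container so descendants inherit them; non-inherited ones are set on every
# descendant. The source does not specify the actual standardized values.
RESET_CSS = (
    ".portion-reset { color: black; font-size: 16px; font-weight: normal; }"
    " .portion-reset * { margin: 0; padding: 0; }"
)


@dataclass
class PortionNode:
    """One node of an extracted portion (hypothetical representation)."""
    tag: str  # same type as the mirrored source node
    explicit_style: dict[str, str] = field(default_factory=dict)  # only values that differ
    children: list[PortionNode | str] = field(default_factory=list)  # child nodes or text


def render(node: PortionNode | str) -> str:
    """Serialize a node (or text) to HTML, emitting a style attribute only when needed."""
    if isinstance(node, str):
        return escape(node)
    style = "; ".join(f"{k}: {v}" for k, v in sorted(node.explicit_style.items()))
    attr = f' style="{escape(style)}"' if style else ""
    inner = "".join(render(child) for child in node.children)
    return f"<{node.tag}{attr}>{inner}</{node.tag}>"


def render_portion(root: PortionNode) -> str:
    """Wrap the portion in a reset node that carries the standardized stylesheet."""
    return f'<div class="portion-reset"><style>{RESET_CSS}</style>{render(root)}</div>'
```

One design choice in this sketch is how the reset values are applied. Inheritable properties go on the reset container, and non-inherited ones such as `margin` and `padding` go on every descendant. Either way, each copied node starts from a predictable baseline. Void elements such as `img` are not handled.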
31 Claims
1. A method caused to be performed by at least one computing system having a processor, the method comprising:
generating a list of nodes in a rendered version of a subject web page, each node having a respective location score having a value identical to other location scores of each node in the list;
determining a location of a pointer displayed in relation to the rendered version of the subject web page according to scrolling adjusted coordinates of the displayed pointer, the location indicating a first node of the list of nodes, the first node having a first location score reduced by a factor;
for each node of the list of nodes, updating the respective location score, the updated respective location score for each node having an updated value based on a distance of the pointer from a corner associated with the node and a square root of an area of the node;
identifying, based on a lowest updated value of the updated location scores, a portion of the rendered version of the subject web page corresponding to at least one subtree of a document object model tree created for the subject web page;
establishing in the document object model tree a reset node comprising a stylesheet specifying a predetermined standardized set of formatting attribute values inheritable by descendents of the reset node;
for each subtree of the document object model tree created for the subject web page to which the identified first node of the rendered version of the subject web page corresponds;
traversing the subtree;
for each node of the subtree visited during the traversal;
establishing a corresponding node as a descendent of the reset node, the established corresponding node having a type matching a type of the node of the subtree;
where the node of the subtree has calculated values for any of a plurality of formatting attributes, for each of the plurality of formatting attributes;
determining a calculated value of the formatting attribute in the node of the subtree;
determining a calculated value of the formatting attribute in the corresponding node, the determined calculated value of the formatting attribute in the corresponding node being inherited from the predetermined standardized set of formatting attribute values;
determining that the calculated values differ; and
only when it is determined that calculated values differ, explicitly specifying for the corresponding node the determined calculated value of the formatting attribute in the node of the subtree.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
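The location-scoring steps of claim 1 can be illustrated with a small sketch. The claim says only the following. Every node starts with the same score. The node under the pointer has its score reduced by a factor. Each updated score depends on the pointer's distance from a corner of the node and on the square root of the node's area. The claim does not say how these quantities combine. The following are therefore all assumptions in this hypothetical Python version:

- the multiplicative weighting below;
- the use of the top-left corner;
- treating the smallest containing box as the "first node";
- the `factor=0.5` default;
- the names `Box` and `pick_node`.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered bounds of one node, in page (document) coordinates."""
    node_id: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.width and self.y <= py <= self.y + self.height


def pick_node(boxes, client_x, client_y, scroll_x, scroll_y, factor=0.5, initial=1.0):
    """Return the id of the node with the lowest location score (hypothetical weighting)."""
    # Scrolling-adjusted pointer position: viewport coordinates plus scroll offsets.
    px, py = client_x + scroll_x, client_y + scroll_y
    # Every node starts with the same location score.
    scores = {b.node_id: initial for b in boxes}
    # Assumption: the pointer "indicates" the smallest box that contains it.
    hits = [b for b in boxes if b.contains(px, py)]
    if hits:
        first = min(hits, key=lambda b: b.width * b.height)
        scores[first.node_id] *= factor
    for b in boxes:
        # Distance from the pointer to the node's top-left corner, plus sqrt(area).
        dist = math.hypot(px - b.x, py - b.y)
        scores[b.node_id] *= dist + math.sqrt(b.width * b.height)
    return min(scores, key=scores.get)
```

Under these assumptions, a small box whose corner lies near the pointer tends to win. A page-sized root is penalized by its area. The reduced score of the node under the pointer can also outweigh a smaller but more distant node. With `factor=1.0`, that preference disappears.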
10. A computer-readable storage medium having contents configured to cause a computing system to perform a method for extracting a portion of a selected hierarchical document, the selected hierarchical document comprised of nodes in an arrangement in which a node may be a descendent of another node, each node having a type, a node and all of its descendent nodes collectively constituting a subtree of the document hierarchy, the method comprising:
generating a list of nodes in a rendered version of a subject web page, each node having a respective location score having a value identical to other location scores of each node in the list;
determining a location of a pointer displayed in relation to the rendered version of the subject web page according to scrolling adjusted coordinates of the displayed pointer, the location indicating a first node of the list of nodes, the first node having a first location score reduced by a factor;
for each node of the list of nodes, updating the respective location score, the updated respective location score for each node having an updated value based on a distance of the pointer from a corner associated with the node and a square root of an area of the node;
selecting, based on a lowest updated value of the updated respective locations, one of the nodes of the document hierarchy as the root of a subtree of the document hierarchy that corresponds to the portion to be extracted;
establishing a reset node comprising a stylesheet specifying a predetermined standardized set of formatting attribute values inheritable by descendents of the reset node;
for each of one or more of the nodes of the subtree defined by the selected node of the document hierarchy;
establishing a descendent of the reset node having the same type as the node of the subtree;
for each of a plurality of formatting attributes, determining calculated values of the formatting attributes in both the descendent of the reset node and the node of the subtree, the determined calculated value of the formatting attribute in the descendant of the reset node being inherited from the predetermined standardized set of formatting attribute values;
for one or more of the plurality of formatting attributes, determining that the calculated value of the formatting attribute in the descendent of the reset node does not match the calculated value of the formatting attribute in the node of the subtree; and
for only those formatting attributes of the plurality of formatting attributes for which the determined calculated value of the formatting attribute in the descendent of the reset node does not match the determined calculated value of the formatting attribute in the node of the subtree, explicitly specifying for the formatting attribute in the descendent of the reset node the determined calculated value of the formatting attribute in the node of the subtree.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
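The per-node comparison in claim 10 can be sketched as a recursive copy. This hypothetical Python version makes several simplifications:

- Calculated values are plain dictionaries rather than values read from a browser. In a real page they might come from `window.getComputedStyle`.
- `RESET`, `INHERITED`, `SourceNode`, `CopyNode`, and `copy_subtree` are illustrative names.
- The split between inherited and non-inherited attributes is a CSS-like assumption.

A copied node's baseline for an inheritable attribute is its parent's calculated value. For any other attribute, the baseline is the reset value. Only attributes whose source value differs from that baseline are written explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical standardized values carried by the reset node's stylesheet.
RESET = {"color": "black", "font-size": "16px", "font-weight": "normal", "margin": "0"}
# Attributes a node inherits from its parent (a CSS-like assumption).
INHERITED = {"color", "font-size", "font-weight"}


@dataclass
class SourceNode:
    tag: str
    computed: dict[str, str]  # calculated formatting values in the original page
    children: list[SourceNode] = field(default_factory=list)


@dataclass
class CopyNode:
    tag: str  # same type as the source node
    explicit: dict[str, str] = field(default_factory=dict)
    children: list[CopyNode] = field(default_factory=list)


def copy_subtree(src: SourceNode, parent_calc: dict[str, str] = RESET) -> CopyNode:
    """Mirror `src` beneath the reset node, specifying only differing values."""
    node = CopyNode(src.tag)
    calc: dict[str, str] = {}
    for attr, reset_value in RESET.items():
        # Value the copy would have as a descendant of the reset node
        # before anything is specified explicitly on it.
        baseline = parent_calc[attr] if attr in INHERITED else reset_value
        source_value = src.computed.get(attr, baseline)
        if source_value != baseline:
            node.explicit[attr] = source_value
        # After the step above, the copy's calculated value matches the source.
        calc[attr] = source_value
    node.children = [copy_subtree(child, calc) for child in src.children]
    return node
```

Consider a source `div` with `margin: 8px` whose child `b` has `font-weight: bold`. The copy records `margin` on the `div` and `font-weight` on the `b`. A `span` inside the `b` inherits the bold weight and receives no explicit value.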
21. A method for extracting a portion of a selected hierarchical document, the selected hierarchical document comprised of nodes in an arrangement in which a node may be a descendent of another node, each node having a type, a node and all of its descendent nodes collectively constituting a subtree of the document hierarchy, the method comprising:
generating a list of nodes in a rendered version of a subject web page, each node having a respective location score having a value identical to other location scores of each node in the list;
determining a location of a pointer displayed in relation to the rendered version of the subject web page according to scrolling adjusted coordinates of the displayed pointer, the location indicating a first node of the list of nodes, the first node having a first location score reduced by a factor;
for each node of the list of nodes, updating the respective location score, the updated respective location score for each node having an updated value based on a distance of the pointer from a corner associated with the node and a square root of an area of the node;
selecting, based on a lowest updated value of the updated respective locations, one of the nodes of the document hierarchy as the root of a subtree of the document hierarchy that corresponds to the portion to be extracted;
establishing a reset node comprising a stylesheet specifying a predetermined standardized set of formatting attribute values inheritable by descendents of the reset node;
for each of one or more of the nodes of the subtree defined by the selected node of the document hierarchy;
establishing a descendent of the reset node having the same type as the node of the subtree;
for each of a plurality of formatting attributes, determining the calculated value of the formatting attribute in both the descendent of the reset node and the node of the subtree, the determined calculated value of the formatting attribute in the descendant of the reset node being inherited from the predetermined standardized set of formatting attribute values;
for one or more of the plurality of formatting attributes, determining that the calculated value of the formatting attribute in the descendent of the reset node does not match the calculated value of the formatting attribute in the node of the subtree; and
for only those formatting attributes of the plurality of formatting attributes for which the determined calculated value of the formatting attribute in the descendent of the reset node does not match the determined calculated value of the formatting attribute in the node of the subtree, explicitly specifying for the formatting attribute in the descendent of the reset node the determined calculated value of the formatting attribute in the node of the subtree.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification