AUTOMATIC VISUAL SEGMENTATION OF WEBPAGES
First Claim
1. A method to divide a portion of a webpage into semantic units, comprising computer-executed steps of:
based on visual aspects of said portion of said webpage, estimating an optimal number of semantic units that said portion of said webpage may be divided into;
based, at least in part on said optimal number, dividing said portion of said webpage into semantic units; and
storing into volatile or non-volatile memory data that indicates the semantic units into which said portion of said webpage is divided.
9 Assignments
0 Petitions
Abstract
To provide valuable information regarding a webpage, the webpage must be divided into distinct semantically coherent segments for analysis. A set of heuristics allow a segmentation algorithm to identify an optimal number of segments for a given webpage or any portion thereof more accurately. A first heuristic estimates the optimal number of segments for any given webpage or portion thereof. A second heuristic coalesces segments where the number of segments identified far exceeds the optimal number recommended. A third heuristic coalesces segments corresponding to a portion of a webpage with much unused whitespace and little content. A fourth heuristic coalesces segments of nodes that have a recommended number of segments below a certain threshold into segments of other nodes. A fifth heuristic recursively analyzes and splits segments that correspond to webpage portions surpassing a certain threshold portion size.
95 Citations
50 Claims
1. A method to divide a portion of a webpage into semantic units, comprising computer-executed steps of:
based on visual aspects of said portion of said webpage, estimating an optimal number of semantic units that said portion of said webpage may be divided into;
based, at least in part on said optimal number, dividing said portion of said webpage into semantic units; and
storing into volatile or non-volatile memory data that indicates the semantic units into which said portion of said webpage is divided.
Dependent claims: 2, 3, 4, 5, 6, 7, 26, 27, 28, 29, 30, 31, 32
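The three steps of claim 1 (estimate, divide, store) can be illustrated with a minimal Python sketch. This is only one hypothetical reading, not the patented implementation: the `Block` type, the `ASSUMED_UNIT_AREA` constant, the area-based estimate, and the even contiguous split are all assumptions made for illustration, since the claim does not say how the estimate is computed.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    width: int   # rendered width in pixels
    height: int  # rendered height in pixels


# Assumption: one semantic unit covers roughly this much rendered area.
ASSUMED_UNIT_AREA = 250_000


def estimate_unit_count(blocks):
    """Estimate a unit count from rendered area, capped by the block count."""
    total_area = sum(b.width * b.height for b in blocks)
    return max(1, min(len(blocks), round(total_area / ASSUMED_UNIT_AREA)))


def divide(blocks, k):
    """Split blocks into k contiguous groups of near-equal block count."""
    n = len(blocks)
    return [blocks[i * n // k:(i + 1) * n // k] for i in range(k)]


def store_units(store, key, units):
    """Record the division as lists of block names in an in-memory dict."""
    store[key] = [[b.name for b in unit] for unit in units]
```

A caller would pass the rendered blocks of the webpage portion to `estimate_unit_count`, feed the result to `divide`, and keep the outcome with `store_units`. A real system would likely replace the plain dict with whatever volatile or non-volatile store it already uses.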
8. A method to divide a webpage into semantic units, comprising computer-executed steps of:
determining a first number that represents how many semantic units are currently associated with a subtree that extends from a first node of a DOM tree corresponding to said webpage;
estimating a second number that represents how many semantic units should be associated with said subtree;
performing a comparison between the first number and the second number; and
in response to the comparison, coalescing into a single semantic unit two or more semantic units currently associated with said subtree.
Dependent claims: 9, 10, 33, 34, 35
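The compare-then-coalesce flow of claim 8 might look like the following Python sketch. It is a hedged illustration under stated assumptions: the function name `coalesce_excess`, the `slack` factor (standing in for the abstract's "far exceeds"), and the rule of merging the smallest adjacent pair first are all hypothetical choices, not taken from the patent.

```python
def coalesce_excess(units, estimated, slack=1.5):
    """Merge adjacent units when a subtree holds far more than estimated.

    units:     list of semantic units, each a list of node labels
    estimated: how many units the subtree should hold (second number)
    slack:     assumed tolerance before any coalescing is triggered
    """
    current = len(units)        # first number: units currently in the subtree
    target = max(1, estimated)  # never coalesce below a single unit
    units = [list(u) for u in units]  # work on a copy
    if current <= slack * target:
        return units            # comparison says the count is acceptable
    while len(units) > target:
        # Assumption: merge the adjacent pair with the smallest combined size.
        i = min(range(len(units) - 1),
                key=lambda j: len(units[j]) + len(units[j + 1]))
        units[i:i + 2] = [units[i] + units[i + 1]]
    return units
```

Merging only adjacent units keeps each resulting unit contiguous in document order, which seems a reasonable default for a sketch, though the claim itself does not require it.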
11. A method to divide a webpage into semantic units, comprising computer-executed steps of:
estimating a size of a rendered area on said webpage corresponding to a node of a DOM tree corresponding to said webpage;
analyzing rendered contents of said rendered area on said webpage; and
based at least in part on said size and said rendered contents, coalescing into a single semantic unit, (a) a semantic unit currently associated with said node, and (b) one or more semantic units.
Dependent claims: 12, 13, 14, 36, 37, 38, 39
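As a rough illustration of claim 11, the Python sketch below treats a node as a coalescing candidate when its rendered area is large but holds little text, echoing the abstract's "much unused whitespace and little content". The `RenderedNode` type, the text-density measure, and both thresholds are assumptions introduced here; the claim does not specify how size and contents are weighed.

```python
from dataclasses import dataclass


@dataclass
class RenderedNode:
    width: int       # rendered width in pixels
    height: int      # rendered height in pixels
    text_chars: int  # characters of visible text inside the area


def is_sparse(node, min_area=50_000, max_density=0.001):
    """True when the area is large but carries little text (assumed test)."""
    area = node.width * node.height
    if area <= 0 or area < min_area:
        return False
    return node.text_chars / area <= max_density


def coalesce_sparse(node, node_unit, other_units):
    """Fold the node's unit and the other units into one unit if sparse."""
    if not is_sparse(node):
        return [list(node_unit)] + [list(u) for u in other_units]
    merged = list(node_unit)
    for unit in other_units:
        merged.extend(unit)
    return [merged]
```

Text density is only one possible proxy for "rendered contents"; images, links, or form controls could be counted as well.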
15. A method to divide a webpage into semantic units, comprising computer-executed steps of:
estimating a number that represents how many semantic units should be associated with a node of a DOM tree corresponding to said webpage;
deciding that said number is below a threshold number; and
in response to deciding, merging a first semantic unit associated with said node with a second semantic unit associated with a second node into a single semantic unit.
Dependent claims: 16, 17, 18, 19, 40, 41, 42, 43, 44
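The threshold-based merge of claim 15 could be sketched in Python as follows. This is a hypothetical illustration: the input shape (pairs of estimate and unit in document order), the default threshold of 1.0, and the choice to merge into the preceding node's unit are assumptions, since the claim only says the unit is merged with a unit of "a second node".

```python
def merge_small_nodes(nodes, threshold=1.0):
    """Merge units of nodes whose estimated unit count is below threshold.

    nodes: list of (estimated_units, unit) pairs in document order, where
           unit is a list of labels. A below-threshold node's unit is folded
           into the previous surviving unit, or into the next one if none
           exists yet (an assumed tie-breaking rule).
    """
    result = []
    carry = []  # small leading units waiting for a unit to join
    for estimate, unit in nodes:
        if estimate < threshold:
            if result:
                result[-1].extend(unit)
            else:
                carry.extend(unit)
        else:
            result.append(carry + list(unit))
            carry = []
    if carry:  # every node was below the threshold
        result.append(carry)
    return result
```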
20. A method to divide a webpage into semantic units, comprising computer-executed steps of:
estimating a size of a rendered area on said webpage, wherein said rendered area corresponds to a semantic unit;
determining that said size exceeds a threshold size; and
in response to determining that said size exceeds a threshold size, dividing said semantic unit into a plurality of semantic units.
Dependent claims: 21, 22, 23, 24, 25, 45, 46, 47, 48, 49, 50
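A size-triggered split in the spirit of claim 20, applied recursively as the abstract describes, might be sketched in Python like this. The `Region` type, the use of child regions as the split points, and the `max_area` parameter are assumptions for illustration; the claim does not state how an oversized unit is divided.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    area: int                                     # rendered area in square pixels
    children: list = field(default_factory=list)  # nested Region objects


def split_large(region, max_area):
    """Return unit names, splitting any region larger than max_area.

    A region that exceeds max_area is replaced by its children, each of
    which is checked again. A region without children cannot be split in
    this sketch and is kept whole.
    """
    if region.area <= max_area or not region.children:
        return [region.name]
    units = []
    for child in region.children:
        units.extend(split_large(child, max_area))
    return units
```

Because the recursion stops at leaves, an oversized region with no children stays a single unit; a fuller implementation would need some other way to subdivide it.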
Specification