System and method for identifying segments in a web resource
Abstract
A robust, lightweight, bottom-up segmentation method for Internet content. According to the present invention, individual segments are created based upon weights assigned according to document structure and markup elements and semantics. Smaller segments are then merged into larger segments by determining which portions of the content page are related to each other. The remaining segments are then intelligently divided based upon device constraints.
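The weight-based grouping step summarized above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes items arrive in document order as `(item, weight)` pairs, that weights have already been assigned by some markup-based scheme, and that grouping is greedy and sequential. The names `Segment`, `group_items`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical container for items grouped into one segment."""
    items: list = field(default_factory=list)
    weights: list = field(default_factory=list)

    @property
    def average(self) -> float:
        # Average weighting of the nodes currently in this segment.
        return sum(self.weights) / len(self.weights)


def group_items(nodes, threshold):
    """Greedily group (item, weight) pairs, given in document order.

    An item joins the current segment when its weight lies within
    `threshold` of that segment's average weight; otherwise it starts
    a new segment. The greedy, sequential policy is an assumption.
    """
    segments = []
    for item, weight in nodes:
        current = segments[-1] if segments else None
        if current is not None and abs(weight - current.average) <= threshold:
            current.items.append(item)
            current.weights.append(weight)
        else:
            segments.append(Segment([item], [weight]))
    return segments


# Example weights are invented for illustration only.
nodes = [("h1", 10.0), ("p1", 2.0), ("p2", 2.5), ("p3", 1.5), ("table", 8.0)]
grouped = [seg.items for seg in group_items(nodes, threshold=1.0)]
```

With these invented weights the three paragraphs fall into one segment, while the heading and the table each start their own.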
23 Claims
1. A method comprising:
grouping, by a processor, items on a content page into segments in response to weightings of individual nodes associated with respective items being within a predetermined threshold difference of an average weighting for a respective segment;
merging smaller segments into larger segments based upon predefined criteria, the smaller segments and the larger segments being segments formed by the grouping of the items, wherein merging the smaller segments into the larger segments includes merging a first segment into a second segment when the second segment neighbors the first segment either above, below, to the left, or the right of the first segment, and the first segment and the second segment are either left or top aligned; and
dividing segments formed by the grouping of items that fail to meet predefined constraints into sub-segments, until the sub-segments meet the predefined constraints.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
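The merge condition recited in claim 1 (the second segment neighbors the first on one of four sides, and the two are left or top aligned) could be expressed along the following lines. This is only a sketch under stated assumptions: segments are modeled as axis-aligned rectangles, and the `gap` and `tol` tolerances, the `Box` type, and all function names are hypothetical, since the claim does not define how adjacency or alignment is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a segment, in page coordinates."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def is_neighbor(a: Box, b: Box, gap: float = 5.0) -> bool:
    """True if b sits directly above, below, left, or right of a.

    Assumption: "neighbors" means the facing edges are at most `gap`
    apart and the boxes overlap along the other axis.
    """
    h_overlap = a.left < b.right and b.left < a.right
    v_overlap = a.top < b.bottom and b.top < a.bottom
    vertical = h_overlap and (
        0 <= a.top - b.bottom <= gap or 0 <= b.top - a.bottom <= gap
    )
    horizontal = v_overlap and (
        0 <= a.left - b.right <= gap or 0 <= b.left - a.right <= gap
    )
    return vertical or horizontal


def is_aligned(a: Box, b: Box, tol: float = 1.0) -> bool:
    """True if the boxes share a left edge or a top edge, within `tol`."""
    return abs(a.left - b.left) <= tol or abs(a.top - b.top) <= tol


def can_merge(first: Box, second: Box) -> bool:
    return is_neighbor(first, second) and is_aligned(first, second)


def merge(first: Box, second: Box) -> Box:
    """Bounding box that covers both segments."""
    left, top = min(first.left, second.left), min(first.top, second.top)
    right = max(first.right, second.right)
    bottom = max(first.bottom, second.bottom)
    return Box(left, top, right - left, bottom - top)


header = Box(0, 0, 100, 20)
body = Box(0, 22, 80, 40)       # just below the header, left aligned
offset = Box(50, 22, 80, 40)    # below the header, but not aligned
sidebar = Box(102, 0, 50, 30)   # to the right of the header, top aligned
```

Here `header` and `body` qualify for merging, as do `header` and `sidebar`, while `offset` is adjacent but fails the alignment test.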
16. A computer program product comprising a computer-readable memory, the computer-readable memory having executable computer code stored thereon, the computer code comprising:
computer code for grouping items on a content page into segments in response to weightings of individual nodes associated with respective items being within a predetermined threshold difference of an average weighting for a respective segment;
computer code for merging smaller segments into larger segments based upon predefined criteria, the smaller segments and the larger segments being segments formed by the grouping of the items, wherein the computer code for merging the smaller segments into the larger segments includes merging a first segment into a second segment when the second segment neighbors the first segment either above, below, to the left, or the right of the first segment, and the first segment and the second segment are either left or top aligned; and
computer code for dividing segments formed by the grouping of items that fail to meet predefined constraints into sub-segments, until the sub-segments meet the predefined device constraints.
Dependent claims: 17, 18, 19
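The dividing step, which repeats until every sub-segment meets the constraints, lends itself to a short recursive sketch. The example below assumes a single hypothetical device constraint (a maximum rendered height per segment) and a naive midpoint split; the claims do not specify either the constraint or the split strategy, so both are illustrative choices, as are the names `divide` and `max_height`.

```python
def divide(items, max_height):
    """Split a segment until each sub-segment fits within `max_height`.

    `items` is a list of (name, height) pairs in document order.
    Splitting at the midpoint is an assumption made for brevity. A
    single item taller than `max_height` cannot be split further and
    is returned as its own sub-segment.
    """
    total = sum(height for _, height in items)
    if total <= max_height or len(items) <= 1:
        return [items]
    mid = len(items) // 2
    return divide(items[:mid], max_height) + divide(items[mid:], max_height)


# Heights are invented for illustration only.
segment = [("a", 100), ("b", 150), ("c", 120), ("d", 90)]
halves = divide(segment, max_height=300)
singles = divide(segment, max_height=200)
```

With a 300-unit limit the segment splits once into two halves; with a 200-unit limit each half is split again, leaving four single-item sub-segments.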
20. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform:
grouping items on a content page into segments in response to weightings of individual nodes associated with respective items being within a predetermined threshold difference of an average weighting for a respective segment;
merging smaller segments into larger segments based upon predefined criteria, the smaller segments and the larger segments being segments formed by the grouping of the items, wherein the computer code for merging the smaller segments into the larger segments includes computer code for merging a first segment into a second segment when the second segment neighbors the first segment either above, below, to the left, or the right of the first segment, and the first segment and the second segment are either left or top aligned; and
dividing segments formed by the grouping of items that fail to meet predefined constraints into sub-segments until the sub-segments meet the predefined constraints.
Dependent claims: 21, 22
23. A system comprising:
a communications device configured to transmit a content page; and
an electronic device in communication with the communications device the electronic device including a processor and a memory operatively connected to the processor, the memory comprising;
computer code for grouping items on a content page received from the remote terminal into segments in response to weightings of individual nodes associated with respective items being within a predetermined threshold difference of an average weighting for a respective segment;
computer code for merging smaller segments into larger segments based upon predefined criteria, the smaller segments and the larger segments being segments formed by the grouping of the items, wherein the computer code for merging the smaller segments into the larger segments includes computer code for merging a first segment into a second segment when the second segment neighbors the first segment either above, below, to the left, or the right of the first segment, and the first segment and the second segment are either left or top aligned; and
computer code for dividing segments formed by the grouping of items that fail to meet predefined constraints into sub-segments until the sub-segments meet the predefined device constraints.
Specification