Producing web page content
Abstract
A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content.
17 Claims
1. A computer readable medium having processor executable instructions stored thereon, the instructions when executed cause the implementation of a method for producing web page content, the method comprising:
identifying blocks within a web page;
selectively assembling the blocks into sections, by:
  identifying one or more frames within the web page, each frame encompassing one or more of the blocks; and
  grouping the blocks into sections such that each section just includes blocks that are not separated by a frame;
selectively assembling the sections into article candidates;
distinguishing an article candidate that includes article content from article candidates that do not include article content; and
producing content just from the article candidate distinguished as including article content.
Dependent claims: 2, 3, 4, 5, 6, 7.
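The frame-based grouping step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes blocks and frames are axis-aligned rectangles, and it reads "not separated by a frame" as "sharing the same innermost enclosing frame". The names `Rect` and `group_blocks_by_frame` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)

    @property
    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def group_blocks_by_frame(blocks, frames):
    """Group block names into sections keyed by innermost enclosing frame.

    blocks: dict mapping a block name to its Rect.
    frames: list of Rect, one per frame found on the page.
    Blocks enclosed by no frame are grouped together under the key None.
    """
    sections = defaultdict(list)
    for name, rect in blocks.items():
        enclosing = [i for i, frame in enumerate(frames) if frame.contains(rect)]
        # Assumption: the smallest enclosing frame is the one that separates
        # this block from its neighbours.
        key = min(enclosing, key=lambda i: frames[i].area) if enclosing else None
        sections[key].append(name)
    return list(sections.values())
```

With a navigation frame on the left and a main frame on the right, a menu block lands in one section while a headline block and a body block share another.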
8. A system for producing web page content, the system comprising:
a block engine to identify blocks within a web page;
an assembly engine to selectively assemble the blocks into sections and to selectively assemble the sections into article candidates;
an article engine to distinguish an article candidate that includes article content from article candidates that do not include article content, by identifying an article candidate that includes an article body, the identified article candidate being an article candidate that occupies a larger area near a top and center of the web page than other article candidates; and
a production engine to produce content just from the article candidate distinguished as including article content.
Dependent claims: 9, 10, 11, 12, 13, 14.
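One way to picture the article engine of claim 8 is a score that rewards area and penalizes distance from the top and center of the page. The sketch below is only an illustration under that assumption: the claim does not specify a formula, and the 0.5 weights, the `Candidate` type, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Bounding box of an article candidate (assumed representation)."""
    name: str
    left: float
    top: float
    width: float
    height: float


def top_center_score(c: Candidate, page_width: float, page_height: float) -> float:
    """Area discounted by horizontal and vertical offset (illustrative weights)."""
    area = c.width * c.height
    center_x = c.left + c.width / 2
    # 0.0 when horizontally centered, 1.0 when centered on a page edge.
    horizontal_offset = abs(center_x - page_width / 2) / (page_width / 2)
    # 0.0 when the candidate starts at the top of the page.
    vertical_offset = c.top / page_height
    return area * (1 - 0.5 * horizontal_offset) * (1 - 0.5 * vertical_offset)


def pick_article_candidate(candidates, page_width, page_height):
    """Return the candidate with the highest top-and-center area score."""
    return max(candidates, key=lambda c: top_center_score(c, page_width, page_height))
```

On a 1000 by 2000 page, a wide column starting near the top would outscore both a narrow sidebar and a full-width footer under this scoring.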
15. A method for producing web page content, the method comprising:
identifying blocks within a web page;
selectively assembling the blocks into sections;
selectively assembling the sections into article candidates, by following a top to bottom order, combining a section with a previous section in the order upon a positive determination of one or more of the following:
  the section is a text section with a character count that exceeds a minimum threshold and has a left or right margin that is aligned with the left or right margin of the previous section;
  the section is a text section sharing visual attributes of the previous section; and
  the section is an image section having a size exceeding a threshold size and shares at least a partial horizontal overlap with the previous section;
distinguishing an article candidate that includes article content from article candidates that do not include article content; and
producing content just from the article candidate distinguished as including article content.
Dependent claims: 16, 17.
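The three combining rules of claim 15 can be sketched as a single top-to-bottom pass. This is a hedged illustration, not the claimed implementation: the threshold values, the alignment tolerance, the use of a single `style` string to stand in for "visual attributes", and the names `Section`, `should_combine`, and `assemble_candidates` are all assumptions.

```python
from dataclasses import dataclass

MIN_CHARS = 100          # assumed minimum character count for a text section
MIN_IMAGE_AREA = 10_000  # assumed threshold size for an image section
ALIGN_TOLERANCE = 2      # assumed slack, in pixels, when comparing margins


@dataclass
class Section:
    kind: str            # "text" or "image"
    left: float
    right: float
    top: float
    height: float = 0.0
    char_count: int = 0
    style: str = ""      # stand-in for visual attributes such as font and size


def should_combine(sec: Section, prev: Section) -> bool:
    """True if any of the three rules links sec to the previous section."""
    if sec.kind == "text":
        aligned = (abs(sec.left - prev.left) <= ALIGN_TOLERANCE
                   or abs(sec.right - prev.right) <= ALIGN_TOLERANCE)
        if sec.char_count > MIN_CHARS and aligned:
            return True   # rule 1: long text with a shared margin
        return bool(sec.style) and sec.style == prev.style  # rule 2
    if sec.kind == "image":
        area = (sec.right - sec.left) * sec.height
        overlaps = sec.left < prev.right and prev.left < sec.right
        return area > MIN_IMAGE_AREA and overlaps           # rule 3
    return False


def assemble_candidates(sections):
    """Walk sections top to bottom, merging each into the previous when linked."""
    candidates = []
    for sec in sorted(sections, key=lambda s: s.top):
        if candidates and should_combine(sec, candidates[-1][-1]):
            candidates[-1].append(sec)
        else:
            candidates.append([sec])
    return candidates
```

Because every section either joins the last candidate or starts a new one, the previous section in the order is always the final member of the most recent candidate, which is why the sketch compares against `candidates[-1][-1]`.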
Specification