IDENTIFICATION OF CONTENT IN AN ELECTRONIC DOCUMENT
Abstract
In some embodiments, a method includes receiving an electronic document that comprises a plurality of sections. The method includes marking each of the plurality of sections as a content section or a non-content section using a visual attribute of the section that includes at least one of: a width of the section, a density of hyperlinks in the section, a font size of text in the section, and whether a title of the electronic document overlaps with text in the section. The method also includes storing the marking of the plurality of sections of the electronic document in a machine-readable medium.
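By way of illustration only, the attributes named in the abstract might be computed along the following lines. This is a minimal sketch, not the patented implementation: the `Section` container, its field names, and the word-based definitions of link density and title overlap are all assumptions introduced here, since the abstract does not define how each attribute is measured.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical container for one rendered section of a document."""
    width_px: float        # rendered width of the section
    font_size_px: float    # dominant font size of the section's text
    text: str              # all visible text in the section
    link_text: str         # visible text that sits inside hyperlinks


def link_density(section: Section) -> float:
    """Fraction of the section's words that belong to hyperlinks (0.0 to 1.0).

    Assumption: density is measured per word; the source does not say how.
    """
    total_words = len(section.text.split())
    if total_words == 0:
        return 0.0
    return min(1.0, len(section.link_text.split()) / total_words)


def title_overlap(section: Section, title: str) -> bool:
    """True if any word of the document title also appears in the section text.

    Assumption: overlap means at least one shared word, compared case-insensitively.
    """
    title_words = {w.lower() for w in title.split()}
    section_words = {w.lower() for w in section.text.split()}
    return bool(title_words & section_words)
```

Under these assumptions, a navigation bar whose text consists entirely of links has a link density of 1.0, while an article body with no links has a density of 0.0.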
20 Claims
1. A method comprising:
receiving a web page that includes one or more sections;
identifying a section from the electronic document;
calculating, using one or more processors, a score that corresponds to the section from the electronic document, the calculating being performed based on one or more attributes of the section;
comparing the calculated score that corresponds to the section to a predetermined threshold; and
marking the section as including content based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A system comprising:
one or more processors and executable instructions accessible on a computer-readable medium that, when executed, configure the one or more processors to at least:
receive a web page that includes one or more sections;
identify a section from the electronic document;
calculate a score that corresponds to the section from the electronic document, the calculating being performed based on one or more attributes of the section;
compare the calculated score that corresponds to the section to a predetermined threshold; and
mark the section as including content based on the comparison.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.

20. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
receiving a web page that includes one or more sections;
identifying a section from the electronic document;
calculating a score that corresponds to the section from the electronic document, the calculating being performed based on one or more attributes of the section;
comparing the calculated score that corresponds to the section to a predetermined threshold; and
marking the section as including content based on the comparison.

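By way of illustration only, the calculating, comparing, and marking steps recited in the independent claims might be sketched as follows. The weighted-sum scoring function, the weight values, the threshold value, and the rule that a score above the threshold marks a section as content are all assumptions introduced here; the claims recite only that a score is calculated from one or more attributes and compared to a predetermined threshold.

```python
from typing import Dict, List

# Hypothetical weights and threshold; the claims do not specify any values.
# Each attribute is assumed to be normalized to the range 0..1 beforehand.
WEIGHTS: Dict[str, float] = {
    "width": 0.6,          # wider sections are assumed more likely to be content
    "link_density": -0.6,  # link-heavy sections are assumed less likely to be content
    "font_size": 0.4,      # larger text is assumed more likely to be content
}
THRESHOLD = 0.5


def score_section(attributes: Dict[str, float]) -> float:
    """Calculate a score as a weighted sum of the section's attributes."""
    return sum(weight * attributes.get(name, 0.0) for name, weight in WEIGHTS.items())


def mark_sections(sections: List[Dict[str, float]]) -> List[bool]:
    """Compare each score to the threshold; True marks a section as including content."""
    return [score_section(attrs) > THRESHOLD for attrs in sections]


# Usage: a wide, link-poor article body versus a narrow, link-heavy navigation bar.
article = {"width": 1.0, "link_density": 0.1, "font_size": 0.8}
nav_bar = {"width": 0.2, "link_density": 1.0, "font_size": 0.5}
marks = mark_sections([article, nav_bar])
```

With these illustrative values the article body scores above the threshold and is marked as content, while the navigation bar scores below it and is not. A linear score is only one possible choice; any function of the section's attributes would fit the claim language equally well.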
Specification