Identification of content in an electronic document
Abstract
In some embodiments, a method includes receiving an electronic document that comprises a plurality of sections. The method includes marking the plurality of sections as a content section or a non-content section using a visual attribute of the sections. That attribute includes at least one of the following: the width of the section, the density of hyperlinks in the section, the font size of text in the section, and whether a title of the electronic document overlaps with text in the section. The method also includes storing the marking of the plurality of sections of the electronic document in a machine-readable medium.
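The abstract lists four visual attributes that can drive the first-pass marking. Below is a minimal Python sketch of marking sections with one selected attribute and storing the result. It rests on several assumptions that the source does not give: the `Section` record, its field names and every threshold value are hypothetical. Two interpretations are also assumptions: reading "link density" as the share of a section's text that sits inside hyperlinks, and treating a title overlap as a sign of content.

```python
import json
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical record for one rendered section of a document."""
    section_id: str
    width_px: int         # rendered width of the section
    link_density: float   # assumed: share of the section's text inside hyperlinks
    font_size_pt: float   # dominant font size of text in the section
    overlaps_title: bool  # whether the document title overlaps text in the section


# Hypothetical thresholds; the source does not specify any values.
MIN_CONTENT_WIDTH_PX = 400
MAX_CONTENT_LINK_DENSITY = 0.3
MIN_CONTENT_FONT_PT = 10.0


def is_content(section: Section, attribute: str) -> bool:
    """Classify a section as content using one selected visual attribute."""
    if attribute == "width":
        return section.width_px >= MIN_CONTENT_WIDTH_PX
    if attribute == "link_density":
        return section.link_density <= MAX_CONTENT_LINK_DENSITY
    if attribute == "font_size":
        return section.font_size_pt >= MIN_CONTENT_FONT_PT
    if attribute == "title_overlap":
        # Assumption: a section whose text overlaps the title is content.
        return section.overlaps_title
    raise ValueError(f"unknown attribute: {attribute}")


def mark_sections(sections: list[Section], attribute: str) -> dict[str, str]:
    """Map each section id to "content" or "non-content"."""
    return {
        s.section_id: "content" if is_content(s, attribute) else "non-content"
        for s in sections
    }


sections = [
    Section("nav", width_px=180, link_density=0.9, font_size_pt=9.0, overlaps_title=False),
    Section("article", width_px=720, link_density=0.05, font_size_pt=12.0, overlaps_title=True),
]
marks = mark_sections(sections, "link_density")
# "Storing" the marking: serialize it so it can be written to a file or database.
stored = json.dumps(marks)
print(stored)  # {"nav": "non-content", "article": "content"}
```

Each attribute is a separate rule rather than a weighted score. This mirrors the claim language "using one of a plurality of visual attributes". A combined classifier would be an equally plausible reading of "at least one of" in the abstract.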
22 Claims
1. A method comprising:
receiving an electronic document that comprises a plurality of sections;
marking individual sections in the plurality of sections as a content section or a non-content section using one of a plurality of visual attributes of the plurality of sections;
comparing, using one or more processors, a value of a different visual attribute of two adjacent sections of the plurality of sections, a first section of the two adjacent sections being marked to include content and a second section of the two adjacent sections being marked not to include content; and
changing the mark of the second section from a mark not to include content to a mark to include content in response to a determination that the value of the different visual attribute of the first section is the same as the value of the different visual attribute of the second section.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
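Claim 1's second step corrects the first-pass marking. When a content section sits next to a non-content section and both share the value of a second visual attribute, the non-content section is remarked as content. Below is a minimal Python sketch of that step. It assumes the first-pass marks are already available. The `Section` record is hypothetical, and so is the use of font family as the "different visual attribute". Repeating the pass until no mark changes is also a design choice that the claim does not state.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical section: first-pass content flag plus a second attribute."""
    name: str
    is_content: bool
    font_family: str  # assumed choice for the "different visual attribute"


def relabel_adjacent(sections: list[Section]) -> list[Section]:
    """Flip non-content sections to content when an adjacent content section
    has the same value of the second visual attribute.

    Repeats until no mark changes, so a flip can propagate along a run of
    matching sections (a design choice, not something the claim specifies).
    """
    changed = True
    while changed:
        changed = False
        for first, second in zip(sections, sections[1:]):
            # Check the pair in both directions: the content section may be
            # either the earlier or the later neighbor.
            for a, b in ((first, second), (second, first)):
                if a.is_content and not b.is_content and a.font_family == b.font_family:
                    b.is_content = True
                    changed = True
    return sections


doc = [
    Section("header", False, "Arial"),
    Section("para1", True, "Georgia"),
    Section("para2", False, "Georgia"),
    Section("para3", False, "Georgia"),
    Section("footer", False, "Arial"),
]
relabel_adjacent(doc)
print([s.name for s in doc if s.is_content])  # ['para1', 'para2', 'para3']
```

The header and footer stay non-content because their font family differs from the neighboring content section. The two paragraphs first marked as non-content are pulled in because they match.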
9. A non-transitory machine-readable medium including instructions which, when executed by a machine, cause the machine to perform operations comprising:
receiving an electronic document that comprises a plurality of blocks;
marking the plurality of blocks as a content block or a non-content block using one of a plurality of visual attributes of the plurality of blocks;
comparing a value of a different visual attribute of two adjacent blocks of the plurality of blocks, a first block of the two adjacent blocks being marked to include content and a second block of the two adjacent blocks being marked not to include content; and
changing the mark of the second block from a mark not to include content to a mark to include content in response to a determination that the value of the different visual attribute of the first block is the same as the value of the different visual attribute of the second block.
(Dependent claims: 10, 11, 12, 13, 14)
15. A system comprising:
a hardware implemented module to mark a plurality of sections of an electronic document as a content section or a non-content section using one of a plurality of visual attributes of the plurality of sections;
at least one comparison component to compare a value of a different visual attribute of two adjacent sections of the plurality of sections, a first section of the two adjacent sections being marked to include content and a second section of the two adjacent sections being marked not to include content;
at least one hardware switch to change the mark of the second section from a mark not to include content to a mark to include content in response to a determination that the value of the different visual attribute of the first section is the same as the value of the different visual attribute of the second section; and
a search engine to parse the sections having content to index for a subsequent search.
(Dependent claims: 16, 17, 18, 19, 20, 21, 22)
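Claim 15 adds a search engine that indexes only the sections marked as content. Below is a minimal Python sketch of that indexing element alone; the hardware module and switch are not modeled. It builds an inverted index and skips non-content sections, so navigation or advertising text never becomes searchable. Several details are assumptions rather than anything the claim specifies: the input shape, word tokenization with `\w+` and AND semantics for multi-term queries.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, list[tuple[str, bool]]]) -> dict[str, set[str]]:
    """documents maps doc_id -> list of (section_text, is_content) pairs.
    Returns an inverted index term -> set of doc_ids, built only from
    sections marked as content."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, sections in documents.items():
        for text, is_content in sections:
            if not is_content:
                continue  # non-content sections are never indexed
            for term in re.findall(r"\w+", text.lower()):
                index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return doc_ids whose content sections contain every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "page1": [("Home | About | Login", False), ("Solar panels convert sunlight", True)],
    "page2": [("Buy solar panels now", False), ("Wind turbines generate power", True)],
}
idx = build_index(docs)
print(sorted(search(idx, "solar panels")))  # ['page1']
```

"page2" mentions solar panels only in a section marked non-content, so it does not match. This is the effect the claimed content marking is meant to have on search results.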
Specification