Identification of content in an electronic document

  • US 8,301,998 B2
  • Filed: 12/14/2007
  • Issued: 10/30/2012
  • Est. Priority Date: 12/14/2007
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving an electronic document that comprises a plurality of sections;

    marking the plurality of sections as a content section or a non-content section using an attribute of the sections that includes at least one of a width of the section, a density of the plurality of hyperlinks in the section, a size of a font of text in the section, and whether a title of the electronic document overlaps with text in the section;

    comparing a value of a different attribute of two adjacent sections of the plurality of sections, a first section of the two adjacent sections being marked to include content and a second section of the two adjacent sections being marked not to include content;

    changing a mark of the second section from a mark not to include content to a mark to include content in response to the comparison resulting in a determination that the value of the different attribute of the first section is the same as the value of the different attribute of the second section; and

    storing the mark of the plurality of sections of the electronic document in a machine-readable medium.
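
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Section` fields, the function names, the thresholds (`min_width`, `max_link_density`), and the choice of background colour as the "different attribute" are all assumptions made for illustration. Marking here uses only width and hyperlink density, two of the attributes the claim lists.

```python
import json
from dataclasses import dataclass


@dataclass
class Section:
    # All field names are hypothetical; the claim does not prescribe a data model.
    text: str
    width: int            # rendered width (unit assumed to be pixels)
    link_density: float   # fraction of the section's text inside hyperlinks
    background: str       # assumed "different attribute" compared between neighbours
    is_content: bool = False


def mark_sections(sections, min_width=400, max_link_density=0.3):
    """Mark each section as content or non-content (thresholds are assumed)."""
    for s in sections:
        s.is_content = s.width >= min_width and s.link_density <= max_link_density


def promote_neighbours(sections):
    """Re-mark a non-content section as content when an adjacent content
    section has the same value of the compared attribute."""
    # Compare against a snapshot of the initial marks so that one promotion
    # does not cascade to further sections within the same pass.
    before = [s.is_content for s in sections]
    for i in range(len(sections) - 1):
        for first, second in ((i, i + 1), (i + 1, i)):
            if (before[first] and not before[second]
                    and sections[first].background == sections[second].background):
                sections[second].is_content = True


def store_marks(sections, path):
    """Store the marks as JSON, one boolean per section, in document order."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([s.is_content for s in sections], fh)
```

In this sketch a narrow caption that sits next to a wide article body and shares its background colour would first be marked non-content by `mark_sections` and then re-marked as content by `promote_neighbours`, while a link-heavy navigation bar with a different background would stay non-content.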
