Identification of regions of a document

  • US 8,832,549 B2
  • Filed: 06/07/2009
  • Issued: 09/09/2014
  • Est. Priority Date: 01/02/2009
  • Status: Active Grant
First Claim

1. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document that comprises a plurality of primitive elements, the primitive elements are present in the document prior to analysis by the program, the program comprising sets of instructions for:

  • identifying a set of potential boundaries as rectilinearly aligned graphical primitive elements that satisfy a set of size constraints;

  • identifying a subset of the potential boundaries as actual boundaries by eliminating from the set (i) potential boundaries identified from the graphical primitive elements that intersect with primitive elements that are not in the set of potential boundaries and (ii) potential boundaries identified from the graphical primitive elements that do not intersect at least two additional potential boundaries;

  • identifying regions bounded by the actual boundaries; and

  • defining a structured document based on the regions and the primitive elements.
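
The first two steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Primitive` type, the `"graphic"`/`"text"` labels, the bounding-box representation, the `max_thickness` and `min_length` thresholds, and the single-pass elimination are all assumptions made for illustration; the claim itself only requires "a set of size constraints" and the two elimination rules. Region identification and construction of the structured document are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """Hypothetical primitive element, stored as an axis-aligned bounding box."""
    kind: str  # "graphic" or "text" (labels assumed for this sketch)
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(a, b):
    """Return True if the two bounding boxes overlap or touch."""
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def potential_boundaries(primitives, max_thickness=2.0, min_length=20.0):
    """Step 1: keep thin, long graphical primitives (assumed size constraints)."""
    result = []
    for p in primitives:
        width, height = p.x1 - p.x0, p.y1 - p.y0
        thin = min(width, height) <= max_thickness
        long_enough = max(width, height) >= min_length
        if p.kind == "graphic" and thin and long_enough:
            result.append(p)
    return result


def actual_boundaries(primitives, potential):
    """Step 2: apply elimination rules (i) and (ii) in a single pass."""
    potential_ids = {id(p) for p in potential}
    others = [p for p in primitives if id(p) not in potential_ids]
    actual = []
    for b in potential:
        # Rule (i): drop boundaries that intersect a non-boundary primitive.
        if any(intersects(b, o) for o in others):
            continue
        # Rule (ii): drop boundaries that meet fewer than two other
        # potential boundaries.
        neighbours = sum(1 for c in potential if c is not b and intersects(b, c))
        if neighbours >= 2:
            actual.append(b)
    return actual


# A rectangular frame, a rule crossed by text, and an isolated rule.
top = Primitive("graphic", 0, 0, 100, 1)
bottom = Primitive("graphic", 0, 50, 100, 51)
left = Primitive("graphic", 0, 0, 1, 51)
right = Primitive("graphic", 99, 0, 100, 51)
stray = Primitive("graphic", 200, 0, 300, 1)   # touches no other boundary
middle = Primitive("graphic", 0, 25, 100, 26)  # overlapped by a text box
text_inside = Primitive("text", 10, 10, 40, 20)
text_on_rule = Primitive("text", 30, 24, 60, 28)

prims = [top, bottom, left, right, stray, middle, text_inside, text_on_rule]
potential = potential_boundaries(prims)
actual = actual_boundaries(prims, potential)
```

In this example all six graphical rules pass the size test, but only the four sides of the frame survive: `middle` is removed by rule (i) because a text box overlaps it, and `stray` is removed by rule (ii) because it meets no other potential boundary. Whether elimination should be repeated until no further boundaries are removed is not stated in the claim; the sketch does not iterate.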
