TEXT SEGMENTATION OF A DOCUMENT

  • US 20120102388A1
  • Filed: 09/07/2011
  • Published: 04/26/2012
  • Est. Priority Date: 10/26/2010
  • Status: Abandoned Application
First Claim

1. A system to segment text from a portable document format (PDF) document, the system comprising:

  • memory for storing computer executable instructions; and

    a processing unit for accessing the memory and executing the computer executable instructions, the computer executable instructions comprising:

    an engine to group line segments into text blocks using a homogeneity measure based on relative line space difference between line segments and a homogeneity measure based on difference in font size between line segments, wherein the line segments comprise text elements extracted from the PDF document.
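The grouping step in the claim can be illustrated with a minimal Python sketch. The claim does not give formulas or thresholds, so everything here is an assumption made for illustration: the `LineSegment` fields, the two normalized difference measures, the threshold values `max_font_diff` and `max_space_diff`, and the greedy top-to-bottom pass. It is not the method of the application itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineSegment:
    """Hypothetical line segment built from text elements extracted from a PDF."""
    text: str
    top: float        # vertical position of the line, increasing down the page
    font_size: float  # dominant font size of the line


def font_size_difference(a: LineSegment, b: LineSegment) -> float:
    """Assumed measure: font size difference relative to the larger size."""
    return abs(a.font_size - b.font_size) / max(a.font_size, b.font_size, 1e-9)


def line_space_difference(prev_gap: float, gap: float) -> float:
    """Assumed measure: change in line spacing relative to the larger gap."""
    return abs(gap - prev_gap) / max(gap, prev_gap, 1e-9)


def group_into_blocks(
    lines: List[LineSegment],
    max_font_diff: float = 0.1,    # assumed threshold
    max_space_diff: float = 0.25,  # assumed threshold
) -> List[List[LineSegment]]:
    """Greedily merge consecutive lines while both measures stay under threshold."""
    blocks: List[List[LineSegment]] = []
    current: List[LineSegment] = []
    prev_gap: Optional[float] = None

    for line in sorted(lines, key=lambda seg: seg.top):
        if not current:
            current = [line]
            continue
        gap = line.top - current[-1].top
        font_ok = font_size_difference(current[-1], line) <= max_font_diff
        # A block with a single line has no previous gap to compare against.
        space_ok = prev_gap is None or line_space_difference(prev_gap, gap) <= max_space_diff
        if font_ok and space_ok:
            current.append(line)
            prev_gap = gap
        else:
            blocks.append(current)
            current = [line]
            prev_gap = None

    if current:
        blocks.append(current)
    return blocks
```

In this sketch a block closes as soon as either measure exceeds its threshold, so a heading in a larger font or a wider gap between paragraphs starts a new block. One limitation of the greedy pass is that the second line of any block is accepted on font size alone, because no earlier gap exists to compare against.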
