Automatic table detection method and system

  • US 6,757,870 B1
  • Filed: 03/22/2000
  • Issued: 06/29/2004
  • Est. Priority Date: 03/22/2000
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of identifying table data in a document comprising the steps of:

  • a) receiving a page description language representation of the document for providing a list of words in the document and position information for the words; and

    b) automatically identifying table data in the document based on the page description language representation of the document and at least one table identifying feature, wherein the step of identifying includes the steps of,

    b1) automatically determining a table bounding box for each table in the document, wherein the table bounding box includes a top edge and a bottom edge;

    b2) expanding each table bounding box based on a text density feature, wherein the expanding step includes the steps of,

    b2.1) for each line determining a text density measure;

    b2.2) for each line determining a change of text density between the current line and the previous line;

    b2.3) if the change in text density reaches a predetermined threshold, marking the current line with a text density tag;

    b2.4) expanding the top edge of the table bounding box in a first direction to one of a line previously marked by a text density tag and a line with a single word cluster; and

    b2.5) expanding the bottom edge of the table bounding box in a second direction to one of a line previously marked by a text density tag and a line with a single word cluster; and

    b3) converting the table data encompassed by each table bounding box to a markup language representation.
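
The expansion sub-steps of step b2 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions because the claim does not specify them: the density measure (here, the fraction of the page width covered by words), the clustering rule (a horizontal gap wider than `max_gap` starts a new cluster), the directions (top edge moves up, bottom edge moves down), and the choice to include the stopping line in the expanded box. All names (`Word`, `text_density`, `cluster_count`, `density_tags`, `expand_box`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    x0: float  # left edge of the word
    x1: float  # right edge of the word


Line = List[Word]  # words of one text line, sorted left to right


def text_density(line: Line, page_width: float) -> float:
    """Assumed measure: fraction of the page width covered by words."""
    return sum(w.x1 - w.x0 for w in line) / page_width


def cluster_count(line: Line, max_gap: float) -> int:
    """Assumed rule: a gap wider than max_gap starts a new word cluster."""
    if not line:
        return 0
    count = 1
    for prev, cur in zip(line, line[1:]):
        if cur.x0 - prev.x1 > max_gap:
            count += 1
    return count


def density_tags(lines: List[Line], page_width: float,
                 threshold: float) -> List[bool]:
    """Tag each line whose density differs from the previous line's
    density by at least the threshold."""
    densities = [text_density(line, page_width) for line in lines]
    tags = [False] * len(lines)
    for i in range(1, len(lines)):
        if abs(densities[i] - densities[i - 1]) >= threshold:
            tags[i] = True
    return tags


def expand_box(lines: List[Line], top: int, bottom: int, page_width: float,
               threshold: float, max_gap: float) -> Tuple[int, int]:
    """Move the top edge up and the bottom edge down to the nearest line
    that is density-tagged or holds a single word cluster.
    Assumption: the stopping line itself becomes the new edge, and an
    edge stays put if no stopping line exists in its direction."""
    tags = density_tags(lines, page_width, threshold)

    def is_stop(i: int) -> bool:
        return tags[i] or cluster_count(lines[i], max_gap) == 1

    for i in range(top - 1, -1, -1):           # scan upward
        if is_stop(i):
            top = i
            break
    for i in range(bottom + 1, len(lines)):    # scan downward
        if is_stop(i):
            bottom = i
            break
    return top, bottom
```

Here `top` and `bottom` are line indices of the initial bounding box from step b1. Whether the stopping line belongs inside or just outside the table is a design choice the claim leaves open; this sketch treats it as the new edge.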
