Clustering of forms from large-scale scanned-document collection

  • US 8,509,525 B1
  • Filed: 04/06/2011
  • Issued: 08/13/2013
  • Est. Priority Date: 04/06/2011
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of identifying documents sharing a common underlying structure, comprising:

    detecting occurrences of a plurality of predetermined image features in a plurality of document images, wherein at least one of the plurality of predetermined image features is common among instances of a form;

    indexing the plurality of document images in an image index based on the detected image features;

    building a graph of connected nodes for the plurality of document images by searching the image index;

    identifying the documents sharing the common underlying structure using the graph;

    reproducing the common underlying structure shared by the identified documents; and

    generating improved images of the identified documents by overlaying the reproduced common underlying structure on document images of the identified documents.
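
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each document image has already been reduced to a set of feature identifiers, and every name in it (`build_index`, `build_graph`, `min_shared`, the sample feature labels, the majority-vote rule for the common structure) is hypothetical and chosen only for illustration. Set union stands in for overlaying the reproduced structure on an image.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_index(doc_features):
    """Inverted index: feature id -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, features in doc_features.items():
        for feature in features:
            index[feature].add(doc_id)
    return index


def build_graph(doc_features, index, min_shared=2):
    """Link two documents when they share at least min_shared features.

    min_shared is an assumed threshold, not a value taken from the patent.
    """
    shared = defaultdict(int)
    for docs in index.values():
        for a, b in combinations(sorted(docs), 2):
            shared[(a, b)] += 1
    graph = {doc_id: set() for doc_id in doc_features}
    for (a, b), count in shared.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group documents into clusters with an iterative depth-first search."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(cluster)
    return clusters


def common_structure(cluster, doc_features):
    """Features found in more than half of the cluster's documents (assumed rule)."""
    counts = Counter(f for doc_id in cluster for f in doc_features[doc_id])
    return {f for f, n in counts.items() if 2 * n > len(cluster)}


def overlay(doc_id, structure, doc_features):
    """Stand-in for image overlay: add the shared structure to one document."""
    return set(doc_features[doc_id]) | structure


# Hypothetical input: "c" is a degraded scan of the same form as "a" and "b".
docs = {
    "a": {"box1", "line1", "logo"},
    "b": {"box1", "line1", "logo", "sig"},
    "c": {"box1", "logo"},
    "d": {"table", "stamp"},
}
index = build_index(docs)
graph = build_graph(docs, index)
clusters = connected_components(graph)
structure = common_structure(clusters[0], docs)
improved_c = overlay("c", structure, docs)
```

In this toy run, documents "a", "b", and "c" land in one cluster, and the overlay restores the "line1" feature that the degraded scan "c" was missing. A real system would work on pixel-level features and image compositing rather than string labels.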
