SCALABLE INDEXING FOR LAYOUT BASED DOCUMENT RETRIEVAL AND RANKING

  • US 20110022599A1
  • Filed: 09/09/2009
  • Published: 01/27/2011
  • Est. Priority Date: 07/22/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method for creating a set of indexes for a collection of documents according to document layout, comprising:

    providing a plurality of documents to computer memory;

    extracting layout blocks from the provided documents;

    using a computer processor, clustering the layout blocks into a plurality of layout block clusters;

    computing a representative block for each of the layout block clusters;

    generating a document index for each provided document based on the layout blocks of the document and the computed representative blocks;

    clustering the created document indexes into a plurality of document index clusters;

    generating a representative cluster index for each of the document index clusters; and

    outputting the generated document indexes, representative blocks, document index clusters, and representative cluster indexes to memory.
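
The steps of the claim can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the claim does not name a clustering algorithm, a block representation, or an index format, so everything here is an assumption. Each layout block is assumed to be a normalized rectangle `(x, y, width, height)`, clustering is a simple greedy "leader" scheme with a distance threshold, a representative is taken to be the component-wise mean of a cluster, and a document index is assumed to be a histogram counting how many of the document's blocks fall nearest to each representative block. The names `leader_cluster`, `build_indexes`, `block_radius`, and `index_radius` are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors (assumed 'representative')."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def leader_cluster(vectors: Sequence[Vector], radius: float) -> List[List[Vector]]:
    """Greedy clustering (an assumption): a vector joins the first cluster
    whose first member lies within `radius`, otherwise it starts a new one."""
    clusters: List[List[Vector]] = []
    for v in vectors:
        for members in clusters:
            if dist(v, members[0]) <= radius:
                members.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def nearest(v: Vector, candidates: Sequence[Vector]) -> int:
    """Position of the candidate closest to `v` in Euclidean distance."""
    return min(range(len(candidates)), key=lambda i: dist(v, candidates[i]))


def build_indexes(
    documents: Dict[str, List[Vector]],
    block_radius: float = 0.2,
    index_radius: float = 1.0,
):
    """Follow the claimed steps on already-extracted layout blocks."""
    # Pool the layout blocks of all documents and cluster them.
    all_blocks = [block for blocks in documents.values() for block in blocks]
    block_clusters = leader_cluster(all_blocks, block_radius)

    # One representative block per layout block cluster.
    representative_blocks = [mean(cluster) for cluster in block_clusters]

    # Document index: histogram of the document's blocks over the representatives.
    document_indexes: Dict[str, Vector] = {}
    for name, blocks in documents.items():
        counts = [0.0] * len(representative_blocks)
        for block in blocks:
            counts[nearest(block, representative_blocks)] += 1.0
        document_indexes[name] = tuple(counts)

    # Cluster the document indexes and compute one representative per cluster.
    index_clusters = leader_cluster(list(document_indexes.values()), index_radius)
    representative_cluster_indexes = [mean(cluster) for cluster in index_clusters]

    return (document_indexes, representative_blocks,
            index_clusters, representative_cluster_indexes)


# Made-up sample data: documents "a" and "b" share a layout, "c" differs.
sample = {
    "a": [(0.0, 0.0, 1.0, 0.1), (0.0, 0.2, 0.5, 0.8)],
    "b": [(0.0, 0.0, 1.0, 0.1), (0.0, 0.2, 0.5, 0.8)],
    "c": [(0.5, 0.5, 0.2, 0.2)],
}
doc_idx, rep_blocks, idx_clusters, rep_idx = build_indexes(sample)
```

With this sample data, documents "a" and "b" receive identical document indexes and land in the same document index cluster, while "c" forms its own. A retrieval system built on such output could compare a query first against the representative cluster indexes and only then against the documents inside the closest cluster, which is one plausible reading of why the claim builds two levels of clustering.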
