Scalable indexing for layout based document retrieval and ranking

  • US 7,953,679 B2
  • Filed: 09/09/2009
  • Issued: 05/31/2011
  • Est. Priority Date: 07/22/2009
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for creating a set of indexes for a collection of documents according to document layout, comprising:

    providing a plurality of documents to computer memory;

    extracting layout blocks from the provided documents;

    using a computer processor, clustering the layout blocks into a plurality of layout block clusters;

    computing a representative block for each of the layout block clusters;

    generating a document index for each provided document based on the layout blocks of the document and the computed representative blocks;

    clustering the created document indexes into a plurality of document index clusters;

    generating a representative cluster index for each of the document index clusters; and

    outputting the generated document indexes, representative blocks, document index clusters, and representative cluster indexes to memory.
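
The steps of claim 1 can be sketched in code. The following is a minimal illustration, not the patented implementation: it assumes each layout block is a bounding box `(x, y, width, height)`, uses plain k-means for both clustering stages, takes cluster centroids as the representative blocks and representative cluster indexes, and builds each document index as a normalized histogram over the representative blocks. None of these choices is stated in the claim, and all function names (`kmeans`, `build_layout_indexes`, and so on) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def nearest(v: Vector, centers: Sequence[Vector]) -> int:
    """Position of the center closest to v."""
    return min(range(len(centers)), key=lambda i: sq_dist(v, centers[i]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain k-means (an assumed clustering method); returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def build_layout_indexes(docs: Dict[str, List[Vector]],
                         n_block_clusters: int,
                         n_doc_clusters: int) -> dict:
    """Sketch of claim 1; docs maps a document id to its extracted layout blocks."""
    # Pool the layout blocks extracted from every provided document.
    all_blocks = [b for blocks in docs.values() for b in blocks]

    # Cluster the blocks; use each centroid as the representative block (assumption).
    rep_blocks, _ = kmeans(all_blocks, n_block_clusters)

    # Document index: normalized histogram over representative blocks (assumption).
    doc_ids = list(docs)
    doc_indexes: Dict[str, Vector] = {}
    for doc_id in doc_ids:
        hist = [0.0] * len(rep_blocks)
        for block in docs[doc_id]:
            hist[nearest(block, rep_blocks)] += 1.0
        total = sum(hist) or 1.0  # guard against a document with no blocks
        doc_indexes[doc_id] = tuple(h / total for h in hist)

    # Cluster the document indexes; centroids act as representative cluster indexes.
    vectors = [doc_indexes[d] for d in doc_ids]
    rep_cluster_indexes, labels = kmeans(vectors, n_doc_clusters)

    # "Output to memory" is modeled here as returning the four artifacts.
    return {
        "document_indexes": doc_indexes,
        "representative_blocks": rep_blocks,
        "document_index_clusters": dict(zip(doc_ids, labels)),
        "representative_cluster_indexes": rep_cluster_indexes,
    }
```

In this sketch, a query document could be matched by computing its index against the stored representative blocks and comparing it first to the representative cluster indexes, then only to the documents in the closest cluster. That two-level lookup is one plausible reading of why the claim stores both levels, not something the claim text itself spells out.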
