Index extraction from documents

  • US 8,805,803 B2
  • Filed: 08/12/2004
  • Issued: 08/12/2014
  • Est. Priority Date: 08/12/2004
  • Status: Expired due to Fees
First Claim

1. A method for index extraction, comprising the steps of:

    storing a plurality of ground truth documents in a database, the documents being organized according to a plurality of classifications, each classification having a group of predefined indices;

    classifying a document by drawing an association in a computer system between the document to be indexed and one of the classifications;

    attempting in the computer system to extract from the document at least a subset of the group of predefined indices associated with the one of the classifications; and

    attempting in the computer system to find and correct at least one text recognition error in the document based upon a salient dictionary associated with the one of the classifications upon a failure to extract the subset of the group of predefined indices, wherein anticipated misspellings associated with each of the classifications are stored in the salient dictionary and the document is searched for anticipated misspellings of predefined indices that have not been extracted from the document.
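The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Classification` container, the keyword-overlap classifier, the `Label: value` extraction pattern, and all function names are assumptions introduced here for illustration only. The sketch classifies a document, attempts to extract the predefined indices of that classification, and, only if some indices are missing, searches for the anticipated misspellings stored in that classification's salient dictionary, corrects them, and retries extraction.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical container for one document classification."""
    name: str
    keywords: set[str]        # crude stand-in for a classifier built from ground truth documents
    indices: dict[str, str]   # index name -> label text expected to precede its value
    # Salient dictionary (assumed shape): label -> anticipated OCR misspellings of that label
    salient: dict[str, list[str]] = field(default_factory=dict)


def classify(text: str, classes: list[Classification]) -> Classification:
    """Associate the document with the classification sharing the most keywords."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return max(classes, key=lambda c: len(c.keywords & words))


def extract(text: str, cls: Classification) -> dict[str, str]:
    """Attempt to extract each predefined index as 'Label: value'."""
    found = {}
    for name, label in cls.indices.items():
        m = re.search(re.escape(label) + r"\s*:\s*(\S+)", text, re.IGNORECASE)
        if m:
            found[name] = m.group(1)
    return found


def correct(text: str, cls: Classification, missing: list[str]) -> str:
    """Replace anticipated misspellings, but only for indices not yet extracted."""
    for name in missing:
        label = cls.indices[name]
        for bad in cls.salient.get(label, []):
            # A function replacement avoids backslash interpretation in the label.
            text = re.sub(re.escape(bad), lambda _m: label, text, flags=re.IGNORECASE)
    return text


def index_document(text: str, classes: list[Classification]) -> tuple[str, dict[str, str]]:
    cls = classify(text, classes)
    found = extract(text, cls)
    missing = [n for n in cls.indices if n not in found]
    if missing:  # correction is attempted only upon a failure to extract
        found = extract(correct(text, cls, missing), cls)
    return cls.name, found


# Illustrative data: an "invoice" class whose label was misread by OCR.
invoice = Classification(
    name="invoice",
    keywords={"invoice", "total"},
    indices={"number": "Invoice Number", "total": "Total"},
    salient={"Invoice Number": ["lnvoice Nurnber", "Invoice Nurnber"]},
)
letter = Classification(name="letter", keywords={"dear", "sincerely"}, indices={"recipient": "Dear"})
scanned = "INVOICE\nlnvoice Nurnber: 4471\nTotal: 98.50"
result = index_document(scanned, [invoice, letter])
```

In this sketch the first extraction pass finds only the total; the misread label is then matched against the salient dictionary, restored to "Invoice Number", and the second pass recovers the invoice number as well. Restricting the misspelling search to the missing indices mirrors the claim's wording that the document is searched for anticipated misspellings of indices that have not been extracted.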
