Index extraction from documents
Abstract
Systems, methods, and programs embodied in a computer readable medium are provided for index extraction. A plurality of ground truth documents are stored in a database, the ground truth documents being organized in a plurality of classifications. Attempts are made to automatically extract indices from a document based upon a classification associated with the document. The document is reclassified from a first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents. The indices are manually extracted from the document upon a failure to automatically extract the indices. The document is stored in the database as one of the ground truth documents if the indices are manually extracted.
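The workflow in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `Document`, `GroundTruthDB`, `index_document`, the per-classification `extractors` mapping, and the `reclassify` and `manual_extract` callbacks are all hypothetical, and the sketch assumes reclassification is triggered when the first automated attempt yields nothing.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Indices = Dict[str, str]


@dataclass
class Document:
    text: str
    classification: str
    indices: Optional[Indices] = None


@dataclass
class GroundTruthDB:
    """Ground truth documents grouped by classification (hypothetical store)."""
    by_class: Dict[str, List[Document]] = field(default_factory=dict)

    def add(self, doc: Document) -> None:
        self.by_class.setdefault(doc.classification, []).append(doc)


def index_document(
    doc: Document,
    db: GroundTruthDB,
    extractors: Dict[str, Callable[[str], Optional[Indices]]],
    reclassify: Callable[[Document, GroundTruthDB], Optional[str]],
    manual_extract: Callable[[Document], Indices],
) -> str:
    """Return "automatic" or "manual" depending on how indices were obtained."""

    def attempt() -> Optional[Indices]:
        # Automated extraction depends on the document's current classification.
        extractor = extractors.get(doc.classification)
        return extractor(doc.text) if extractor else None

    indices = attempt()
    if indices is None:
        # Assumption: reclassify via the ground truth documents, then retry.
        new_class = reclassify(doc, db)
        if new_class and new_class != doc.classification:
            doc.classification = new_class
            indices = attempt()

    if indices is not None:
        doc.indices = indices
        return "automatic"

    # Automated extraction failed: fall back to manual extraction and
    # keep the document as a new ground truth document.
    doc.indices = manual_extract(doc)
    db.add(doc)
    return "manual"


def invoice_extractor(text: str) -> Optional[Indices]:
    # Toy extractor used only for this example.
    match = re.search(r"Invoice No: (\d+)", text)
    return {"invoice_no": match.group(1)} if match else None


extractors = {"invoice": invoice_extractor}
db = GroundTruthDB()
doc = Document("Invoice No: 42", classification="letter")
outcome = index_document(
    doc, db, extractors,
    reclassify=lambda d, g: "invoice",   # stand-in for a real association step
    manual_extract=lambda d: {},         # stand-in for an operator's input
)
```

In this usage example the document starts out misclassified as a letter, is reclassified as an invoice, and is then indexed automatically. Only documents that fall through to manual extraction are added to the ground truth store, which mirrors the last step of the abstract.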
21 Claims
1. A method for indexing documents, comprising the steps of:
storing a plurality of ground truth documents in a database, the ground truth documents being organized in a plurality of classifications;
attempting to automatically extract indices from a document based upon a classification associated with the document;
reclassifying the document from a first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents;
manually extracting the indices from the document upon a failure to automatically extract the indices; and
storing the document in the database as one of the ground truth documents if the indices are manually extracted.
Dependent claims: 2, 3, 4, 5, 6
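Claim 1 does not say how the association between the document and a ground truth document is drawn. As one possible illustration, the sketch below assumes a simple token-overlap (Jaccard) similarity against stored ground truth texts; the function names, the `threshold` parameter, and the choice of similarity measure are all assumptions made for this example, not details taken from the claim.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; a deliberately crude text representation.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reclassify(
    text: str,
    current: str,
    ground_truth: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed cut-off for a usable association
) -> Optional[str]:
    """Return a new classification, or None if no better association exists."""
    doc_tokens = tokens(text)
    best_class, best_score = None, threshold
    for classification, examples in ground_truth.items():
        for example in examples:
            score = jaccard(doc_tokens, tokens(example))
            if score >= best_score:
                best_class, best_score = classification, score
    if best_class is None or best_class == current:
        return None
    return best_class


ground_truth = {
    "invoice": ["invoice number total amount due"],
    "letter": ["dear sir yours sincerely"],
}
new_class = reclassify("invoice number 42 total amount due", "letter", ground_truth)
```

Here the document shares five of its six tokens with the stored invoice example, so it is moved from the "letter" classification to "invoice". A production system would likely use layout or image features rather than raw token overlap.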
7. A program embodied in a computer-readable medium for indexing documents, comprising:
a database that includes a plurality of ground truth documents, the ground truth documents being organized in a plurality of classifications;
code that attempts to automatically extract indices from a document based upon a classification associated with the document;
code that reclassifies the document from a first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents;
code that facilitates a manual extraction of the indices from the document upon a failure to automatically extract the indices; and
code that stores the document in the database as one of the ground truth documents if the indices are manually extracted.
Dependent claims: 8, 9, 10, 11, 12
13. A system for indexing documents, comprising:
a server having a processor and a memory;
a database stored in the memory, the database including a plurality of ground truth documents, the ground truth documents being organized in a plurality of classifications;
a document stored in the memory, the document being classified in a first one of the classifications; and
an automated document indexing system stored in the memory and executable by the processor, the automated document indexing system comprising:
logic that attempts to automatically extract indices from the document based upon the first one of the classifications;
logic that reclassifies the document from the first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents;
logic that facilitates a manual extraction of the indices from the document upon a failure to automatically extract the indices; and
logic that stores the document in the database as one of the ground truth documents if the indices are manually extracted.
Dependent claims: 14, 15, 16, 17, 18
19. A system for indexing documents, comprising:
a database stored in a memory, the database including a plurality of ground truth documents, the ground truth documents being organized in a plurality of classifications;
a document stored in the memory, the document being classified in a first one of the classifications; and
means for attempting to automatically extract indices from the document based upon the first one of the classifications;
means for reclassifying the document from the first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents;
means for facilitating a manual extraction of the indices from the document upon a failure to automatically extract the indices; and
means for storing the document in the database as one of the ground truth documents if the indices are manually extracted.
Dependent claims: 20, 21
Specification