System and method for automatic document management
Abstract
A system for managing documents, comprising: interfaces to a user interface, providing an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the remote server, and to receive from the remote server a classification of the document; and logic configured to receive commands from the user interface, and to apply the classifications received from the remote server to the document images through the interface to the database. A corresponding method is also provided.
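The data flow described in the abstract (text representation out to a remote server, classification back, classification applied to the stored document image) can be illustrated with a minimal Python sketch. Every name here (`RemoteServer`, `ImageDatabase`, `DocumentManager`, `KeywordServer`) is hypothetical and chosen for illustration only; the patent does not specify these interfaces, and the keyword-matching server is a toy stand-in rather than the classifier the patent describes.

```python
from typing import Protocol


class RemoteServer(Protocol):
    """Hypothetical interface to the remote classification server."""

    def classify(self, text: str) -> str: ...


class ImageDatabase:
    """In-memory stand-in for the database of document images."""

    def __init__(self) -> None:
        self.labels: dict[str, str] = {}

    def set_classification(self, doc_id: str, label: str) -> None:
        self.labels[doc_id] = label


class DocumentManager:
    """Logic layer: forwards OCR text to the server and stores the result."""

    def __init__(self, server: RemoteServer, database: ImageDatabase) -> None:
        self.server = server
        self.database = database

    def process(self, doc_id: str, ocr_text: str) -> str:
        # Send the text representation out and receive a classification back.
        label = self.server.classify(ocr_text)
        # Apply the received classification to the stored document record.
        self.database.set_classification(doc_id, label)
        return label


class KeywordServer:
    """Toy server used only to exercise the sketch; not a real classifier."""

    def classify(self, text: str) -> str:
        return "invoice" if "invoice" in text.lower() else "unknown"
```

Keeping the server behind a narrow interface means the local logic never needs to know how the classification is computed, which matches the separation between local logic and remote server that the abstract describes.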
20 Claims
1. A method for managing documents, comprising:
    scanning a series of pages representing a plurality of documents;
    performing optical character recognition on the series of pages to produce semantic content;
    communicating the semantic content of the series of pages as untagged and unseparated information;
    analyzing the semantic content of the series of pages to produce:
        a document structure of the untagged and unseparated information as a series of respective document portions, and
        a status for respective document portions based on at least a correspondence of the semantic content of the respective document portion to a plurality of statistical semantic classification features, the status being selected from the group consisting of (a) a document classification of the respective document portion, and (b) an exception to classification status of the respective document portion;
    prompting to receive prompted manual classification for document portions having the exception to classification status, wherein the manual classification is applied to the document portions having the exception to classification status; and
    updating the plurality of statistical semantic classification features based on the received manual classification and the semantic content of the respective document portion.
    Dependent claims: 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14

9. A system for managing documents, comprising:
    a memory configured to store semantic content derived from a series of pages representing a plurality of documents;
    a communication port configured to communicate the semantic content as untagged and unseparated information; and
    at least one processor configured to:
        analyze the semantic content of the series of pages to produce:
            a document structure of the untagged and unseparated information as a series of respective document portions, and
            a status for respective document portions based on at least a correspondence of the semantic content of the respective document portion to a plurality of statistical semantic classification features, selected from the group consisting of (a) a document classification of the respective document portion, and (b) an exception to classification status of the respective document portion;
        generate a prompt to a user to supply a manual classification for document portions having the exception to classification status, wherein the manual classification is applied to the document portions having the exception to classification status; and
        update the plurality of statistical semantic classification features based on the received manual classification and the semantic content of the respective document portion.
    Dependent claims: 10, 15, 16, 17

18. A method for managing documents, comprising:
    storing a semantic content of a plurality of documents, which is unseparated and untagged with respect to document structure, in a database, each document encompassing a series of document portions, each respective document containing semantic content derived by automated image analysis, and a status selected from the group consisting of a document portion classification, an exception to document portion classification status, and an unclassified status for the respective document;
    communicating the stored semantic content which is unseparated and untagged with respect to document structure to a remote server;
    receiving from the remote server a response comprising a separated and tagged document structure of respective document portions, and a status selectively based on a correspondence of the semantic content of a respective document portion to a plurality of classes based on statistical semantic classification features, selected from the group consisting of (a) a classification of a respective document, and (b) an exception to classification status for the respective document,
        the respective document portion being automatically classified as belonging to at least one class if the correspondence of semantic content of the respective document portion is high for the at least one class; and
        the respective document portion having an exception to document classification status if a correspondence of the semantic content of the document portion is not high for any class.
    Dependent claims: 19, 20

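The classify, flag-as-exception, and update cycle recited in the independent claims above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the claims do not name a particular statistical model, so the sketch substitutes per-class term counts compared by cosine similarity, and the names `PortionClassifier`, `EXCEPTION`, and the `threshold` value of 0.5 are all hypothetical.

```python
import math
from collections import Counter

EXCEPTION = "EXCEPTION"  # hypothetical marker for the "exception to classification" status


def tokenize(text):
    """Lowercase whitespace tokenization; a stand-in for real feature extraction."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PortionClassifier:
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cut-off for a "high" correspondence
        self.features = {}          # class label -> Counter of term counts

    def status(self, text):
        """Return the best-matching class, or EXCEPTION if no class scores high."""
        vec = Counter(tokenize(text))
        best_label, best_score = EXCEPTION, 0.0
        for label, profile in self.features.items():
            score = cosine(vec, profile)
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= self.threshold else EXCEPTION

    def update(self, text, manual_label):
        """Fold a manually classified portion back into that class's features."""
        self.features.setdefault(manual_label, Counter()).update(tokenize(text))
```

A fresh `PortionClassifier` reports `EXCEPTION` for every portion because it has no class features yet. Once an operator labels a portion through `update`, later portions with similar wording are classified automatically, while dissimilar ones continue to be routed to manual review.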
Specification