SYSTEMS AND METHODS FOR PARALLEL PROCESSING OF DOCUMENT RECOGNITION AND CLASSIFICATION USING EXTRACTED IMAGE AND TEXT FEATURES
First Claim
1. In a document analysis system that receives jobs from a plurality of users and which automatically classifies documents to organize each job according to the categories of documents the job contains, a method of parallel processing each job comprising:
for each job, automatically separating the job into its constituent electronic documents;
for each received electronic document, automatically separating the document into subsets of electronic pages;
for each page of each subset, automatically extracting image features that are indicative of how the document is laid out or textually-organized and therefore indicative of a corresponding document category and automatically extracting text features it contains, in which feature extraction for each subset is done independently and in parallel of such automatic extraction for the other subsets of the document;
for each subset, automatically comparing the extracted features with feature sets associated with each category of document to determine a comparison score for the subset;
using the comparison score for each of the subsets to automatically classify the electronic document as being one of the categories of documents; and
organizing the job according to the categories of documents the job contains.
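The claim does not say which image or text features are used or how a subset is compared with a category's feature set. The Python sketch below is therefore hypothetical throughout and is not the patented implementation. It covers the middle steps of the claim: splitting a document into page subsets, then extracting features from each subset and scoring it against every category independently and concurrently. The names `Page`, `split_into_subsets`, `process_subset`, `score_document`, and `CATEGORIES` are illustrative. The layout statistic (`line_count`) stands in for real image features, and the scoring rule (keyword overlap plus layout closeness) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Page:
    # Hypothetical page model: recognized text plus one layout statistic
    # standing in for the image features the claim leaves unspecified.
    text: str
    line_count: int


def split_into_subsets(pages, subset_size):
    """Split a document's pages into consecutive subsets of at most subset_size pages."""
    return [pages[i:i + subset_size] for i in range(0, len(pages), subset_size)]


def process_subset(subset, category_features):
    """Extract features from one subset and score them against every category."""
    # Text features: the set of lowercased words on the subset's pages.
    words = set()
    for page in subset:
        words.update(page.text.lower().split())
    # Layout feature (assumed): mean number of text lines per page.
    mean_lines = sum(page.line_count for page in subset) / len(subset)

    scores = {}
    for category, (keywords, typical_lines) in category_features.items():
        text_score = len(words & keywords) / len(keywords) if keywords else 0.0
        layout_score = 1.0 / (1.0 + abs(mean_lines - typical_lines))
        scores[category] = text_score + layout_score
    return scores


def score_document(pages, category_features, subset_size=2, workers=4):
    """Return one {category: score} dict per subset, in page order."""
    subsets = split_into_subsets(pages, subset_size)
    # No subset depends on another, so the subsets are processed concurrently;
    # pool.map returns results in the same order as the subsets.
    worker = partial(process_subset, category_features=category_features)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, subsets))


# Hypothetical category feature sets: (keywords, typical lines per page).
CATEGORIES = {
    "invoice": ({"invoice", "total", "due"}, 30),
    "letter": ({"dear", "sincerely"}, 15),
}
document = [
    Page("Invoice number 7 total due", 30),
    Page("total due 100", 28),
    Page("Dear customer", 14),
]
subset_scores = score_document(document, CATEGORIES)
```

In this example, the three pages form two subsets. The first subset scores highest for "invoice" and the second for "letter". A thread pool keeps the sketch short. In CPython, however, threads do not speed up CPU-bound pure-Python work. A real system doing heavy image analysis would more plausibly use a `ProcessPoolExecutor` or separate worker machines to obtain true parallelism.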
Abstract
A method is provided for parallel processing of jobs received from a plurality of users by a document analysis system that automatically classifies documents to organize each job. The method automatically separates each job into its constituent electronic documents and automatically separates each document into subsets of electronic pages. For each page of each subset, the method automatically extracts image features that are indicative of how the document is laid out or textually organized. For each subset, the method automatically compares the extracted features with feature sets associated with each document category to determine a comparison score for the subset. The method then uses the comparison scores of the subsets to classify the electronic document as one of the document categories and organizes the job according to the categories of documents the job contains.
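The abstract does not state how the per-subset comparison scores are combined into a single classification. The Python sketch below shows one plausible rule, offered as an assumption rather than the patented method: sum each category's scores over all subsets, pick the highest total, and then group the job's documents by the predicted category. The functions `classify_document` and `organize_job`, along with the sample document names and scores, are hypothetical.

```python
from collections import defaultdict


def classify_document(subset_scores):
    """Combine per-subset {category: score} dicts into one category label.

    Assumed rule: sum each category's scores over all subsets and pick the
    highest total; ties go to the alphabetically first category.
    """
    totals = defaultdict(float)
    for scores in subset_scores:
        for category, score in scores.items():
            totals[category] += score
    if not totals:
        raise ValueError("document has no scored subsets")
    # max() keeps the first maximal item, so sorting makes ties deterministic.
    return max(sorted(totals), key=lambda category: totals[category])


def organize_job(job):
    """Group a job's documents by predicted category.

    job maps each document name to its list of per-subset score dicts.
    """
    organized = defaultdict(list)
    for name, subset_scores in job.items():
        organized[classify_document(subset_scores)].append(name)
    return dict(organized)


# Hypothetical job: two documents with precomputed per-subset scores.
job = {
    "doc_a.pdf": [{"invoice": 1.5, "letter": 0.1}, {"invoice": 0.2, "letter": 1.0}],
    "doc_b.pdf": [{"invoice": 0.1, "letter": 0.9}],
}
organized = organize_job(job)
```

Here `doc_a.pdf` totals about 1.7 for "invoice" versus 1.1 for "letter", so it is grouped under "invoice", even though one of its subsets individually favored "letter". Summing is only one choice. Majority voting over each subset's best category, or weighting subsets by page count, would also fit the claim's wording.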
4 Claims (claim 1, reproduced above, is independent; claims 2, 3, and 4 depend from it)
Specification