SYSTEMS AND METHODS FOR CLASSIFYING ELECTRONIC DOCUMENTS BY EXTRACTING AND RECOGNIZING TEXT AND IMAGE FEATURES INDICATIVE OF DOCUMENT CATEGORIES
First Claim
1. In a document analysis system that receives and processes jobs, a method of automatically recognizing and classifying each document in a job into a corresponding document category by automatically recognizing image and text features in the document so that each job may be automatically organized according to the categories of documents it contains, the method comprising:
- automatically extracting image and text features from each received document, in which the image features are indicative of how the document is laid out or textually organized and therefore indicative of a corresponding document category, and the text features are distinctive words that are indicative of a corresponding document category;
- comparing the extracted image and text features with feature sets associated with each category of document, in which each feature set includes a subset of text features and corresponding weights and a subset of image features and corresponding weights;
- classifying each document into the document category whose feature set best matches the extracted features of said document; and
- organizing each job according to the categories of documents it contains.
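The claim says that each category's feature set holds weighted text and image features and that a document goes to the best-matching set, but it does not define the match score. The sketch below assumes a simple linear score: the sum of the weights of the category's words found in the document, plus each image weight multiplied by the value of that layout feature. The names `FeatureSet`, `ExtractedFeatures`, `score`, `classify` and `organize_job`, along with the example categories, words and weights, are hypothetical illustrations and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeatureSet:
    """Hypothetical per-category feature set of weighted text and image features."""
    text_weights: dict[str, float]   # distinctive word -> weight
    image_weights: dict[str, float]  # layout feature name -> weight


@dataclass
class ExtractedFeatures:
    """Features assumed to be already extracted from one document."""
    words: set[str]                                        # distinctive words found
    image: dict[str, float] = field(default_factory=dict)  # layout feature -> value in [0, 1]


def score(features: ExtractedFeatures, fs: FeatureSet) -> float:
    # Assumed linear score: weight of every category word present in the
    # document, plus each image weight scaled by that layout feature's value.
    text_score = sum(w for word, w in fs.text_weights.items() if word in features.words)
    image_score = sum(w * features.image.get(name, 0.0) for name, w in fs.image_weights.items())
    return text_score + image_score


def classify(features: ExtractedFeatures, feature_sets: dict[str, FeatureSet]) -> str:
    # Category whose feature set best matches, i.e. has the highest score.
    # On a tie, max() keeps the first category in dict order.
    return max(feature_sets, key=lambda cat: score(features, feature_sets[cat]))


def organize_job(job: dict[str, ExtractedFeatures],
                 feature_sets: dict[str, FeatureSet]) -> dict[str, list[str]]:
    # Group the job's document ids under the category each one is assigned.
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, features in job.items():
        groups[classify(features, feature_sets)].append(doc_id)
    return dict(groups)


# Illustrative data only; categories, words, and weights are made up.
feature_sets = {
    "invoice": FeatureSet({"invoice": 2.0, "total": 1.0, "due": 1.0}, {"table_density": 1.5}),
    "letter": FeatureSet({"dear": 2.0, "sincerely": 2.0}, {"paragraph_density": 1.5}),
}
job = {
    "doc1": ExtractedFeatures({"invoice", "total"}, {"table_density": 0.8}),
    "doc2": ExtractedFeatures({"dear", "sincerely"}, {"paragraph_density": 0.9}),
    "doc3": ExtractedFeatures({"due"}, {"table_density": 0.6}),
}
print(organize_job(job, feature_sets))
# {'invoice': ['doc1', 'doc3'], 'letter': ['doc2']}
```

A linear weighted sum is only one way to read "best matches." A system might instead normalize scores per category or learn the weights from labeled documents. The claim leaves that choice open.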
Abstract
A method in a document analysis system automatically extracts image and text features from each received electronic document. The image features indicate how the document is laid out or textually organized, and therefore indicate its document category; the text features are distinctive words that indicate a document category. The method then compares the extracted image and text features with the feature sets associated with each document category, and classifies each document into the category whose feature set best matches the document's extracted features.
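The document names the two kinds of features but does not say how they are computed. As an illustration only, the sketch below treats text features as the words of a document that appear in a hypothetical vocabulary `DISTINCTIVE_WORDS`. It approximates layout with two made-up ratios, `table_density` and `paragraph_density`, both computed from the line structure of plain text. A real system would measure layout on the page image itself, so these proxies are an assumption and not the patented technique.

```python
import re

# Hypothetical vocabulary of words treated as distinctive for some category.
DISTINCTIVE_WORDS = {"invoice", "total", "due", "dear", "sincerely", "agreement"}


def extract_text_features(text: str, vocabulary: set[str]) -> set[str]:
    # Lower-cased alphabetic tokens, kept only if they are in the vocabulary.
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t in vocabulary}


def extract_layout_features(text: str) -> dict[str, float]:
    # Crude layout proxies from line structure. Assumption: a real system would
    # compute layout features from the rendered page image instead.
    non_blank = [ln for ln in text.splitlines() if ln.strip()]
    if not non_blank:
        return {"table_density": 0.0, "paragraph_density": 0.0}
    # Column-like line: at least two runs of 2+ spaces between non-space characters.
    tabular = [ln for ln in non_blank if len(re.findall(r"\S {2,}(?=\S)", ln)) >= 2]
    # Prose-like line: 40+ characters with no such column gaps.
    prose = [ln for ln in non_blank if len(ln) >= 40 and not re.search(r"\S {2,}\S", ln)]
    n = len(non_blank)
    return {"table_density": len(tabular) / n, "paragraph_density": len(prose) / n}


sample = "INVOICE\nItem      Qty   Price\nWidget    2     10.00\nTotal due: 20.00\n"
print(sorted(extract_text_features(sample, DISTINCTIVE_WORDS)))  # ['due', 'invoice', 'total']
print(extract_layout_features(sample))  # {'table_density': 0.5, 'paragraph_density': 0.0}
```

The two outputs correspond to the two feature types in the abstract. The word set stands in for the text features, and the ratio dictionary stands in for the image features that a category's weighted feature set would be compared against.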
Specification