SYSTEMS AND METHODS FOR HANDLING AND DISTINGUISHING BINARIZED, BACKGROUND ARTIFACTS IN THE VICINITY OF DOCUMENT TEXT AND IMAGE FEATURES INDICATIVE OF A DOCUMENT CATEGORY
First Claim
1. In a document analysis system that receives and processes jobs from a plurality of users and that automatically recognizes and classifies job documents into document categories, so that a job may be organized according to the document categories it contains, and in which each received document is a binarized, one-bit-per-document-pixel image version of an original grayscale or color image source document, a method of enhancing the received electronic documents to improve automatic recognition and classification of the received documents, the method comprising:
- for each page of a received document, filtering the page to infer binarized-background artifacts resulting from the binarization of the original grayscale or color image source document and which reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document;
- using the extracted features from the filtered document to automatically recognize and classify a document into a document category.
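The claim does not say how the filtering step separates binarized-background artifacts from text and image features. One common, generic way to do this on a 1-bit page is connected-component size filtering: small, isolated blobs of black pixels are treated as speckle left over from binarization, and larger blobs are kept as text or image features. The sketch below is a stand-in technique, not the patented method. The function name `remove_small_components` and the `min_size` threshold are hypothetical. The filter also ignores where a blob sits relative to the text, so it does not by itself model the "in the vicinity of" condition.

```python
from collections import deque


def remove_small_components(page, min_size=3):
    """Clear small 8-connected black blobs from a 1-bit page.

    page: list of rows, each a list of 0 (white) / 1 (black) pixels.
    min_size: hypothetical threshold; blobs with fewer black pixels than
    this are treated as binarized-background artifacts and set to white.
    Returns a new page; the input is not modified.
    """
    h = len(page)
    w = len(page[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in page]
    for y in range(h):
        for x in range(w):
            if page[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one 8-connected component.
            comp = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and page[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the threshold are treated as artifacts.
            if len(comp) < min_size:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out


# Usage: the lone pixel at the top-left is removed; the 8-pixel ring is kept.
page = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
]
cleaned = remove_small_components(page)
```

In a real pipeline, the threshold would usually scale with scan resolution. A proximity test against the retained components could then narrow removal to artifacts near text.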
Abstract
A method is provided for enhancing electronic documents that a document analysis system receives from a plurality of users, in order to improve automatic recognition and classification of the received electronic documents. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts that result from the binarization of the original grayscale or color image source document and that reside in the vicinity of binarized text and binarized image features in the page. This allows the binarized text and binarized images to be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the features extracted from the filtered document to automatically recognize and classify the document into a document category.
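The abstract does not specify which features are extracted or which classifier assigns the category. As a hedged illustration only, the sketch below uses two generic stand-ins. The first is zoning features, the fraction of black pixels in each cell of a coarse grid. The second is a nearest-centroid classifier over those features. The function names, grid size, category names, and centroid values are all hypothetical and are not taken from the patent.

```python
import math


def zone_densities(page, rows=2, cols=2):
    """Fraction of black (1) pixels in each cell of a rows x cols grid.

    page: non-empty list of equal-length rows of 0/1 pixels.
    Cells are read left-to-right, top-to-bottom.
    """
    h, w = len(page), len(page[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cells = [page[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def nearest_centroid(feats, centroids):
    """Return the category whose centroid is closest (Euclidean) to feats."""
    return min(centroids, key=lambda cat: math.dist(feats, centroids[cat]))


# Hypothetical categories and centroids, for illustration only.
centroids = {
    "header_heavy_form": [0.8, 0.8, 0.1, 0.1],
    "full_page_image": [0.5, 0.5, 0.5, 0.5],
}
page = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
feats = zone_densities(page)
category = nearest_centroid(feats, centroids)
```

For this page, ink is concentrated in the top half, so the features are closer to the first hypothetical centroid. A production system would typically learn the centroids, or use a stronger classifier, from labeled pages.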
Specification