Adaptive recognition of documents using layout attributes
Abstract
An attribute extracting module (256) extracts attributes from a document (50) input into the system. An attribute comparison module (270) compares extracted attributes with multiple classes (54) of documents. Upon determining that attributes of the document (50) match attributes of one of the classes (54), the document (50) is classified as belonging to the class (54) and is processed in accordance with the system actions associated with the matched class (54). In one embodiment, attributes of the input document (50) are compared to the documents (56) belonging to the matched class (54) which are already on the system. If the system determines that the input document (50) matches one of the existing images (56), the user (240) is alerted that the input document (50) already exists in the system. In a further embodiment, a match is determined in response to a comparison quality measure determined by a quality assessment module (258). The comparison quality measure measures the accuracy of the comparison. If the comparison quality measure exceeds a threshold, a match is determined to have been made. The comparison quality measure examines, among other factors, sizes, locations, and word accuracy values of matching regions within the input document (50) and the matching class (54).
124 Citations
22 Claims
-
1. A method for automatically recognizing documents in a document imaging system where attributes of classes of documents are stored in a computer system, said method comprising the steps of:
-
receiving a qualitative selection of a region of data in a document representative of a class of documents, wherein the qualitative selection describes a distinguishing feature of the class of documents;
assigning the qualitative selection as an attribute of a class of documents;
storing the assigned qualitative attribute with an identification of the class of documents to which it belongs;
extracting regions of data from a document inputted into the document imaging system;
comparing extracted regions of data with stored qualitative attributes associated with stored classes of documents; and
responsive to the extracted regions of data matching the stored attributes of one of the classes of documents, classifying the inputted document as belonging to the class of documents whose attributes match the attributes of the inputted document.
responsive to the extracted attributes matching the attributes of one of the stored documents, classifying the inputted document as matching the stored document.
-
-
4. The method of claim 3 wherein the inputted document is compared to documents on the computer system that are members of the class to which the inputted document belongs.
-
5. The method of claim 3 further comprising the step of:
responsive to classifying the inputted document as matching the stored document, displaying a message that the inputted document is duplicative of a document stored on the system.
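The duplicate check of claims 3 to 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes attributes are plain strings, that a "match" means the attribute sets are equal, and that stored documents are grouped by class in a dictionary. The names `find_duplicate` and `duplicate_message` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical store layout: class name -> {document id -> attribute set}.
StoredDocuments = Dict[str, Dict[str, FrozenSet[str]]]


def find_duplicate(
    input_attributes: FrozenSet[str],
    class_name: str,
    stored: StoredDocuments,
) -> Optional[str]:
    """Return the id of a stored document in the same class whose
    attributes equal those of the input document, or None."""
    # Only members of the input document's own class are compared (claim 4).
    for doc_id, attributes in stored.get(class_name, {}).items():
        if attributes == input_attributes:  # assumed matching rule
            return doc_id
    return None


def duplicate_message(
    input_attributes: FrozenSet[str],
    class_name: str,
    stored: StoredDocuments,
) -> Optional[str]:
    """Build the user-facing message of claim 5, or None if no duplicate."""
    doc_id = find_duplicate(input_attributes, class_name, stored)
    if doc_id is None:
        return None
    return f"Input document duplicates stored document {doc_id!r}."
```

Restricting the search to one class keeps the comparison small, which is presumably the point of classifying before checking for duplicates.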
-
6. The method of claim 1 further comprising the step of:
responsive to the document being classified as belonging to a class of documents, performing document imaging actions associated with the matching class on the inputted document.
-
7. The method of claim 6 wherein document imaging actions include storing an image of the inputted document into a predetermined file location on a disk.
-
8. The method of claim 6 wherein document imaging actions include extracting a keyword from the inputted document.
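Claims 1 and 6 to 8 together describe storing attributes per class, classifying an input document, and then running the actions tied to the matched class. A minimal sketch under stated assumptions: attributes are modeled as plain strings, a class matches when all of its stored attributes appear among the extracted regions, and the file-storage action only records a target path instead of writing to disk. `DocumentClass`, `store_in`, `extract_keyword`, and `process` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

Action = Callable[[dict], None]


@dataclass
class DocumentClass:
    name: str
    attributes: Set[str] = field(default_factory=set)    # stored qualitative attributes
    actions: List[Action] = field(default_factory=list)  # document imaging actions


def classify(document: dict, classes: List[DocumentClass]) -> Optional[DocumentClass]:
    """Return the first class whose stored attributes all appear
    among the regions extracted from the document."""
    extracted = set(document["regions"])
    for cls in classes:
        if cls.attributes and cls.attributes <= extracted:
            return cls
    return None


def store_in(path: str) -> Action:
    """Action in the spirit of claim 7: record a predetermined location."""
    def action(document: dict) -> None:
        document["stored_at"] = path  # stand-in for writing the image to disk
    return action


def extract_keyword(document: dict) -> None:
    """Action in the spirit of claim 8: take the first word as the keyword."""
    document["keyword"] = document["text"].split()[0]


def process(document: dict, classes: List[DocumentClass]) -> Optional[DocumentClass]:
    """Classify the document, then run the matched class's actions on it."""
    cls = classify(document, classes)
    if cls is not None:
        for action in cls.actions:
            action(document)
    return cls
```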
-
9. A method for automatically recognizing documents in a document imaging system where attributes of classes of documents are stored in a computer system, said method comprising the steps of:
-
extracting attributes from a document inputted into the document imaging system;
comparing extracted attributes with attributes of the stored classes of documents wherein the comparing extracted attributes step further comprises the substep of obtaining a comparison quality measure for each comparison;
and wherein the step of obtaining a comparison quality measure further comprises the substeps of:
selecting one of the classes of documents for comparison;
identifying regions in the inputted document;
identifying regions in the selected class;
determining a number of regions in the inputted document which match regions in the selected class;
determining a comparison quality measure in response to the number of regions in the inputted document which match regions in the selected class;
repeating the selecting one of the classes of documents for comparison, identifying regions in the selected class, determining a number of regions in the inputted document which match regions in the selected class, and determining a comparison quality measure substeps until all of the classes have been compared; and
responsive to the extracted attributes matching the attributes of one of the classes of documents, classifying the inputted document as belonging to the class of documents whose attributes match the attributes of the inputted document, wherein the classifying step further comprises the substep of classifying the inputted document as belonging to a class of documents in response to the comparison quality measure for the comparison between the inputted document and the class of documents exceeding a threshold.
comparing the identified regions to pre-defined regions in the selected class; and
responsive to an identified region matching one of the pre-defined regions, associating the identified region with the matching region.
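The class-by-class loop of claim 9 can be sketched as below. The claims do not say how the comparison quality measure is computed from the number of matching regions, so this sketch assumes it is the fraction of a class's regions that are matched. It also assumes exact equality as the region-matching rule and picks the highest-scoring class above the threshold. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height)


def count_matches(doc_regions: List[Region], class_regions: List[Region]) -> int:
    """Number of regions in the input document that match a class region
    (exact equality is assumed here for simplicity)."""
    return sum(1 for region in doc_regions if region in class_regions)


def comparison_quality(doc_regions: List[Region], class_regions: List[Region]) -> float:
    """Assumed measure: matched regions divided by regions defined for the class."""
    if not class_regions:
        return 0.0
    return count_matches(doc_regions, class_regions) / len(class_regions)


def classify_by_quality(
    doc_regions: List[Region],
    classes: Dict[str, List[Region]],
    threshold: float,
) -> Optional[str]:
    """Compare against every class; return the best class whose
    quality measure exceeds the threshold, or None."""
    best_name, best_quality = None, threshold
    for name, class_regions in classes.items():
        quality = comparison_quality(doc_regions, class_regions)
        if quality > best_quality:  # must strictly exceed the threshold
            best_name, best_quality = name, quality
    return best_name
```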
-
-
11. The method of claim 10 wherein the substep of comparing the identified regions to pre-defined regions in the selected class further comprises the substeps of:
-
determining a size of the identified region;
determining a location of the identified region;
selecting a pre-defined region from the selected class for comparison;
comparing the size of the identified region to the size of the selected pre-defined region;
comparing the location of the identified region to the location of the selected pre-defined region;
responsive to the size and location of the identified region matching the pre-defined region within a user-defined tolerance, classifying the identified region as a matching region; and
repeating the selecting a pre-defined region, comparing the size of the identified region, comparing the location of the identified region, and the classifying substeps in response to the size and location of the identified region not matching the pre-defined region within a user-defined tolerance, until all of the pre-defined regions have been compared.
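The size-and-location comparison of claim 11 can be illustrated with a short sketch. It assumes rectangular regions and a single absolute tolerance applied to each coordinate and dimension; the claim only requires a match "within a user-defined tolerance" and does not fix the metric. `Region`, `matches`, and `find_matching_region` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    x: float
    y: float
    width: float
    height: float


def matches(identified: Region, predefined: Region, tolerance: float) -> bool:
    """True when both size and location agree within the tolerance."""
    size_ok = (
        abs(identified.width - predefined.width) <= tolerance
        and abs(identified.height - predefined.height) <= tolerance
    )
    location_ok = (
        abs(identified.x - predefined.x) <= tolerance
        and abs(identified.y - predefined.y) <= tolerance
    )
    return size_ok and location_ok


def find_matching_region(
    identified: Region,
    predefined_regions: List[Region],
    tolerance: float,
) -> Optional[Region]:
    """Try each pre-defined region in turn until one matches
    or all have been compared."""
    for predefined in predefined_regions:
        if matches(identified, predefined, tolerance):
            return predefined
    return None
```

For example, with pre-defined regions at `(0, 0, 100, 20)` and `(0, 30, 100, 200)`, an identified region at `(2, 31, 98, 203)` matches the second one at a tolerance of 5 but matches nothing at a tolerance of 1.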
-
-
12. The method of claim 10 wherein the user-defined tolerance is adjusted in response to the comparison quality measure being less than the threshold.
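One way to read claim 12 is as a retry loop: if the comparison quality measure falls short of the threshold, the tolerance is adjusted and the comparison is repeated. The sketch below assumes the adjustment widens the tolerance by a fixed step up to a maximum; the claim does not state the direction or size of the adjustment. `quality_at` is a hypothetical callback that reruns the comparison at a given tolerance.

```python
from typing import Callable, Optional


def classify_with_adaptive_tolerance(
    quality_at: Callable[[int], float],
    threshold: float,
    tolerance: int,
    step: int,
    max_tolerance: int,
) -> Optional[int]:
    """Return the first tolerance at which the comparison quality
    measure exceeds the threshold, or None if none is found."""
    while tolerance <= max_tolerance:
        if quality_at(tolerance) > threshold:
            return tolerance
        tolerance += step  # assumed adjustment: widen and try again
    return None
```

Capping the tolerance matters in practice: an unbounded tolerance would eventually make every region match every class.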
-
13. The method of claim 9, further comprising the step of determining a total region quality measure, and wherein the comparison quality measure is determined by a logical combination of the number of regions in the inputted document which match regions in the selected class and the total region quality measure.
-
14. The method of claim 13, wherein the step of determining a total region quality measure further comprises the substeps of:
-
selecting a region in the inputted document having a matching region in the selected class;
determining a location measure for the selected region;
determining a size measure for the selected region;
determining a region quality measure from a logical combination of the location measure and the size measure;
repeating the selecting a region, determining a location measure, determining a size measure, and determining a region quality measure substeps for each region having a matching region in the selected class; and
obtaining a total region quality measure from the logical combination of the determined region quality measures.
-
-
15. The method of claim 14 wherein a word accuracy measure is determined, and the region quality measure is determined from a logical combination of the word accuracy measure, the location measure, and the size measure.
-
16. The method of claim 14 wherein the determining a location measure step further comprises the substeps of:
-
determining a location of the selected region;
determining a location of the matching region;
comparing the locations of the selected and matching regions; and
responsive to the comparison, generating a location measure.
-
-
17. The method of claim 14 wherein the determining a size measure step further comprises the substeps of:
-
determining a size of the selected region;
determining a size of the matching region;
comparing the sizes of the selected and matching regions; and
responsive to the comparison, generating a size measure.
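Claims 14, 16, and 17 build a total region quality measure out of per-region location and size measures. The claims speak only of a "logical combination", so the formulas here are assumptions chosen for illustration: the location measure falls linearly with the distance between region origins, the size measure is the ratio of the smaller area to the larger, a region's quality is their product, and the total is the mean over matched pairs. `max_offset` is a hypothetical parameter.

```python
import math
from typing import List, Tuple

Region = Tuple[float, float, float, float]  # (x, y, width, height)


def location_measure(selected: Region, matching: Region, max_offset: float = 50.0) -> float:
    """1.0 for identical origins, falling linearly to 0.0 at max_offset."""
    offset = math.hypot(selected[0] - matching[0], selected[1] - matching[1])
    return max(0.0, 1.0 - offset / max_offset)


def size_measure(selected: Region, matching: Region) -> float:
    """Ratio of the smaller area to the larger area, in [0, 1]."""
    area_a = selected[2] * selected[3]
    area_b = matching[2] * matching[3]
    if area_a == 0 or area_b == 0:
        return 0.0
    return min(area_a, area_b) / max(area_a, area_b)


def region_quality(selected: Region, matching: Region) -> float:
    """Assumed combination: product of the location and size measures."""
    return location_measure(selected, matching) * size_measure(selected, matching)


def total_region_quality(pairs: List[Tuple[Region, Region]]) -> float:
    """Assumed combination: mean region quality over all matched pairs."""
    if not pairs:
        return 0.0
    return sum(region_quality(s, m) for s, m in pairs) / len(pairs)
```

A word accuracy measure, as in claim 15, could be folded in as a third factor of `region_quality` under the same assumptions.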
-
-
18. The method of claim 13 further comprising the step of determining a non-region based quality measure, and wherein the comparison quality measure is determined by the logical combination of the non-region based quality measure, the total region quality measure, and the number of regions in the inputted document which match regions in the selected class of documents.
-
19. A computer apparatus comprising:
-
RAM, for temporarily storing a created digital image;
coupled to the RAM, a central processing unit, for receiving a qualitative selection of a region of data in a document representative of a class of documents, wherein the qualitative selection describes a distinguishing feature of the class of documents;
assigning the qualitative selection as an attribute of a class of documents;
storing the assigned qualitative attribute with an identification of the class of documents to which it belongs;
extracting regions of data from the digital image of the paper document, comparing extracted regions of data to stored qualitative attributes associated with classes of documents, and classifying the digital image of the paper document responsive to the comparison; and
coupled to the central processing unit, a storage device, for storing attributes of the classes of documents.
-
-
20. A computer-readable medium containing a computer program for processing documents in a document imaging system, wherein a paper copy of the document to be processed is transformed into a digital version of the document, and the computer program causes the processor to receive a qualitative selection of a region of data in a document representative of a class of documents, wherein the qualitative selection describes a distinguishing feature of the class of documents, assign the qualitative selection as an attribute of a class of documents, store the assigned qualitative attribute with an identification of the class of documents to which it belongs, extract regions of data from the digital image of the paper document, compare extracted regions of data to stored qualitative attributes associated with classes of documents, and classify the digital image of the paper document as belonging to a class of documents responsive to the comparison.
-
21. A computer-readable medium containing a computer program for processing documents in a document imaging system, wherein a paper copy of the document to be processed has been transformed into a digital version of the document, and the computer program causes the processor to perform the steps of:
-
extracting attributes from a document inputted into the document imaging system;
comparing extracted attributes with attributes of stored classes of documents, including obtaining a comparison quality measure for each comparison comprising:
selecting one of the stored classes of documents for comparison;
identifying regions in the inputted document;
identifying regions in the selected class;
determining a number of regions in the inputted document which match regions in the selected class;
determining a comparison quality measure in response to the number of regions in the inputted document which match regions in the selected class;
repeating the selecting one of the stored classes of documents for comparison, identifying regions in the inputted document, identifying regions in the selected class, determining a number of regions in the inputted document which match regions in the selected class, and determining a comparison quality measure substeps until all of the stored classes have been compared; and
responsive to the extracted attributes matching the attributes of one of the stored classes of documents, classifying the inputted document as belonging to a class of documents whose attributes match the attributes of the inputted document, wherein the classifying step further comprises the substep of classifying the inputted document as belonging to a class of documents in response to the comparison quality measure for the comparison between the inputted document and the class of documents exceeding a threshold.
-
-
22. A computer apparatus comprising:
-
RAM, for temporarily storing a created digital image;
an attribute extracting module, coupled to the RAM, for extracting attributes from a document inputted into the document imaging system;
a storage device, coupled to the attribute extracting module, for storing attributes of the classes of documents;
an attribute comparison module, coupled to the attribute extracting module, for selecting one of the classes of documents for comparison;
identifying regions in the inputted document;
identifying regions in the selected class; and
determining a number of regions in the inputted document which match regions in the selected class;
a quality assessment module, coupled to the attribute comparison module, for determining a comparison quality measure in response to the number of regions in the inputted document which match regions in the selected class; and
a classification module, coupled to the quality assessment module, for classifying the inputted document as belonging to a class of documents in response to the comparison quality measure for the comparison between the inputted document and the class of documents exceeding a threshold.
-
Specification