Knowledge-based document analysis system
First Claim
1. A system for analyzing a target document including at least one informational element, the system comprising:
(a) means for receiving a digitized image of the target document;
(b) means for extracting low level features from the digitized image;
(c) means for classifying the document based upon the extracted low level features from the digitized image to identify a most probable document type from a plurality of possible document classes that the target document most closely matches, wherein the means for classifying performs the steps of:
(i) extracting a sample immediate feature set from at least one sample document for each document class, wherein each sample immediate feature set includes at least one feature of a sample document;
(ii) generating a sample indirect feature set for each sample document;
(iii) generating a target document immediate feature set and a target document indirect feature set, the target document immediate feature set comprising information describing a location and a type indicator for basic image features of the target document, and the target document indirect feature set comprising information summarizing attributes of the immediate features in the target document immediate feature set;
(iv) comparing the target document indirect feature set with each of the sample indirect feature sets; and
(v) classifying the target document responsive to the comparison of step (iv) to determine the most probable document type for the target document; and
(d) means for analyzing the target document in order to extract informational data associated with the at least one informational element based upon the most probable document type identified by the classifying means.
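Steps (i) through (v) of the classifying means can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ImmediateFeature` record, the feature-type labels in `KINDS`, the use of per-type counts and mean positions as the "indirect" summary, and the Euclidean nearest-sample comparison are all hypothetical choices made for the example. The claim does not specify any of them.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class ImmediateFeature:
    """Hypothetical immediate feature: a location plus a type indicator."""
    x: float
    y: float
    kind: str  # illustrative labels, see KINDS below


# Assumed fixed vocabulary of feature types, so every summary has equal length.
KINDS = ("hline", "vline", "text_block")


def indirect_features(immediate):
    """Summarize immediate features as per-type count and mean (x, y)."""
    summary = []
    for kind in KINDS:
        pts = [(f.x, f.y) for f in immediate if f.kind == kind]
        n = len(pts)
        mean_x = sum(p[0] for p in pts) / n if n else 0.0
        mean_y = sum(p[1] for p in pts) / n if n else 0.0
        summary.extend([float(n), mean_x, mean_y])
    return summary


def classify(target_immediate, samples):
    """Return the class whose sample summary lies nearest to the target's.

    `samples` maps a class name to a list of sample immediate feature
    sets (steps i and ii). The nearest-neighbour rule is an assumed
    stand-in for the comparison in step iv.
    """
    target = indirect_features(target_immediate)         # step iii
    best_class, best_distance = None, float("inf")
    for doc_class, sample_sets in samples.items():
        for sample in sample_sets:
            d = dist(target, indirect_features(sample))  # step iv
            if d < best_distance:
                best_class, best_distance = doc_class, d
    return best_class                                     # step v


# Toy data: one sample document per class, coordinates normalized to [0, 1].
samples = {
    "check": [[ImmediateFeature(0.5, 0.9, "hline"),
               ImmediateFeature(0.5, 0.5, "text_block")]],
    "form": [[ImmediateFeature(0.2, 0.5, "vline"),
              ImmediateFeature(0.8, 0.5, "vline"),
              ImmediateFeature(0.5, 0.1, "hline")]],
}
target = [ImmediateFeature(0.52, 0.88, "hline"),
          ImmediateFeature(0.5, 0.55, "text_block")]
result = classify(target, samples)  # "check" for this toy data
```

Comparing the compact indirect summaries rather than the raw immediate features keeps each comparison cheap and independent of how many basic features a document contains.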
Abstract
A knowledge-based document analysis system and method for identifying and decomposing constrained and unconstrained images of documents are disclosed. Low-level features are extracted from bitonal and grayscale images and passed to a document classification means, which forms initial hypotheses about the document class. For constrained documents, the document analysis system sorts through various models to determine the exact type of document and then extracts the relevant fields for character recognition. For unconstrained documents, the document analysis means uses a blackboard architecture, which includes a knowledge database and knowledge sources, to create information and hypotheses that identify and locate relevant fields within the document. These fields are then sent for optical character recognition.
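The blackboard arrangement mentioned for unconstrained documents can be pictured with a small sketch. Everything here is an assumption for illustration: the class names, the `currency_symbol_at` feature, the fixed box size, and the simple "fire until nothing changes" control loop are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared workspace: extracted features plus hypotheses posted so far."""
    features: dict
    hypotheses: dict = field(default_factory=dict)


class KnowledgeSource:
    """Hypothetical base class: a source fires only when it can add something."""

    def can_contribute(self, bb):
        raise NotImplementedError

    def contribute(self, bb):
        raise NotImplementedError


class AmountZoneFinder(KnowledgeSource):
    # Illustrative rule: a currency symbol suggests a nearby amount field.
    def can_contribute(self, bb):
        return ("currency_symbol_at" in bb.features
                and "amount_zone" not in bb.hypotheses)

    def contribute(self, bb):
        x, y = bb.features["currency_symbol_at"]
        bb.hypotheses["amount_zone"] = (x, y, x + 200, y + 40)  # assumed size


class OcrDispatcher(KnowledgeSource):
    # Fires only after another source has located the amount zone.
    def can_contribute(self, bb):
        return ("amount_zone" in bb.hypotheses
                and "ocr_queue" not in bb.hypotheses)

    def contribute(self, bb):
        bb.hypotheses["ocr_queue"] = [bb.hypotheses["amount_zone"]]


def run(bb, sources):
    """Control loop: keep firing sources until none can contribute."""
    progress = True
    while progress:
        progress = False
        for src in sources:
            if src.can_contribute(bb):
                src.contribute(bb)
                progress = True
    return bb


# Sources are listed out of dependency order on purpose; the loop sorts it out.
board = run(Blackboard({"currency_symbol_at": (100, 50)}),
            [OcrDispatcher(), AmountZoneFinder()])
```

The point of the pattern is that no source calls another directly: each one reads and writes only the shared board, so sources can be added or reordered without changing the rest.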
244 Citations
21 Claims
1. A system for analyzing a target document including at least one informational element (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A process for analyzing a target document including at least one informational element, comprising the steps of:
(a) receiving a digitized image of the target document;
(b) extracting low level features from the digitized image;
(c) classifying the target document based upon the extracted low level features to identify a most probable document type from a plurality of possible document classes that the target document most closely matches, wherein the step of classifying comprises steps:
(i) extracting a sample immediate feature set from at least one sample document for each document class, and wherein each sample immediate feature set includes at least one feature of a corresponding sample document;
(ii) generating a sample indirect feature set for each sample document;
(iii) generating a target document immediate feature set and a target document indirect feature set, the target document immediate feature set comprising information describing a location and type indicator for basic image features of the target document, and the target document indirect feature set comprising information summarizing attributes of the immediate features in the target document immediate feature set;
(iv) comparing the target document indirect feature set with each of the sample indirect feature sets; and
(v) classifying the target document responsive to the comparison of step (iv) to determine the most probable document type for the target document; and
(d) analyzing the target document in order to extract informational data associated with the at least one informational element based upon the most probable document type.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
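Step (d), extracting data once the most probable document type is known, might look like the following for the constrained case, where each type has a known layout. This is a sketch only: the `FIELD_LAYOUTS` table, its field names and pixel boxes, and the representation of an image as a list of pixel rows are hypothetical and not drawn from the patent.

```python
# Hypothetical per-type layouts: field name -> (top, left, bottom, right).
FIELD_LAYOUTS = {
    "check": {"amount": (0, 2, 1, 4)},
    "deposit_slip": {"account": (1, 0, 2, 2)},
}


def crop(image, box):
    """Cut a rectangular region out of an image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_fields(image, doc_type):
    """Cut out the regions that the identified document type says hold data."""
    layout = FIELD_LAYOUTS.get(doc_type)
    if layout is None:
        raise ValueError(f"no layout for document type {doc_type!r}")
    return {name: crop(image, box) for name, box in layout.items()}


# A tiny 2x4 "image" whose pixel values make the crops easy to check.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
check_fields = extract_fields(image, "check")
slip_fields = extract_fields(image, "deposit_slip")
```

In a fuller system, the cropped regions would be the fields handed on for character recognition.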
Specification