Matching CCITT compressed document images
Abstract
A fast, memory-efficient, and accurate document image matching system is disclosed. Document image matching is based on identifying anchor points of characters in the document. The matching process includes a feature extraction step in which anchor points, e.g., points representing approximate locations of characters, are identified as features for matching. In a particularly efficient implementation, the anchor points are “pass codes” in a two-dimensionally encoded representation of a document image.
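To illustrate where pass codes arise, the following minimal sketch works on decoded pixel rows and re-derives the coding modes of the CCITT two-dimensional scheme (pass mode is chosen when the changing element b2 on the reference line lies to the left of a1 on the coding line). It is not the disclosed implementation, which reads pass codes from the compressed data. The function names, the 0 = white / 1 = black convention, and the choice to record the column of b2 as the pass code location are assumptions made for illustration.

```python
def _next_change(line, start, width):
    """First column >= start whose pixel differs from its left neighbour, else width."""
    for i in range(max(start, 0), width):
        prev = line[i - 1] if i > 0 else 0  # imaginary white pixel before column 0
        if line[i] != prev:
            return i
    return width


def pass_codes_in_line(ref, cur):
    """Columns at which a 2-D encoder would emit a pass code for `cur` given `ref`."""
    width = len(cur)
    a0, color = -1, 0  # start on an imaginary white pixel left of the line
    found = []
    while a0 < width:
        # a1: next changing element on the coding line to the right of a0
        a1 = next((i for i in range(a0 + 1, width) if cur[i] != color), width)
        # b1: first changing element on the reference line right of a0
        # whose colour is opposite to the colour at a0
        b1 = a0 + 1
        while True:
            b1 = _next_change(ref, b1, width)
            if b1 == width or ref[b1] != color:
                break
            b1 += 1
        # b2: next changing element on the reference line after b1
        b2 = _next_change(ref, b1 + 1, width) if b1 < width else width
        if b2 < a1:
            # pass mode: the reference run b1..b2 has no counterpart on this line
            found.append(b2)  # assumed convention: record the column of b2
            a0 = b2
        elif abs(a1 - b1) <= 3:
            # vertical mode: a1 is coded relative to b1, colour flips
            a0 = a1
            color ^= 1
        else:
            # horizontal mode: two runs are coded, colour is unchanged afterwards
            a0 = next((i for i in range(a1 + 1, width) if cur[i] == color), width)
    return found


def pass_code_locations(image):
    """Return (row, column) pass code locations for a list of 0/1 pixel rows."""
    width = len(image[0]) if image else 0
    ref = [0] * width  # the first line is coded against an imaginary white line
    locations = []
    for y, row in enumerate(image):
        locations.extend((y, x) for x in pass_codes_in_line(ref, row))
        ref = row
    return locations
```

In this sketch a pass code appears on the line just below the bottom of a black run, which is consistent with treating pass codes as anchors that approximate character locations.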
27 Claims
-
1. A computer-implemented method for matching a particular document image to a plurality of prospective matching document images comprising:
-
receiving a compressed representation of said particular document image, said compressed representation comprising one or more pass codes indicating that a run of consecutive black or white pixels in a line of said particular document image has no corresponding run on an adjacent line;
determining locations of said one or more pass codes from said compressed representation of said particular document image;
comparing said locations of said pass codes determined from said compressed representation of said particular document image with sets of locations of pass codes previously obtained for said prospective matching document images, wherein the comparing comprises determining a distance metric based upon a distance between said locations of pass codes determined from said particular document image to each of said sets of pass codes for said prospective matching document images; and
identifying a matching document image from among said prospective matching document images based upon said comparison.
- Dependent claims: 2, 3, 4
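The comparing and identifying steps of claim 1 can be sketched as follows. The claim does not name a specific distance metric, so this example assumes the symmetric Hausdorff distance between point sets as one plausible choice. The function names and the dictionary of precomputed location sets are hypothetical, and each set is assumed to be non-empty.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_match(query_locations, candidate_sets):
    """Return the key of the candidate whose location set is closest to the query.

    `candidate_sets` maps a document identifier to the pass code locations
    previously obtained for that document.
    """
    return min(candidate_sets,
               key=lambda name: hausdorff(query_locations, candidate_sets[name]))


# Usage: the query is compared against every stored set; the minimum wins.
stored = {
    "doc_a": [(0, 1), (10, 5)],
    "doc_b": [(50, 50), (60, 60)],
}
match = best_match([(0, 0), (10, 5)], stored)
```

This brute-force comparison costs time proportional to the product of the set sizes for each candidate; it is meant only to make the "minimum of a distance metric" idea concrete.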
-
-
5. A computer program product for matching a particular document image to a plurality of prospective matching document images comprising:
-
code for receiving a compressed representation of said particular document image, said compressed representation comprising one or more pass codes indicating that a run of consecutive black or white pixels in a line of said particular document image has no corresponding run on an adjacent line;
code for determining locations of said one or more pass codes from said compressed representation of said particular document image;
code for comparing said locations of said pass codes determined from said compressed representation of said particular document image with sets of locations of pass codes previously obtained for said prospective matching document images, wherein the comparing comprises determining a distance metric based upon a distance between said locations of pass codes determined from said particular document image to each of said sets of pass codes for said prospective matching document images;
code for identifying a matching document image from among said prospective matching document images based upon said comparison; and
a computer-readable storage medium for storing the codes.
- Dependent claims: 6, 7, 8
-
-
9. A computer system configured to compare document images, said computer system comprising:
-
a storage device for storing an electronic representation of an image of a first document; and
a processing system configured to:
extract locations of pass codes from said electronic representation of said image of said first document, each pass code indicating a difference in runs of black or white pixels between adjacent lines; and
identify a matching document image from among a first set of document images based on finding a minimum for a distance metric computed from a distance between said pass code locations extracted from said image of said first document and pass code locations extracted from at least one image from said first set of document images.
-
-
10. A method for matching a particular document image to at least one of a plurality of prospective matching document images comprising:
-
obtaining a set of locations of anchors of characters in a compressed representation of said particular document image, wherein said compressed representation comprises pass codes indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on a following line, by selecting locations of said pass codes to be said locations of anchors;
determining a distance metric based upon a distance between said set for said particular document image to each of at least one of a plurality of sets of locations of anchors for said prospective matching document images; and
identifying said best matching document image to be the prospective matching document image having a minimum of said distance metric with respect to said particular document image.
-
-
11. A method of matching a particular document image to at least one of a plurality of document images comprising:
-
receiving a representation of said particular document image, said representation comprising one or more pass codes indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on a following line;
accessing information identifying a set of pass codes for each document image in said plurality of document images;
determining a distance metric based upon a distance between said one or more pass codes included in said representation of said particular document image and a set of pass codes for each document image in said plurality of document images; and
identifying a document image from the plurality of document images having a minimum of said distance metric with respect to said particular document image as a matching document image.
-
-
12. A method for matching a particular document image to at least one of a plurality of prospective matching document images comprising:
-
obtaining a set of locations of anchors of characters in said particular document image, by selecting locations of characters within said particular document image to be said locations of anchors;
determining a distance metric based upon a distance between said set for said particular document image to each of at least one of a plurality of sets of locations of anchors for said prospective matching document images; and
identifying said best matching document image to be the prospective matching document image having a minimum of said distance metric with respect to said particular document image.
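Claim 12 takes character locations themselves as anchors but does not say how characters are located. As a rough sketch under that gap, the example below assumes each 8-connected component of black pixels is one character and uses the bottom-left corner of its bounding box as the anchor. The function name and both conventions are assumptions, not part of the claim.

```python
from collections import deque


def character_anchors(image):
    """Return one (row, column) anchor per 8-connected black component.

    `image` is a list of rows of 0 (white) / 1 (black) pixels. The anchor is
    the bottom-left corner of the component's bounding box (an assumption).
    """
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    anchors = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # breadth-first flood fill over the component containing (y, x)
            queue = deque([(y, x)])
            seen[y][x] = True
            rows, cols = [], []
            while queue:
                cy, cx = queue.popleft()
                rows.append(cy)
                cols.append(cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            anchors.append((max(rows), min(cols)))
    return anchors
```

The resulting anchor sets can then be compared with any point-set distance, with the candidate at minimum distance taken as the best match.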
-
-
13. A computer-implemented method for matching a particular document image to a plurality of prospective matching document images comprising:
-
detecting changes in pixel values between successive pixels in said particular document image;
determining locations of said detected changes in pixel values;
capturing said determined locations of said detected changes as locations of anchor points, said anchor points indicative of features of said particular document image;
comparing a set of anchor point locations captured for said particular document image to sets of anchor point locations previously obtained for said prospective matching document images; and
identifying a best matching document image from among said prospective matching document images based on said comparing of said anchor point locations.
- Dependent claims: 14, 15, 16, 17, 18
accepting a compressed representation of said particular document image as input, wherein said compressed representation includes pass codes, each pass code indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on an adjacent line; and
extracting locations of said pass codes from said compressed representation, wherein said pass code locations are indicative of said anchor point locations.
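The detecting, determining, and capturing steps of claim 13 can be pictured with a small sketch on uncompressed pixels. It assumes that "successive pixels" means horizontally adjacent pixels within a row and that every detected change is kept as an anchor point; the function name and the (row, column) convention are hypothetical.

```python
def transition_points(image):
    """Return (row, column) of every pixel that differs from its left neighbour.

    `image` is a list of rows of 0 (white) / 1 (black) pixels.
    """
    points = []
    for y, row in enumerate(image):
        for x in range(1, len(row)):
            if row[x] != row[x - 1]:
                points.append((y, x))  # a change in pixel value occurs at column x
    return points


# Usage: one black run in the first row yields two transitions.
anchors = transition_points([[0, 1, 1, 0], [0, 0, 0, 0]])
```

Keeping every transition gives a dense point set. Restricting the anchors to pass code locations, as in the dependent claim above, yields a much sparser set that can be obtained from the compressed representation.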
-
-
15. The method of claim 14 wherein said compressed representation is a CCITT Group IV facsimile representation.
-
16. The method of claim 14 wherein said compressed representation is a CCITT Group III facsimile representation.
-
17. The method of claim 13 wherein said anchor points are indicative of approximate locations of characters in said particular document image.
-
18. The method of claim 13 wherein said comparing comprises determining a distance metric based upon a distance between said set of anchor point locations for said particular document image to each of said sets of anchor point locations for said prospective matching document images; and
wherein said identifying comprises determining said best matching document image to be a prospective matching document image having a minimum of said distance metric with respect to said particular document image.
-
19. A computer program product for matching a particular document image to at least one of a plurality of prospective matching document images comprising:
-
code for obtaining a set of locations of anchors of characters in a compressed representation of said particular document image, wherein said compressed representation comprises pass codes indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on a following line, by selecting locations of said pass codes to be said locations of anchors;
code for determining a distance metric based upon a distance between said set for said particular document image to each of at least one of a plurality of sets of locations of anchors for said prospective matching document images;
code for identifying said best matching document image to be the prospective matching document image having a minimum of said distance metric with respect to said particular document image; and
a computer readable storage medium for holding the codes.
-
-
20. A computer program product for matching a particular document image to at least one of a plurality of document images comprising:
-
code for receiving a representation of said particular document image, said representation comprising one or more pass codes indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on a following line;
code for accessing information identifying a set of pass codes for each document image in said plurality of document images;
code for determining a distance metric based upon a distance between said one or more pass codes included in said representation of said particular document image to a set of pass codes for each document image in said plurality of document images;
code for identifying a document image from the plurality of document images having a minimum of said distance metric with respect to said particular document image as a matching document image; and
a computer readable storage medium for holding the codes.
-
-
21. A computer program product for matching a particular document image to at least one of a plurality of prospective matching document images comprising:
-
code for obtaining a set of locations of anchors of characters in said particular document image, by selecting locations of characters within said particular document image to be said locations of anchors;
code for determining a distance metric based upon a distance between said set for said particular document image to each of at least one of a plurality of sets of locations of anchors for said prospective matching document images;
code for identifying said best matching document image to be the prospective matching document image having a minimum of said distance metric with respect to said particular document image; and
a computer readable storage medium for holding the codes.
-
-
22. A computer program product for matching a particular document image to a plurality of prospective matching document images comprising:
-
code for detecting changes in pixel values between successive pixels in said particular document image;
code for determining locations of said detected changes in pixel values;
code for capturing said determined locations of said detected changes as locations of anchor points, said anchor points indicative of features of said particular document image;
code for comparing a set of anchor point locations captured for said particular document image to sets of anchor point locations previously obtained for said prospective matching document images;
code for identifying a best matching document image from among said prospective matching document images based on said comparing of said anchor point locations; and
a computer readable storage medium for holding the codes.
- Dependent claims: 23, 24, 25, 26, 27
code for accepting a compressed representation of said particular document image as input, wherein said compressed representation includes pass codes, each pass code indicating that a run of consecutive black or white pixels in a line of said image has no corresponding run on an adjacent line; and
code for extracting locations of said pass codes from said compressed representation, wherein said pass code locations are indicative of said anchor point locations.
-
-
24. The computer program product of claim 23 wherein said compressed representation is a CCITT Group III facsimile representation.
-
25. The computer program product of claim 23 wherein said compressed representation is a CCITT Group IV facsimile representation.
-
26. The computer program product of claim 22 wherein said anchor points are indicative of approximate locations of characters in said particular document image.
-
27. The computer program product of claim 22 wherein said comparing comprises determining a distance metric based upon a distance between said set of anchor point locations for said particular document image to each of said sets of anchor point locations for said prospective matching document images; and
wherein said identifying comprises determining said best matching document image to be a prospective matching document image having a minimum of said distance metric with respect to said particular document image.
Specification