Method of detection document alteration by comparing characters using shape features of characters
Abstract
A document alteration detection method compares a target image with an original image by comparing character shape features without actually recognizing the characters. Bounding boxes for the characters are generated for both images, each enclosing one or more connected groups of pixels of one character. The bounding boxes in the original and target images are matched into pairs. Addition and deletion of text is detected if a bounding box in one image does not have a matching one in the other image. Each pair of bounding boxes is processed to compare their shape features. The shape features include the Euler numbers of the characters, the aspect ratio of the bounding boxes, the pixel density of the bounding boxes, and the Hausdorff distance between the two characters. The two characters are determined to be the same or different based on the shape feature comparisons.
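The pairing step described above can be illustrated with a minimal sketch. The source says only that matched boxes have "substantially the same locations", so the criterion used here (box centers within a pixel tolerance on both axes), the default tolerance `tol`, the `(left, top, right, bottom)` box layout, and the names `match_boxes` and `center` are all assumptions made for illustration, not the method as specified.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical layout: (left, top, right, bottom)


def center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)


def match_boxes(original: List[Box], target: List[Box], tol: float = 3.0):
    """Pair boxes whose centers differ by at most `tol` pixels on each axis.

    Returns (pairs, deleted, added):
      pairs   - (original_box, target_box) tuples at matching locations
      deleted - original boxes with no partner in the target image
      added   - target boxes with no partner in the original image
    `tol` is an assumed stand-in for "substantially the same location".
    """
    unmatched = list(target)
    pairs, deleted = [], []
    for obox in original:
        ox, oy = center(obox)
        best, best_dist = None, None
        for tbox in unmatched:
            tx, ty = center(tbox)
            dx, dy = abs(ox - tx), abs(oy - ty)
            if dx <= tol and dy <= tol:
                dist = dx + dy
                if best is None or dist < best_dist:
                    best, best_dist = tbox, dist
        if best is None:
            deleted.append(obox)  # nothing at this location in the target
        else:
            pairs.append((obox, best))
            unmatched.remove(best)
    return pairs, deleted, unmatched


# Three characters in the original; the middle one is missing from the
# target, and the target has one extra character at the end of the line.
original_boxes = [(10, 10, 20, 30), (25, 10, 35, 30), (40, 10, 50, 30)]
target_boxes = [(11, 10, 21, 30), (40, 11, 50, 31), (55, 10, 65, 30)]
pairs, deleted, added = match_boxes(original_boxes, target_boxes)
```

In this sketch, unmatched original boxes are reported as deletions and unmatched target boxes as additions; only the matched pairs would go on to the shape-feature comparison.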
38 Claims
1. A method implemented in a data processing apparatus for detecting alterations between an original image and a target image, the original and target images being binary bitmap images, the method comprising:
(a) defining a plurality of bounding boxes in the original image and the target image, each bounding box enclosing one or more connected groups of pixels of one character;
(b) identifying a plurality of matching pairs of bounding boxes in the original image and the target image, wherein each matching pair of the bounding boxes have substantially the same locations in the original image and the target image, respectively;
(c) for each matching pair of bounding boxes:
(c1) calculating a plurality of shape features including (1) a Euler number of each of the pair of characters enclosed by the pair of bounding boxes, and (2) a Hausdorff distance between the pair of characters; and
(c2) determining whether the pair of characters enclosed by the pair of bounding boxes are the same character or different characters by evaluating (1) whether the Euler numbers of the pair of characters are equal, and (2) whether the Hausdorff distance between the pair of characters is smaller than a first predefined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for detecting alterations between an original image and a target image, the original and target images being binary bitmap images, wherein the process comprises:
(a) defining a plurality of bounding boxes in the original image and the target image, each bounding box enclosing one or more connected groups of pixels of one character;
(b) identifying a plurality of matching pairs of bounding boxes in the original image and the target image, wherein each matching pair of the bounding boxes have substantially the same locations in the original image and the target image, respectively;
(c) for each matching pair of bounding boxes:
(c1) calculating a plurality of shape features including (1) a Euler number of each of the pair of characters enclosed by the pair of bounding boxes, and (2) a Hausdorff distance between the pair of characters; and
(c2) determining whether the pair of characters enclosed by the pair of bounding boxes are the same character or different characters by evaluating (1) whether the Euler numbers of the pair of characters are equal, and (2) whether the Hausdorff distance between the pair of characters is smaller than a first predefined threshold.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22.
23. A method implemented in a data processing apparatus for comparing two characters in an original image and a target image, respectively, the original and target images being binary bitmap images, each character being one or more connected groups of pixels enclosed in a respective bounding box, the method comprising:
calculating a plurality of shape features including (1) a Euler number of each of the pair of characters enclosed by the pair of bounding boxes, and (2) a Hausdorff distance between the pair of characters; and
determining whether the pair of characters enclosed by the pair of bounding boxes are the same character or different characters by evaluating (1) whether the Euler numbers of the pair of characters are equal, and (2) whether the Hausdorff distance between the pair of characters is smaller than a first predefined threshold.
Dependent claims: 24, 25, 26, 27, 28, 29, 30.
31. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for comparing two characters in an original image and a target image, respectively, the original and target images being binary bitmap images, each character being one or more connected groups of pixels enclosed in a respective bounding box, wherein the process comprises:
calculating a plurality of shape features including (1) a Euler number of each of the pair of characters enclosed by the pair of bounding boxes, and (2) a Hausdorff distance between the pair of characters; and
determining whether the pair of characters enclosed by the pair of bounding boxes are the same character or different characters by evaluating (1) whether the Euler numbers of the pair of characters are equal, and (2) whether the Hausdorff distance between the pair of characters is smaller than a first predefined threshold.
Dependent claims: 32, 33, 34, 35, 36, 37, 38.
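The two-part test recited in the claims above (equal Euler numbers, and a Hausdorff distance below a threshold) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the Euler number is taken as connected components minus holes with 8-connected foreground and 4-connected background, the Hausdorff distance is the symmetric Euclidean form over foreground pixel coordinates measured from each box's top-left corner, both bitmaps are assumed to contain at least one foreground pixel, and the default `max_hausdorff` value is a hypothetical placeholder for the "first predefined threshold". All function names are invented for the example.

```python
from math import hypot
from typing import List, Set, Tuple

Bitmap = List[List[int]]  # rows of 0/1 values; 1 marks a character pixel

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(img: Bitmap, value: int, neighbors) -> int:
    """Count connected groups of pixels equal to `value` by flood fill."""
    rows, cols = len(img), len(img[0])
    seen: Set[Tuple[int, int]] = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


def euler_number(img: Bitmap) -> int:
    """Connected components minus holes (assumed 8/4 connectivity)."""
    cols = len(img[0])
    # Pad with background so everything outside the glyph is one region.
    padded = [[0] * (cols + 2)] + [[0] + row + [0] for row in img] + [[0] * (cols + 2)]
    components = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # drop the outer background
    return components - holes


def foreground(img: Bitmap) -> List[Tuple[int, int]]:
    """List (row, col) coordinates of foreground pixels."""
    return [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]


def hausdorff(a: Bitmap, b: Bitmap) -> float:
    """Symmetric Hausdorff distance between two non-empty glyph bitmaps."""
    pa, pb = foreground(a), foreground(b)

    def directed(src, dst):
        return max(min(hypot(y1 - y2, x1 - x2) for y2, x2 in dst) for y1, x1 in src)

    return max(directed(pa, pb), directed(pb, pa))


def same_character(a: Bitmap, b: Bitmap, max_hausdorff: float = 1.5) -> bool:
    """Same character only if both tests pass; the threshold is hypothetical."""
    return euler_number(a) == euler_number(b) and hausdorff(a, b) < max_hausdorff


# Toy 5x5 glyphs: a ring ("O"), the ring with its right side opened ("C"),
# and the ring with one stray corner pixel (scan noise).
O = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [0, 1, 1, 1, 0]]
C = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 1, 1, 0]]
O_noisy = [[1, 1, 1, 1, 0]] + O[1:]
```

In this toy case the ring has one hole and the opened shape has none, so their Euler numbers differ and the pair is treated as different characters regardless of distance, while the noisy ring keeps the same Euler number and stays within the assumed distance threshold.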
Specification