Method of distinguishing handwritten and machine-printed images
First Claim
1. A method of categorizing an image as handwritten, machine-printed, and unknown, comprising the steps of:
(a) receiving an image;
(b) identifying connected components within the image;
(c) enclosing each connected component within a bounding box;
(d) computing a height and a width of each bounding box;
(e) computing a sum and maximum horizontal run for each connected component, where the sum is the sum of all pixels in the corresponding connected component, and where the maximum horizontal run is the longest consecutive number of horizontal pixels in the corresponding connected component;
(f) identifying connected components that are suspected of being characters;
(g) if the number of suspected characters is less than or equal to a first user-definable number then categorizing the image as unknown and stopping, otherwise, proceeding to the next step;
(h) if the number of suspected characters is greater than the first user-definable number then comparing the suspected characters to determine if matches exist, where a match exists between a pair of suspected characters if the suspected characters in the pair have the same height and width, if each suspected character in the pair has a height that is less than 4 times its width, and if each suspected character in the pair has a width that is less than 4 times its height; and
(i) computing a score based on the suspected characters and the number of matches and categorizing the image into one of a group of categories consisting of handwritten, machine-printed, and unknown.
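Steps (e) and (h) are specific enough to sketch in code. The following Python is a minimal illustration, not the patented implementation. It assumes a binary image, so the "sum of all pixels" is taken to be the count of foreground pixels, and it assumes matches are counted pairwise. The function names are hypothetical. The criteria for step (f) and the score formula for step (i) are not given in claim 1, so they are left out.

```python
from itertools import combinations


def pixel_sum_and_max_run(mask):
    """Step (e): mask is a list of rows of 0/1 values for one component.

    Returns (sum of foreground pixels, longest horizontal run).
    Assumes a binary mask, so the sum equals the foreground pixel count.
    """
    total = 0
    longest = 0
    for row in mask:
        run = 0
        for px in row:
            if px:
                total += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0  # a background pixel ends the current run
    return total, longest


def is_match(a, b):
    """Step (h): a and b are (height, width) tuples of two bounding boxes."""
    def proportioned(height, width):
        # Height under 4x width, and width under 4x height.
        return height < 4 * width and width < 4 * height

    return a == b and proportioned(*a) and proportioned(*b)


def count_matches(boxes):
    """Count matching pairs (pairwise counting is an assumption)."""
    return sum(1 for a, b in combinations(boxes, 2) if is_match(a, b))
```

For example, three boxes of size (10, 6) give three matching pairs, while two boxes of size (20, 5) give none, because a height of 20 is not less than 4 times a width of 5.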
1 Assignment
0 Petitions
Abstract
The present invention is a method of categorizing an image as handwritten, machine-printed, or unknown. First, the image is received and its connected components are identified. Each connected component is enclosed in a bounding box, and the height and width of each bounding box are computed. A sum and a maximum horizontal run are then computed for each connected component, and the connected components suspected of being characters are identified. If the number of suspected characters is less than or equal to a first user-definable number, the image is categorized as unknown. Otherwise, the suspected characters are compared to determine whether matches exist among them. Finally, a score is computed from the suspected characters and the number of matches, and the image is categorized as handwritten, machine-printed, or unknown.
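The front half of this pipeline (identifying connected components, boxing them, and stopping early when too few suspected characters are found) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the method as implemented by the inventors: the image is assumed to be a binary grid of 0/1 values, 8-connectivity is assumed because the abstract does not state which connectivity is used, and the test for a "suspected character" is passed in by the caller as a hypothetical `is_suspect` function because the abstract does not define it.

```python
from collections import deque


def connected_components(image):
    """Return a list of components, each a list of (row, col) pixels.

    image is a list of rows of 0/1 values; 8-connectivity is assumed.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box_size(pixels):
    """Return (height, width) of the box enclosing one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return max(ys) - min(ys) + 1, max(xs) - min(xs) + 1


def suspects_or_none(components, is_suspect, min_count):
    """Return the suspected characters, or None if the image is 'unknown'.

    is_suspect is a caller-supplied (hypothetical) predicate; min_count
    plays the role of the first user-definable number.
    """
    suspects = [p for p in components if is_suspect(p)]
    return suspects if len(suspects) > min_count else None
```

A caller would pass the returned suspects on to the matching and scoring stages, and treat a `None` result as the "unknown" category.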
35 Citations
9 Claims
Claim 1, the sole independent claim, is set out in full under First Claim above. Claims 2 through 9 are dependent claims.
Specification