System and Method for Detection and Segmentation of Touching Characters for OCR
Abstract
The present disclosure relates to a system and a method for detecting touching characters in media, characterized by segmentation of adjoining character spaces. In a first step, an aspect ratio is calculated for each connected component. A candidate touching position of each character is determined by calculating a threshold aspect ratio for each character. A candidate cut column is then determined based on a relation between the column pixel densities and the corresponding lengths of the columns, and the touching characters are segmented at the candidate cut column.
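The aspect-ratio screening mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the aspect ratio or give a threshold value, so the sketch assumes width divided by height of a component's bounding box, and the names `Box`, `aspect_ratio`, `flag_touching_candidates` and the default threshold of 1.2 are hypothetical.

```python
from typing import List, Tuple

# Hypothetical bounding-box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def aspect_ratio(box: Box) -> float:
    """Width divided by height of a bounding box (assumed definition)."""
    _, _, width, height = box
    if height <= 0:
        raise ValueError("bounding box height must be positive")
    return width / height


def flag_touching_candidates(boxes: List[Box], threshold: float = 1.2) -> List[int]:
    """Return indices of components whose aspect ratio exceeds the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return [i for i, box in enumerate(boxes) if aspect_ratio(box) > threshold]


# Three components of equal height; only the middle one is unusually wide.
boxes = [(0, 0, 10, 20), (12, 0, 30, 20), (45, 0, 18, 20)]
print(flag_touching_candidates(boxes))  # [1]
```

A wide component is only a candidate here; under this reading, the later column analysis decides whether and where it is actually cut.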
20 Citations
25 Claims
1-9. (canceled)
10. A method for detection of touching characters in a media by segmentation of adjoining character spaces, the method comprising:
acquiring each component of the media in a predetermined sequence, each component having at least two touching characters;
determining an aspect ratio of each component; and
performing a component investigation for each aspect ratio higher than a threshold aspect ratio, the component investigation comprising:
determining a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the at least two touching characters;
computing a number of pixels representing a text of the at least two touching characters;
computing a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
determining a candidate cut column based on a relation between a column pixel density and a corresponding length of the column; and
segmenting the at least two touching characters with a referential boundary of the candidate cut column in the component.
Dependent claims: 11, 12, 13, 14, 15.
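The column analysis recited in claim 10 can be illustrated with a minimal sketch. It makes several assumptions: the component is a binary image stored as rows of 0/1 values with 1 marking a text pixel, the "longest run" is measured vertically within a column, and the claim's unspecified "relation" is replaced by a simple stand-in score (pixel count plus longest run, minimized over interior columns). The function names and the `margin` parameter are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

# Assumed encoding: a component is a list of rows; 1 marks a text pixel.
Image = List[List[int]]


def column_stats(img: Image) -> List[Tuple[int, int]]:
    """For each column, return (text pixel count, longest vertical run of text pixels)."""
    height, width = len(img), len(img[0])
    stats = []
    for x in range(width):
        count = longest = run = 0
        for y in range(height):
            if img[y][x]:
                count += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        stats.append((count, longest))
    return stats


def candidate_cut_column(img: Image, margin: int = 1) -> int:
    """Pick the interior column with the lowest count + longest-run score.

    The score is a stand-in for the claimed relation, chosen for illustration.
    """
    stats = column_stats(img)
    candidates = range(margin, len(stats) - margin)
    if len(candidates) == 0:
        raise ValueError("component too narrow for the given margin")
    return min(candidates, key=lambda x: stats[x][0] + stats[x][1])


def split_at(img: Image, cut: int) -> Tuple[Image, Image]:
    """Split the component at the cut column; the cut column goes to the right part."""
    return [row[:cut] for row in img], [row[cut:] for row in img]


# Two solid blobs joined by a one-pixel bridge in column 3.
component = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
cut = candidate_cut_column(component)
left, right = split_at(component, cut)
print(cut)  # 3
```

Restricting the search to interior columns keeps the sketch from cutting at the component's own edge, where pixel counts are often low for reasons unrelated to touching.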
16. A system for detection of touching characters in a media by segmentation of adjoining character spaces, the system comprising:
an input device configured to acquire each component of the media in a predetermined sequence, each component having at least two touching characters; and
a processor configured to determine an aspect ratio of each component, wherein the processor performs a component investigation for each aspect ratio higher than a threshold aspect ratio, wherein the processor:
determines a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the characters;
computes a number of pixels representing a text of the at least two touching characters;
computes a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
determines a candidate cut column based on a column pixel density and a corresponding length of the column; and
segments the at least two touching characters with a referential boundary of the candidate cut column in the component.
Dependent claims: 17, 18, 19, 20.
21. A computerized system for detection of touching characters in media by segmentation of adjoining character spaces, the system comprising:
at least one media article having a plurality of components, wherein at least a portion of the plurality of components has at least two touching characters; and
a computerized device receiving the at least one media article through at least one input device, wherein at least a portion of the plurality of components is acquired through the input device in a predetermined sequence, wherein the computerized device has a non-transient memory and a processor, the processor capable of executing a plurality of code stored on the non-transient memory, wherein the plurality of code further comprises:
code for determining an aspect ratio of each component;
code for performing a component investigation for each aspect ratio higher than a threshold aspect ratio;
code for determining a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the characters;
code for computing a number of pixels representing a text of the at least two touching characters;
code for computing a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
code for determining a candidate cut column based on a column pixel density and a corresponding length of the column; and
code for segmenting the at least two touching characters with a referential boundary of the candidate cut column in the component.
Dependent claims: 22, 23, 25.
Specification