System and method for detection and segmentation of touching characters for OCR
Abstract
The present disclosure relates to a system and a method for detection of touching characters in a media, characterized by segmentation of adjoining character spaces. First, an aspect ratio is calculated for each connected component. A candidate touching position of each character is then determined by calculating a threshold aspect ratio for each character. Finally, a candidate cut column is determined based on a relation between the column pixel densities and the corresponding column lengths, and the touching characters are segmented at that candidate cut column.
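The first stage of the abstract (finding connected components and flagging those with a high aspect ratio) can be sketched in plain Python as below. This is an illustrative sketch, not the patented implementation: the function names `connected_components`, `aspect_ratio` and `components_to_investigate` are hypothetical, 8-connectivity is assumed, aspect ratio is assumed to mean width divided by height, raster-scan order stands in for the unspecified "predetermined sequence", and no particular threshold value is given by the source.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]           # binary image: 1 = text pixel, 0 = background
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(img: Image) -> List[BBox]:
    """Bounding boxes of 8-connected text components, in raster-scan order
    of each component's first pixel (assumed 'predetermined sequence')."""
    rows, cols = len(img), len(img[0]) if img else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[BBox] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # Breadth-first flood fill from this seed pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def aspect_ratio(box: BBox) -> float:
    """Width divided by height of a bounding box (assumed definition)."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1)


def components_to_investigate(img: Image, threshold: float) -> List[BBox]:
    """Keep only components whose aspect ratio exceeds the threshold."""
    return [b for b in connected_components(img) if aspect_ratio(b) > threshold]
```

For example, in a 3-row image containing a single-column stroke and, two columns to its right, a 5-column-wide blob, only the wide blob (aspect ratio 5/3) is returned for a threshold of 1.0; it is the kind of component that may hold two touching characters.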
16 Claims
1. A method for detection of touching characters in a media by segmentation of adjoining character spaces, the method comprising:
acquiring each component of the media in a predetermined sequence, each component having at least two touching characters;
determining an aspect ratio of each component; and
performing a component investigation for each aspect ratio higher than a threshold aspect ratio, the component investigation comprising:
determining a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the at least two touching characters;
computing a number of pixels representing a text of the at least two touching characters;
computing a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
determining a candidate cut column based on a relation between a column pixel density and a corresponding length of the column; and
segmenting the at least two touching characters with a referential boundary of the candidate cut column in the component.
7. A system for detection of touching characters in a media by segmentation of adjoining character spaces, the system comprising:
an input device configured to acquire each component of the media in a predetermined sequence, each component having at least two touching characters; and
a processor configured to determine an aspect ratio of each component, wherein the processor performs a component investigation for each aspect ratio higher than a threshold aspect ratio, wherein the processor:
determines a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the characters;
computes a number of pixels representing a text of the at least two touching characters;
computes a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
determines a candidate cut column based on a column pixel density and a corresponding length of the column; and
segments the at least two touching characters with a referential boundary of the candidate cut column in the component.
12. A computerized system for detection of touching characters in media by segmentation of adjoining character spaces, the system comprising:
at least one media article having a plurality of components, wherein at least a portion of the plurality of components has at least two touching characters; and
a computerized device receiving the at least one media article through at least one input device, wherein at least a portion of the plurality of components is acquired through the input device in a predetermined sequence, wherein the computerized device has a non-transient memory and a processor, the processor capable of executing a plurality of code stored on the non-transient memory, wherein the plurality of code further comprises:
code for determining an aspect ratio of each component;
code for performing a component investigation for each aspect ratio higher than a threshold aspect ratio;
code for determining a candidate touching position of the at least two touching characters in a plurality of geometric orientations of the characters;
code for computing a number of pixels representing a text of the at least two touching characters;
code for computing a length of a longest run of the number of pixels representing the text of the at least two touching characters for each column of the component;
code for determining a candidate cut column based on a column pixel density and a corresponding length of the column; and
code for segmenting the at least two touching characters with a referential boundary of the candidate cut column in the component.
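The column investigation recited in the claims (per-column text-pixel count, longest vertical run per column, choice of a candidate cut column, and segmentation at that column) can be sketched as follows. The claims do not state the exact relation used to pick the cut column, so the selection rule here is a hypothetical stand-in: among interior columns, take the one with the lowest pixel density (count divided by component height), breaking ties by the shorter longest run and then by the leftmost column. The names `column_profile`, `candidate_cut_column`, `segment_at` and the `min_width` parameter are all hypothetical, and only vertical cuts are modeled, not the plurality of geometric orientations the claims mention.

```python
from typing import List, Tuple

Image = List[List[int]]  # cropped component: 1 = text pixel, 0 = background


def column_profile(comp: Image) -> List[Tuple[int, int]]:
    """For each column: (number of text pixels, length of the longest
    vertical run of consecutive text pixels)."""
    height, width = len(comp), len(comp[0])
    profile = []
    for x in range(width):
        count = longest = current = 0
        for y in range(height):
            if comp[y][x]:
                count += 1
                current += 1
                longest = max(longest, current)
            else:
                current = 0  # a background pixel ends the current run
        profile.append((count, longest))
    return profile


def candidate_cut_column(comp: Image, min_width: int = 2) -> int:
    """Hypothetical selection rule: among columns that leave at least
    `min_width` columns on each side, pick the lowest density
    (count / height), then the shortest longest run, then the leftmost."""
    height, width = len(comp), len(comp[0])
    profile = column_profile(comp)
    candidates = range(min_width, width - min_width + 1)
    if not candidates:
        raise ValueError("component is too narrow to cut")
    return min(candidates,
               key=lambda x: (profile[x][0] / height, profile[x][1], x))


def segment_at(comp: Image, cut: int) -> Tuple[Image, Image]:
    """Split the component at the cut column: the left part keeps columns
    before `cut`, the right part keeps `cut` onwards (assumed convention)."""
    return [row[:cut] for row in comp], [row[cut:] for row in comp]
```

For two small ring-shaped glyphs joined by a one-pixel bridge in the middle row, the bridge column has a single text pixel and a longest run of 1, so it is chosen as the cut and the component splits into a left and a right piece there. Excluding a margin of `min_width` columns keeps the rule from shaving a thin sliver off either edge, where low-density columns also occur.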
Specification