Text localization for image and video OCR
First Claim
1. A method of text detection in a video image, comprising:
- at an image processor, receiving a video frame that potentially contains text;
- segmenting the image into regions having similar color;
- identifying high likelihood non-text regions from the regions having similar color and discarding the high likelihood non-text regions;
- merging, within the remaining regions, those regions whose size and color are similar and whose horizontal positions are within a threshold;
- describing the regions using features by carrying out a feature extraction process to extract stroke features, edge features, and fill factor features; and
- passing the remaining regions through a trained binary classifier to obtain the final text regions, which can be binarized and recognized by OCR software.
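As a rough illustration of the merging step in claim 1, the sketch below tests whether two candidate regions are similar enough in size and color, and close enough horizontally, to be merged. The region representation and all threshold values are hypothetical; the claim does not fix them.

```python
import numpy as np

def mergeable(r1, r2, t_size=0.5, t_color=40.0, t_dist=20):
    """Illustrative merge test for two candidate text regions.

    Each region is a dict with keys x, w, h and an average RGB color.
    Thresholds t_size, t_color, t_dist are assumptions for this sketch.
    """
    # Size similarity: ratio of the two heights must be close to 1.
    size_ok = min(r1["h"], r2["h"]) / max(r1["h"], r2["h"]) > t_size
    # Color similarity: Euclidean distance between average RGB colors.
    c1 = np.asarray(r1["color"], dtype=float)
    c2 = np.asarray(r2["color"], dtype=float)
    color_ok = float(np.linalg.norm(c1 - c2)) < t_color
    # Horizontal positions within a threshold: gap between the boxes.
    gap = max(r1["x"], r2["x"]) - min(r1["x"] + r1["w"], r2["x"] + r2["w"])
    dist_ok = gap < t_dist
    return size_ok and color_ok and dist_ok
```

Two adjacent same-colored regions of similar height pass the test; a distant region with a very different average color fails.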
Abstract
In accord with embodiments consistent with the present invention, a first action in recognizing text from images and video is to accurately locate the position of the text. After that, the located and possibly low resolution text can be extracted, enhanced and binarized. Finally, existing OCR technology can be applied to the binarized text for recognition. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.
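The abstract describes a three-stage pipeline: locate the text, binarize it, then hand it to existing OCR. A minimal sketch of the middle stage, assuming a simple fixed global threshold (the patent contemplates enhancement before this step, and the threshold value here is illustrative only):

```python
import numpy as np

def binarize(gray, threshold=128):
    """Fixed-threshold binarization of a located text patch.

    Pixels at or above the threshold map to 255 (foreground),
    the rest to 0. The threshold value is an assumption.
    """
    gray = np.asarray(gray)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)
```

In practice an adaptive method (e.g. Otsu's) would usually replace the fixed threshold for low-contrast video text.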
18 Claims
1. A method of text detection in a video image, comprising:
- at an image processor, receiving a video frame that potentially contains text;
- segmenting the image into regions having similar color;
- identifying high likelihood non-text regions from the regions having similar color and discarding the high likelihood non-text regions;
- merging, within the remaining regions, those regions whose size and color are similar and whose horizontal positions are within a threshold;
- describing the regions using features by carrying out a feature extraction process to extract stroke features, edge features, and fill factor features; and
- passing the remaining regions through a trained binary classifier to obtain the final text regions, which can be binarized and recognized by OCR software. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
15. A text detection process, comprising:
preprocessing an image by segmentation using statistical region merging, removing regions that are definitely not text, and grouping regions based on the criteria of height similarity, color similarity, region distance and horizontal alignment, defined as follows: height similarity is defined in terms of HEIGHT1 and HEIGHT2, the heights of the two regions; color similarity is defined as

D(c1, c2) = √((R1 − R2)² + (G1 − G2)² + (B1 − B2)²) < Tcolor,

where c1 = [R1 G1 B1] and c2 = [R2 G2 B2] are the average colors of the two regions; region distance is defined as Dregion < Tregion, where Dregion is the horizontal distance between the two regions; and horizontal alignment is defined as Dtop < Talign or Dbottom < Talign, where Dtop and Dbottom are the vertical distances between the top boundaries and the bottom boundaries of the two regions;

carrying out a feature extraction process to describe each remaining region, where each feature is represented by a stroke feature, an edge feature and a fill factor feature of the region; and

classifying the feature vector by use of a support vector machine (SVM) classifier engine which outputs whether the region is text or not, where (xi, yi) are the feature vectors and ground-truth labels of the training samples, x is the feature vector of the region to be classified, ai and b are the parameters obtained by solving an optimization problem subject to yᵀa = 0 (0 ≤ ai ≤ C, i = 1, . . . , l), and K is the kernel function, to obtain a classification output in which 1 indicates the presence of text and −1 indicates the absence of text. - View Dependent Claims (16, 17, 18)
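Two pieces of claim 15 translate directly into code: the Euclidean color distance D(c1, c2), and the kernel SVM decision that maps a feature vector to +1 (text) or −1 (non-text). The claim leaves the kernel K unstated; an RBF kernel is assumed below, and the parameter names mirror the claim's ai, b, (xi, yi).

```python
import numpy as np

def color_distance(c1, c2):
    """Euclidean RGB distance D(c1, c2) between two average colors."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    return float(np.sqrt(np.sum((c1 - c2) ** 2)))

def svm_decision(x, support_x, support_y, alpha, b, gamma=0.5):
    """Kernel SVM decision: sign(sum_i a_i * y_i * K(x_i, x) + b).

    RBF kernel K(x_i, x) = exp(-gamma * ||x_i - x||^2) is an
    assumption; the claim does not specify K or gamma.
    """
    k = np.exp(-gamma * np.sum((support_x - x) ** 2, axis=1))
    score = float(np.dot(alpha * support_y, k)) + b
    return 1 if score > 0 else -1
```

In a trained model, alpha and b would come from solving the dual optimization problem under the constraints yᵀa = 0 and 0 ≤ ai ≤ C quoted in the claim.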
18. The method according to claim 15, wherein the binarization is carried out using a plurality of binarization methods, with each binarized output being processed by an optical character reader to produce multiple outputs that are combined.
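Claim 18 does not specify how the multiple OCR outputs are combined; a per-character-position majority vote is one simple possibility, sketched below (alignment-free, so it assumes the candidate strings are roughly the same length).

```python
from collections import Counter

def combine_ocr_outputs(candidates):
    """Combine OCR results from several binarizations by majority
    vote at each character position. The combination rule is an
    assumption; the claim only says the outputs are combined.
    """
    if not candidates:
        return ""
    width = max(len(c) for c in candidates)
    out = []
    for i in range(width):
        votes = Counter(c[i] for c in candidates if i < len(c))
        out.append(votes.most_common(1)[0][0])
    return "".join(out)
```

For example, if two of three binarizations read a character correctly, the vote recovers it even when the third misreads it.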
Specification