Text localization for image and video OCR
First Claim
1. A method of text detection in a video image, comprising:
at an image processor, receiving a video frame that potentially contains text;
segmenting the video frame into regions having similar color;
where color similarity is defined as D(c1, c2) = √((R1 − R2)² + (G1 − G2)² + (B1 − B2)²) < Tcolor, where c1 = [R1 G1 B1] and c2 = [R2 G2 B2] are average colors of two regions and Tcolor is a merging threshold;
identifying high likelihood non-text regions from the regions having similar color and discarding the high likelihood non-text regions;
merging regions having similar color and having horizontal positions that are within a threshold;
describing the regions using features by carrying out a feature extraction process to extract stroke features, edge features, and fill factor features; and
passing the remaining regions through a trained binary classifier to obtain final text regions which are binarized for processing by OCR software.
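The color-similarity test recited in claim 1 can be sketched in a few lines. This is an illustrative sketch, not the patented implementation; the threshold value 30.0 is an assumption, since the claim leaves Tcolor unspecified.

```python
import math

# Sketch of the claim 1 color-similarity test: two regions are merge
# candidates when the Euclidean distance between their average RGB colors
# falls below Tcolor. The value 30.0 is an illustrative assumption.
T_COLOR = 30.0

def color_distance(c1, c2):
    """Euclidean distance D(c1, c2) between average colors [R, G, B]."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def similar_color(c1, c2, t_color=T_COLOR):
    """True when D(c1, c2) < Tcolor, i.e. the two regions may be merged."""
    return color_distance(c1, c2) < t_color
```

In practice Tcolor would be tuned on labeled frames; too low a value fragments characters, too high merges text into its background.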
Abstract
In accord with embodiments consistent with the present invention, a first action in recognizing text from image and video is to accurately locate the position of the text. The located, and possibly low resolution, text can then be extracted, enhanced, and binarized. Finally, existing OCR technology can be applied to the binarized text for recognition. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.
18 Claims
1. A method of text detection in a video image, comprising:
at an image processor, receiving a video frame that potentially contains text;
segmenting the video frame into regions having similar color;
where color similarity is defined as D(c1, c2) = √((R1 − R2)² + (G1 − G2)² + (B1 − B2)²) < Tcolor, where c1 = [R1 G1 B1] and c2 = [R2 G2 B2] are average colors of two regions and Tcolor is a merging threshold;
identifying high likelihood non-text regions from the regions having similar color and discarding the high likelihood non-text regions;
merging regions having similar color and having horizontal positions that are within a threshold;
describing the regions using features by carrying out a feature extraction process to extract stroke features, edge features, and fill factor features; and
passing the remaining regions through a trained binary classifier to obtain final text regions which are binarized for processing by OCR software.
View Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
13. The method according to claim 1, where the binarization is carried out using a plurality of binarization methods with each binarized output being processed by an optical character reader to produce multiple outputs that are combined.
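Claim 13 combines the OCR outputs of several binarizations of the same region but does not prescribe a combination rule. One simple rule, shown as an assumption here, is majority voting over the whole recognized strings:

```python
from collections import Counter

# Sketch of the claim 13 combination step: the same region is binarized
# several ways, each result is OCR'd, and the per-method strings are merged.
# Majority voting over whole strings is an illustrative assumption; the
# claim does not specify how the multiple outputs are combined.
def combine_ocr_outputs(outputs):
    """Return the most frequent OCR string among the per-binarization outputs."""
    counts = Counter(o for o in outputs if o)  # ignore empty reads
    if not counts:
        return ""
    return counts.most_common(1)[0][0]
```

A finer-grained combiner could vote per character position after aligning the strings, which tolerates methods that each misread a different character.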
14. A text detection process, comprising:
preprocessing an image by segmentation using statistical region merging, removing regions that are definitely not text, and grouping regions based on criteria of height similarity, color similarity, region distance and horizontal alignment defined as follows; height similarity is defined as
View Dependent Claims: 15, 16, 17
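The grouping step of claim 14 tests four criteria on pairs of regions. The claim's defining equations are truncated in this listing, so the concrete definitions below (a ratio test for height similarity, Euclidean RGB distance for color, a gap bounded by the mean height for region distance, vertical overlap for horizontal alignment) and every threshold are assumptions made only to illustrate the structure of the test:

```python
# Sketch of the claim 14 grouping criteria. All definitions and thresholds
# here are illustrative assumptions; the patent's own equations are not
# reproduced in this listing.
def groupable(r1, r2, t_height=0.5, t_color=30.0, t_dist=2.0):
    """r1, r2: dicts with 'x', 'y', 'w', 'h' (bounding box) and 'color' (avg RGB)."""
    # Height similarity: the shorter region is at least t_height of the taller.
    if min(r1["h"], r2["h"]) / max(r1["h"], r2["h"]) < t_height:
        return False
    # Color similarity: Euclidean distance between average colors below t_color.
    d = sum((a - b) ** 2 for a, b in zip(r1["color"], r2["color"])) ** 0.5
    if d >= t_color:
        return False
    # Region distance: horizontal gap no more than t_dist times the mean height.
    gap = max(r1["x"], r2["x"]) - min(r1["x"] + r1["w"], r2["x"] + r2["w"])
    if gap > t_dist * (r1["h"] + r2["h"]) / 2:
        return False
    # Horizontal alignment: the vertical extents of the two boxes must overlap.
    return max(r1["y"], r2["y"]) < min(r1["y"] + r1["h"], r2["y"] + r2["h"])
```

Applying this predicate transitively over all segmented regions chains individual characters into candidate text lines.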
18. A method of text detection in a video image, comprising:
at an image processor, receiving a video frame that potentially contains text;
segmenting the video frame into regions having similar color;
identifying high likelihood non-text regions from the regions having similar color and discarding the high likelihood non-text regions;
merging regions having similar color and having horizontal positions that are within a threshold;
describing the regions using features by carrying out a feature extraction process to extract stroke features, edge features, and fill factor features;
representing the extracted features as feature vectors;
passing the remaining regions through a trained binary classifier to obtain final text regions which are binarized for processing by OCR software; and
where the trained binary classifier classifies each feature by use of a support vector machine (SVM) classifier engine which outputs whether the region is text or not using the following equation;
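The SVM equation referenced at the end of claim 18 is cut off in this listing. The standard kernel-SVM decision function is f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b); the sketch below assumes that standard form with a linear kernel, and the support vectors, multipliers, and bias in the test are illustrative values, not trained parameters from the patent:

```python
# Sketch of the claim 18 SVM decision step, assuming the standard decision
# function f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b) with a linear
# kernel. All parameters are illustrative, not values from the patent.
def svm_is_text(x, support_vectors, alphas, labels, b):
    """Return True (text) when the SVM decision value is positive."""
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    score = sum(a * y * dot(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return score > 0
```

In a real detector the feature vector x would hold the stroke, edge, and fill factor features recited earlier in the claim, and the support vectors would come from training on labeled text and non-text regions.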
Specification