Method and apparatus for text detection
Abstract
A text detection technique comprises local ramp detection, identification of intensity troughs (candidate text strokes), determination of stroke width, preliminary detection of text based on contrast and stroke width, and a consistency check.
19 Claims
-
1. A method having scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region that comprises a substantially sharp edge between a dark side and a light side, the method comprising:
-
pre-processing by detecting a local ramp, identifying an intensity trough, and determining a stroke width, wherein local ramp detection comprises using a threshold value and nine 3×3 high-pass filters to obtain nine filtered values per pixel, and determining therefrom any of a vertical ramp detection value, a horizontal ramp detection value, a diagonal ramp in vertical direction detection value, and a diagonal ramp in horizontal direction detection value; and
contrast-based text detection processing, whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected.
Dependent claims: 2, 3, 4, 5, 6
v1((−1,1,0),(−1,1,0),(−1,1,0));
v2((−1,1,0),(0,−1,1),(−1,1,0));
v3((0,−1,1),(−1,1,0),(0,−1,1));
h1((−1,−1,−1),(1,1,1),(0,0,0));
h2((−1,0,−1),(1,−1,1),(0,1,0));
h3((0,−1,0),(−1,1,−1),(1,0,1));
dv1((1,0,0),(−1,1,0),(−1,1,1));
dv2((−1,−1,1),(−1,1,0),(1,0,0)); and
dh1((1,−1,−1),(0,1,−1),(0,0,1));
wherein dh2 = dv2.
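As an illustration of how nine filtered values per pixel might be obtained from the filters listed above, the following Python sketch applies each 3×3 kernel to the neighborhood of one pixel. It is not taken from the patent: the names `FILTERS` and `filtered_values` are hypothetical, each inner tuple is assumed to be one kernel row from top to bottom, the kernel is assumed to be applied as a plain weighted sum centered on the pixel, and only interior pixels are handled. The coefficients are copied exactly as printed in the claim.

```python
# Coefficients copied as printed in the claim text; each inner tuple is
# ASSUMED to be one row of the 3x3 kernel, listed top to bottom.
FILTERS = {
    "v1": ((-1, 1, 0), (-1, 1, 0), (-1, 1, 0)),
    "v2": ((-1, 1, 0), (0, -1, 1), (-1, 1, 0)),
    "v3": ((0, -1, 1), (-1, 1, 0), (0, -1, 1)),
    "h1": ((-1, -1, -1), (1, 1, 1), (0, 0, 0)),
    "h2": ((-1, 0, -1), (1, -1, 1), (0, 1, 0)),
    "h3": ((0, -1, 0), (-1, 1, -1), (1, 0, 1)),
    "dv1": ((1, 0, 0), (-1, 1, 0), (-1, 1, 1)),
    "dv2": ((-1, -1, 1), (-1, 1, 0), (1, 0, 0)),
    "dh1": ((1, -1, -1), (0, 1, -1), (0, 0, 1)),
}
# The claim states dh2 = dv2, so no tenth kernel is needed.


def filtered_values(image, row, col):
    """Return the nine filtered values for the interior pixel (row, col).

    `image` is a list of rows of intensity values. Border handling is not
    described in the claims, so this sketch simply requires an interior pixel.
    """
    values = {}
    for name, kernel in FILTERS.items():
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                # ASSUMPTION: weighted sum with the kernel centered on the pixel.
                total += kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
        values[name] = total
    return values


if __name__ == "__main__":
    # A sharp vertical edge: two dark columns followed by one light column.
    edge = [[10, 10, 200], [10, 10, 200], [10, 10, 200]]
    print(filtered_values(edge, 1, 1))
```

On this vertical edge the `v` filters respond while the `h` filters return zero, which is the behavior one would expect from direction-specific ramp detectors.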
-
-
3. The method of claim 1, wherein determining vertical ramp detection further comprises:
-
comparing three associated filtered values for vertical ramp detection, said three associated values from said nine filtered values, to said threshold value; and
selecting one of said three associated filtered values, wherein said selected filtered value is larger than said threshold value.
-
-
4. The method of claim 1, wherein determining horizontal ramp detection further comprises:
-
comparing three associated filtered values for horizontal ramp detection, said three associated values from said 9 filtered values, to said threshold value; and
selecting one of said three associated filtered values, wherein said selected filtered value is larger than said threshold value.
-
-
5. The method of claim 1, wherein determining diagonal ramp detection in vertical direction further comprises:
-
comparing two associated filtered values for diagonal ramp detection in vertical direction, said two associated values from said nine filtered values, to said threshold value; and
selecting one of said two associated filtered values, wherein said selected filtered value is larger than said threshold value.
-
-
6. The method of claim 1, wherein determining diagonal ramp detection in horizontal direction further comprises:
-
comparing two associated filtered values for diagonal ramp detection in horizontal direction, said two associated values from said nine filtered values, to said threshold value; and
selecting one of said two associated filtered values, wherein said selected filtered value is larger than said threshold value.
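The compare-and-select step of claims 3 to 6 can be sketched in Python as follows. Several details are assumptions rather than claim language: the grouping of filter names by direction is inferred from their prefixes, the comparison is made on the magnitude of each filtered value so that the sign of the ramp is preserved, the largest-magnitude candidate is chosen when more than one exceeds the threshold, and 0 is returned when none does. The names `GROUPS`, `select_ramp`, and `ramp_detection_values` are hypothetical.

```python
# ASSUMPTION: filters are associated with directions by their name prefix.
GROUPS = {
    "vertical": ("v1", "v2", "v3"),
    "horizontal": ("h1", "h2", "h3"),
    "diagonal_vertical": ("dv1", "dv2"),
    "diagonal_horizontal": ("dh1", "dh2"),
}


def select_ramp(filtered, names, threshold):
    """Pick one associated filtered value that exceeds the threshold.

    ASSUMPTIONS: magnitudes are compared, ties between several qualifying
    values are resolved by taking the largest magnitude, and 0 means
    "no ramp detected".
    """
    candidates = [filtered[n] for n in names if abs(filtered[n]) > threshold]
    if not candidates:
        return 0
    return max(candidates, key=abs)


def ramp_detection_values(filtered, threshold):
    """Return one detection value per direction for a single pixel."""
    return {
        direction: select_ramp(filtered, names, threshold)
        for direction, names in GROUPS.items()
    }


if __name__ == "__main__":
    filtered = {
        "v1": 0, "v2": 190, "v3": 380,
        "h1": 0, "h2": 0, "h3": 0,
        "dv1": -50, "dv2": 20,
        "dh1": -400, "dh2": 20,  # dh2 reuses the dv2 response
    }
    print(ramp_detection_values(filtered, threshold=100))
```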
-
-
7. A method having scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region that comprises a substantially sharp edge between a dark side and a light side, the method comprising:
-
pre-processing by detecting a local ramp, identifying an intensity trough, and determining a stroke width;
contrast-based text detection processing, whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected;
wherein identifying an intensity trough uses a finite state machine algorithm, said algorithm having a sweeping procedure, said sweeping procedure further comprising sweeping said scanned page from left to right for detecting vertical troughs, and sweeping said scanned page from top to bottom for detecting horizontal troughs, said finite state machine having a set of five states, and said left to right sweeping procedure further comprising;
for each row in said scanned page having a plurality of pixels, and wherein said sweeping procedure sweeps a current pixel of said plurality of pixels one pixel at a time;
starting at a first state of said five states at a leftmost pixel as said current pixel in said row of said plurality of pixels;
using a signed ramp strength as input, wherein said signed ramp strength is a vertical ramp strength or a diagonal ramp strength in a vertical direction;
processing said input by following a set of predetermined rules using said set of five states; and
assigning a new state from said set of five states to said current pixel.
Dependent claims: 8, 9, 10, 11, 12, 13
default, for indicating non-text;
going downhill, for indicating negative ramping in intensity;
bottom of trough, for indicating a body of text stroke;
going uphill, for indicating positive ramping in intensity; and
end of uphill, for indicating a reset.
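The five states listed above, together with the left-to-right sweep of claim 7, can be sketched in Python as shown below. The claims refer only to "a set of predetermined rules" and do not give the transitions, so the `toy_rules` function is purely hypothetical and exists only to make the sweep runnable. Numbering the states 1 to 5 in the order listed is also an assumption, as are the names `State`, `toy_rules`, and `sweep_row`.

```python
from enum import Enum


class State(Enum):
    # ASSUMPTION: states are numbered in the order the claim lists them.
    DEFAULT = 1            # non-text
    GOING_DOWNHILL = 2     # negative ramping in intensity
    BOTTOM_OF_TROUGH = 3   # body of a text stroke
    GOING_UPHILL = 4       # positive ramping in intensity
    END_OF_UPHILL = 5      # reset


def toy_rules(state, ramp):
    """HYPOTHETICAL transition rules, not the rules of the patent."""
    if state in (State.DEFAULT, State.END_OF_UPHILL):
        return State.GOING_DOWNHILL if ramp < 0 else State.DEFAULT
    if state is State.GOING_DOWNHILL:
        if ramp < 0:
            return State.GOING_DOWNHILL
        return State.BOTTOM_OF_TROUGH if ramp == 0 else State.GOING_UPHILL
    if state is State.BOTTOM_OF_TROUGH:
        if ramp > 0:
            return State.GOING_UPHILL
        return State.BOTTOM_OF_TROUGH if ramp == 0 else State.GOING_DOWNHILL
    # Remaining case: GOING_UPHILL
    return State.GOING_UPHILL if ramp > 0 else State.END_OF_UPHILL


def sweep_row(signed_ramps, rules=toy_rules):
    """Sweep one row left to right, assigning a state to every pixel."""
    state = State.DEFAULT  # start in the first state at the leftmost pixel
    assigned = []
    for ramp in signed_ramps:
        state = rules(state, ramp)
        assigned.append(state)
    return assigned


if __name__ == "__main__":
    # Light background, a dark stroke two pixels wide, light background again.
    for s in sweep_row([0, -5, -5, 0, 0, 5, 5, 0, 0]):
        print(s.name)
```

Passing the rules in as a function keeps the sweep itself independent of whichever rule set is actually used.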
-
-
9. The method of claim 8, wherein said determined cumulative ramp strength is in units of one third of said ramp threshold value, and said determined cumulative duration is in units of pixels of stay in a particular state.
-
10. The method of claim 7, further comprising:
-
determining a cumulative ramp strength;
determining a cumulative duration of stay in a particular state; and
determining a total duration of stay in a third state before switching to a fourth state or fifth state, said third, fourth, and fifth states from said set of five states;
wherein each determining step uses any of;
an edge threshold value, wherein said threshold value is a cumulative ramp strength value required for identifying a high-contrast edge;
a maximum ramp strength value, wherein said value is a maximum ramping value allowed; and
a maximum width value, wherein said value is an upper limit to a number of detected stroke widths.
-
-
11. The method of claim 10, further comprising providing as input to said stroke width determination step, said assigned state of each of said current pixels, said associated cumulative duration of stay in said particular state, and, when said current pixel is in said fifth state, said total duration of stay in said third state before switching to said fourth state or fifth state.
-
12. The method of claim 7, wherein said input is either:
-
a minimum value of any of said signed ramp strengths for said current pixel, said pixel above said current pixel, and said pixel below said current pixel, when said current pixel is in said first, second or fifth state;
or a maximum value of any of said signed ramp strengths for said current pixel, said pixel above said current pixel, and said pixel below said current pixel, when said current pixel is in said third or fourth state.
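The input selection of claim 12 reduces to a minimum or a maximum over three vertically adjacent pixels, chosen by the current state. A minimal Python sketch follows, assuming states are identified by the numbers 1 to 5 and that the caller supplies the signed ramp strengths of the pixel above, the current pixel, and the pixel below. The function name `fsm_input` is hypothetical, and the handling of the first and last rows, where a neighbor is missing, is not addressed in the claim and is left to the caller.

```python
def fsm_input(ramp_above, ramp_current, ramp_below, state_number):
    """Select the state machine input for the current pixel.

    States 1, 2 and 5 use the minimum of the three signed ramp strengths;
    states 3 and 4 use the maximum.
    """
    ramps = (ramp_above, ramp_current, ramp_below)
    if state_number in (1, 2, 5):
        return min(ramps)
    if state_number in (3, 4):
        return max(ramps)
    raise ValueError(f"unknown state number: {state_number}")


if __name__ == "__main__":
    print(fsm_input(-3, -7, 2, state_number=1))  # minimum is used
    print(fsm_input(-3, -7, 2, state_number=3))  # maximum is used
```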
-
-
13. The method of claim 7, further comprising adjustments for performing said top to bottom sweeping procedure.
-
14. A method having scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region that comprises a substantially sharp edge between a dark side and a light side, the method comprising:
-
pre-processing by detecting a local ramp, identifying an intensity trough and determining a stroke width by determining a width and a skeleton, wherein said width is a distance value and said skeleton is a skeletal line, and detecting closely touching text strokes, wherein said width and skeleton determining further comprises setting the width value to the smaller of a vertical distance and a horizontal distance between two edges of said stroke, and determining said skeletal line as a roughly equidistant line from said edges; and
contrast-based text detection processing, whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected, wherein said setting width value step further comprises;
determining said vertical distance from an associated algorithm by using vertical trough information as input in an N×1 window beginning at a current pixel; and
determining said horizontal distance and said skeletal line from an associated algorithm by using horizontal trough information as input in a 1×N window beginning at a current pixel.
Dependent claims: 15
-
-
16. A method having scanned intensity information as input for detecting text in a scanned page by observing a very strong contrast in a localized region that comprises a substantially sharp edge between a dark side and a light side, the method comprising:
-
pre-processing by detecting a local ramp, identifying an intensity trough, and determining a stroke width by determining a width and a skeleton, wherein said width is a distance value and said skeleton is a skeletal line, and detecting closely touching text strokes wherein said detecting closely touching text strokes further comprises detecting a pattern of dark-light-dark (DLD) in a horizontal or a vertical direction within a very small window;
contrast-based text detection processing, whereby any of black text on white background, black text on color background, and white or light text on a dark background are detected, wherein said detecting a DLD pattern in said vertical direction further comprises;
providing an N×M window over a set of pixels and centered at a current pixel;
for each column of said N×M window;
dividing said set of pixels into three disjoint groups, wherein said groups are a top group, a middle group, and a bottom group, respectively; and
detecting a column DLD pattern in said column when a difference between a darkest pixel in said top group and a lightest pixel in said middle group, and a difference between a darkest pixel in said bottom group and said lightest pixel in said middle group are both bigger than a DLD threshold value;
counting a number of detected DLD columns within said N×M window; and
turning on a DLD flag associated with said current pixel, when said counted number is bigger than a predetermined counting threshold.
Dependent claims: 17, 18, 19
N=7 and M=5;
said top group includes two pixels, said middle group includes three pixels, said bottom group includes two pixels; and
said predetermined threshold is equal to two.
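The vertical dark-light-dark test of claim 16, with the parameter values given above (N=7, M=5, groups of two, three, and two pixels, counting threshold of two), can be sketched in Python as follows. The sketch assumes that larger intensity values are lighter, that N is the window height and M its width, that each difference is taken as lightest minus darkest, and that the current pixel is far enough from the image border for the whole window to fit. The function name `dld_flag` and its parameter names are hypothetical, and `dld_threshold` has no value in the claims, so the caller must supply one.

```python
def dld_flag(image, row, col, dld_threshold,
             n=7, m=5, group_sizes=(2, 3, 2), count_threshold=2):
    """Return True when a vertical dark-light-dark pattern is found.

    `image` is a list of rows of intensities (ASSUMED: higher means lighter).
    The n x m window is centered at (row, col) and must lie inside the image.
    """
    half_n, half_m = n // 2, m // 2
    top_size, middle_size, _ = group_sizes
    dld_columns = 0
    for c in range(col - half_m, col + half_m + 1):
        column = [image[r][c] for r in range(row - half_n, row + half_n + 1)]
        # Three disjoint groups: top, middle, bottom.
        top = column[:top_size]
        middle = column[top_size:top_size + middle_size]
        bottom = column[top_size + middle_size:]
        lightest_middle = max(middle)
        # Both dark-to-light differences must exceed the DLD threshold.
        if (lightest_middle - min(top) > dld_threshold
                and lightest_middle - min(bottom) > dld_threshold):
            dld_columns += 1
    # The flag turns on only when the count is bigger than the threshold.
    return dld_columns > count_threshold


if __name__ == "__main__":
    # Two dark rows, three light rows, two dark rows: a gap between strokes.
    gap = [[0] * 5, [0] * 5, [255] * 5, [255] * 5, [255] * 5, [0] * 5, [0] * 5]
    print(dld_flag(gap, 3, 2, dld_threshold=100))
```

A horizontal variant would transpose the roles of rows and columns, in line with the M×N window mentioned in claim 18.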
-
-
18. The method of claim 16, further comprising adjustments for said detecting a DLD pattern in said horizontal direction, and using an M×N window.
-
19. The method of claim 16, further comprising:
-
turning on said DLD flag, when either a horizontal or a vertical DLD pattern is detected;
wherein said flag is passable to another module to be used to ensure that enhanced text strokes will be cleanly separated from one another.
-