Zone segmentation for image display
Abstract
The present invention provides a method and apparatus for detecting text-like portions and non-text-like portions in an image. The method includes the steps of establishing a set of fuzzy detection rules for distinguishing text-like portions of an image from non-text-like portions of an image, dividing the test image into a plurality of segments, and applying the set of fuzzy detection rules to each segment of the test image to thereby classify each segment as text-like or non-text-like. Preferably, the establishing step includes the sub-steps of identifying a plurality of image features that distinguish different portions of an image, generating a plurality of fuzzy detection rules by applying different combinations of the features to a text-like learning image and to a non-text-like learning image, minimizing the rules to exclude those rules not supported by a predetermined amount of the learning images, and allocating the non-excluded rules to the set. Optionally, the generating sub-step includes the sub-steps of normalising each image feature to have a value in the range 0 to 1, partitioning each input feature space into a plurality of equally spaced regions, assigning each input feature to a label of one of the regions to maximize a membership value of the label in the one region, and selecting for each region the maximized label for each feature to thus form a respective fuzzy rule.
66 Claims
-
1. A method for classifying segments of a digital image into text-like portions and non-text-like portions, said method comprising the steps of:
-
(a) establishing a set of fuzzy detection rules for distinguishing text-like portions of said image from said non-text-like portions of said image;
said establishing step including the steps of;
(aa) identifying a plurality of image features that distinguish different portions of an image;
(ab) generating a plurality of fuzzy detection rules by applying different combinations of said features to a text-like learning image and to a non-text-like learning image;
(b) dividing the image into a plurality of segments; and
(c) applying said set of fuzzy detection rules to each segment of said image to thereby classify each said segment as being one of a text-like portion and a non-text-like portion.
-
(ac) minimizing said plurality of fuzzy detection rules to exclude rules that are not supported by a predetermined amount of said learning images, and allocating the non-excluded rules to said set.
-
-
3. The method as recited in claim 1, wherein said generating step (ab) comprises the sub-steps of:
-
(aba) normalising each image feature to have a value in the range 0 to 1;
(abb) partitioning each input feature space into a plurality of equally spaced regions;
(abc) assigning each input feature to a label of one of said regions to maximize a membership value of said label in said one region;
(abd) selecting for each said region the maximized label for each said feature to thus form a respective fuzzy rule.
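The sub-steps of claim 3 can be illustrated with a minimal sketch. It assumes triangular membership functions centred on equally spaced points in the range 0 to 1; the claims do not fix the membership shape, and the function names, the number of regions, and the toy feature values are hypothetical choices made only for illustration.

```python
def normalise(values):
    """Scale a list of raw feature values into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def membership(x, label, n_labels):
    """Triangular membership of x in region `label` (assumed shape).

    Region centres are equally spaced over [0, 1]; each triangle falls to
    zero at the neighbouring centres, so adjacent regions overlap.
    """
    width = 1.0 / (n_labels - 1)
    centre = label * width
    return max(0.0, 1.0 - abs(x - centre) / width)


def best_label(x, n_labels):
    """Return the label whose membership value for x is largest."""
    return max(range(n_labels), key=lambda k: membership(x, k, n_labels))


def make_rule(features, n_labels, output_class):
    """Form one rule: an ANDed tuple of labels plus the class it implies."""
    return tuple(best_label(x, n_labels) for x in features), output_class


# Toy example: two features already normalised to [0, 1], three regions each.
rule = make_rule([0.1, 0.8], n_labels=3, output_class="text")
print(rule)
```

Here each learning block would contribute one candidate rule; how candidates are pooled and pruned is left to the minimizing step.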
-
-
4. The method as recited in claim 3, wherein adjacent ones of said equally spaced regions overlap.
-
5. The method as recited in claim 3, wherein each said fuzzy rule comprises a logical ANDed combination of said image features.
-
6. The method as recited in claim 3, wherein step (abd) comprises determining an output value $O_p$ for a pth input pattern:
-
where K is the number of rules, $O_i$ is the class generated by rule i, and $D_{ip}$ measures how the pth pattern fits an IF condition of the ith rule, wherein $D_{ip}$ is given by the product of membership values of the feature vector for the labels used in the ith rule, such that $D_{ip} = \prod_{j=1}^{n} m_{ji}$, where n is the number of features, and $m_{ji}$ is the membership value of feature j for the labels that the ith rule uses.
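The claim text as reproduced here does not show the formula for the output value. The sketch below therefore makes an assumption: it combines the K rules with a weighted average, in which each rule's class is weighted by its fit to the pattern, a form commonly used in fuzzy inference. Only the product form of the fit follows the claim wording directly. Encoding the classes as 1.0 (text) and 0.0 (non-text) is also an illustrative choice.

```python
from math import prod


def rule_fit(memberships):
    """Fit of one rule to a pattern: product of its n membership values."""
    return prod(memberships)


def output_value(rule_memberships, rule_outputs):
    """Weighted average of rule outputs (assumed form, not from the claim).

    rule_memberships[i] holds the membership values of the pattern's
    features for the labels used by rule i; rule_outputs[i] is that
    rule's class encoded as a number.
    """
    fits = [rule_fit(m) for m in rule_memberships]
    total = sum(fits)
    if total == 0.0:
        return None  # no rule fires for this pattern
    return sum(d * o for d, o in zip(fits, rule_outputs)) / total


# Two rules, two features: rule 0 says "text" (1.0), rule 1 says "non-text" (0.0).
o_p = output_value([[0.8, 0.5], [0.2, 0.5]], [1.0, 0.0])
print(o_p)
```

An output near 1.0 would then be read as text-like and an output near 0.0 as non-text-like; the decision threshold is left open here.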
-
-
7. The method as recited in claim 3, wherein said regions correspond to said segments of said test image.
-
8. The method as recited in claim 1, wherein said image features comprise spatial domain features.
-
9. A method as recited in claim 8, wherein said image features are selected from the group consisting of:
-
(i) mean gray level in a region;
(ii) gray-level variance (or standard deviation) in a region;
(iii) absolute value of the gradient;
(iv) mean absolute value of the non-zero gradient in a region;
(v) maximum absolute value of the non-zero gradient in a region;
(vi) standard deviation of the absolute value of the non-zero gradient in a region;
(vii) absolute value of local contrast;
(viii) mean of the absolute value of non-zero local contrast;
(ix) maximum absolute value of the non-zero local contrast in a region;
(x) standard deviation of the absolute value of the non-zero contrast in a region;
(xi) contrast of a darker pixel against its background;
(xii) dominant local orientation;
(xiii) number of gray levels within a region;
(xiv) number of pixels in the block with maximum gray level in a region;
(xv) number of pixels in the block with gray level larger than mean gray level in a region;
(xvi) number of pixels in block with gray level smaller than mean gray level in a region;
(xvii) directional gradients;
(xviii) transform domain features; and
(xix) x,y direction projections.
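A few of the listed spatial-domain features can be computed directly from the pixels of a block. The following is a minimal sketch, assuming a block is given as rows of integer gray levels and that the gradient is approximated by differences between horizontally adjacent pixels; the function name and the dictionary keys are hypothetical.

```python
from statistics import mean, pvariance


def spatial_features(block):
    """Compute a handful of region statistics for one block."""
    pixels = [p for row in block for p in row]
    mu = mean(pixels)
    # Absolute horizontal gradient between neighbouring pixels (assumed operator).
    grads = [abs(row[i + 1] - row[i]) for row in block for i in range(len(row) - 1)]
    nonzero = [g for g in grads if g != 0]
    return {
        "mean_gray": mu,                                                # (i)
        "variance": pvariance(pixels),                                  # (ii)
        "mean_abs_nonzero_gradient": mean(nonzero) if nonzero else 0,   # (iv)
        "max_abs_nonzero_gradient": max(nonzero) if nonzero else 0,     # (v)
        "num_gray_levels": len(set(pixels)),                            # (xiii)
        "num_above_mean": sum(1 for p in pixels if p > mu),             # (xv)
    }


block = [[0, 0, 255],
         [0, 255, 255],
         [0, 0, 0]]
feats = spatial_features(block)
print(feats)
```

Text blocks tend to show few gray levels and large non-zero gradients, which is why statistics of this kind are candidates for the feature vector.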
-
-
10. The method as recited in claim 1, wherein said image features are dependent upon frequency characteristic information of a portion of said image contained in each segment.
-
11. The method as recited in claim 10, wherein said image features comprise energy features obtained by decomposing said each segment.
-
12. The method as recited in claim 11, wherein decomposing said each segment is carried out by applying a wavelet transformation at least once to said each segment.
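Energy features obtained from a wavelet decomposition can be sketched as follows. The claims do not name a wavelet, so this example assumes a single level of an unnormalised Haar transform and measures energy as the variance of each detail sub-band; the function names and sub-band names are hypothetical.

```python
from statistics import pvariance


def haar_2d(block):
    """One level of a 2-D Haar transform on a block with even side lengths.

    Returns the approximation band and three detail bands, built from
    averages and differences over each 2x2 neighbourhood.
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(block), 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for c in range(0, len(block[0]), 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            row_a.append((a + b + d + e) / 4)
            row_h.append((a - b + d - e) / 4)  # left-right difference
            row_v.append((a + b - d - e) / 4)  # top-bottom difference
            row_d.append((a - b - d + e) / 4)  # diagonal difference
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag


def energy_features(block):
    """Variance of the coefficients in each detail band of the block."""
    _, horiz, vert, diag = haar_2d(block)
    return [pvariance([v for row in band for v in row]) for band in (horiz, vert, diag)]


edge_block = [[0, 255, 0, 0],
              [0, 255, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
flat_block = [[10] * 4 for _ in range(4)]
print(energy_features(edge_block), energy_features(flat_block))
```

Applying `haar_2d` again to the approximation band would give coefficients at a second scale of resolution.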
-
13. The method as recited in claim 1, wherein said segments form a regular array over said image and adjacent ones of segments overlap.
-
14. The method as recited in claim 1, wherein said segments comprise blocks and are sized in the range of 4×4 pixels to 32×32 pixels, and are preferably 9×9 pixels.
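Dividing an image into a regular array of square blocks, with or without overlap, can be sketched as below. The image is assumed to be a list of rows of gray levels, and the `step` parameter is a hypothetical way of expressing overlap: a step smaller than the block size makes adjacent blocks share pixels.

```python
def divide_into_blocks(image, size=9, step=None):
    """Yield (row, col, block) for square blocks laid out on a regular grid.

    Blocks that would extend past the image border are skipped in this sketch.
    """
    step = step or size
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


# Synthetic 18x18 image used only to exercise the function.
image = [[(x + y) % 256 for x in range(18)] for y in range(18)]
tiles = list(divide_into_blocks(image, size=9))               # non-overlapping
overlapping = list(divide_into_blocks(image, size=9, step=3)) # overlapping
print(len(tiles), len(overlapping))
```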
-
15. An apparatus for classifying segments of a digital image into text-like portions and non-text-like portions, said apparatus comprising:
-
(a) means for establishing a set of fuzzy detection rules for distinguishing text-like portions of an image from non-text-like portions of an image;
said establishing means comprising;
means for identifying a plurality of image features that distinguish different portions of an image;
means for generating a plurality of fuzzy detection rules by applying different combinations of said features to a text-like learning image and to a non-text-like learning image;
(b) means for dividing a test image into a plurality of segments; and
(c) means for applying said set of fuzzy detection rules to each segment of said test image to thereby classify each said segment as being one of a text-like portion and a non-text-like portion.
-
means for minimizing said rules to exclude those rules not supported by a predetermined amount of said learning images, and allocating the non-excluded rules to said set.
-
-
17. The apparatus as recited in claim 15, wherein said generating means further comprises:
-
means for normalising each image feature to have a value in the range 0 to 1;
means for partitioning each input feature space into a plurality of equally spaced regions;
means for assigning each input feature to a label of one of said regions to maximize a membership value of said label in said one region;
means for selecting for each said region the maximized label for each said feature to thus form a respective fuzzy rule.
-
-
18. The apparatus as recited in claim 17, wherein adjacent ones of said equally spaced regions overlap.
-
19. The apparatus as recited in claim 17, wherein each said fuzzy rule comprises a logical ANDed combination of said image features.
-
20. The apparatus as recited in claim 17, wherein said selecting means comprises means for determining an output value $O_p$ for a pth input pattern:
-
where K is the number of rules, $O_i$ is the class generated by rule i, and $D_{ip}$ measures how the pth pattern fits an IF condition of the ith rule, wherein $D_{ip}$ is given by the product of membership values of the feature vector for the labels used in the ith rule, such that $D_{ip} = \prod_{j=1}^{n} m_{ji}$, where n is the number of features, and $m_{ji}$ is the membership value of feature j for the labels that the ith rule uses.
-
-
21. The apparatus as recited in claim 15, wherein said image features comprise spatial domain features.
-
22. The apparatus as recited in claim 21, wherein said image features are selected from the group consisting of:
-
(i) mean gray level in a region;
(ii) gray-level variance (or standard deviation) in a region;
(iii) absolute value of the gradient;
(iv) mean absolute value of the non-zero gradient in a region;
(v) maximum absolute value of the non-zero gradient in a region;
(vi) standard deviation of the absolute value of the non-zero gradient in a region;
(vii) absolute value of local contrast;
(viii) mean of the absolute value of non-zero local contrast;
(ix) maximum absolute value of the non-zero local contrast in a region;
(x) standard deviation of the absolute value of the non-zero contrast in a region;
(xi) contrast of a darker pixel against its background;
(xii) dominant local orientation;
(xiii) number of gray levels within a region;
(xiv) number of pixels in the block with maximum gray level in a region;
(xv) number of pixels in the block with gray level larger than mean gray level in a region;
(xvi) number of pixels in block with gray level smaller than mean gray level in a region;
(xvii) directional gradients;
(xviii) transform domain features; and
(xix) x,y direction projections.
-
-
23. The apparatus as recited in claim 15, wherein said image features are dependent on frequency characteristic information of a portion of said image contained in each segment.
-
24. The apparatus as recited in claim 23, wherein said image features comprise energy features obtained by decomposing said each segment.
-
25. The apparatus as recited in claim 24, wherein decomposing said each segment is carried out by applying a wavelet transformation at least once to said each segment.
-
26. A method for classifying segments of a digital image for display on display means, wherein said digital image is processed as a plurality of blocks each having a predetermined number of pixels, said method comprising the steps of:
-
extracting a set of features from each block to generate a feature vector for said block; and
classifying said block using a set of fuzzy rules as either a text-type image or a natural-type image dependent on said feature vector for said block, said rules being generated by applying different combinations of said features to a text-like learning image and to a non-text-like learning image.
-
selecting N features of M possible features, where N and M are integers with N≦M.
-
-
29. The method according to claim 28, further comprising, to generate said fuzzy rules using training image data, the steps of:
-
extracting said N features from each block of said training image data;
assigning a respective label to each of said N features dependent upon the value of said each of said N features;
determining Q fuzzy rules dependent on labels of said N possible features, wherein each of said Q fuzzy rules has a corresponding amount of support based on said blocks of said training image data;
selecting P fuzzy rules of said Q possible fuzzy rules as said set of fuzzy rules, where P and Q are integers with P≦M, dependent upon the corresponding amount of support of each of said P fuzzy rules exceeding a predetermined threshold value.
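Selecting the P rules whose support exceeds a threshold can be sketched as a simple count over the training blocks. This is an illustration under stated assumptions: support is taken to be the number of training blocks that produce a given rule, and each rule is represented as a pair of a label tuple and a class. The claims do not specify either choice, and the names used are hypothetical.

```python
from collections import Counter


def select_rules(labelled_blocks, threshold):
    """Keep the rules whose support in the training blocks exceeds `threshold`.

    Each element of `labelled_blocks` is a (labels, class) pair, where
    `labels` is the tuple of fuzzy labels assigned to the block's N features.
    """
    support = Counter(labelled_blocks)
    return {rule: count for rule, count in support.items() if count > threshold}


# Q = 3 candidate rules observed over ten training blocks.
training = ([((0, 2), "text")] * 5
            + [((1, 1), "natural")] * 4
            + [((2, 0), "text")])
rules = select_rules(training, threshold=2)
print(rules)
```

With this toy data, the rule seen only once is discarded and P = 2 rules remain.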
-
-
30. The method according to claim 26, wherein said set of features comprise energy measure features extracted from coefficients in a region of interest for each block.
-
31. The method according to claim 30, wherein said coefficients are obtained by wavelet transforming each block at least once.
-
32. The method according to claim 31, further comprising the step of tile integrating classified blocks so as to reduce the number of misclassified blocks.
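The claim does not spell out how classified blocks are integrated into tiles. One plausible reading, shown here purely as an assumption, is a neighbourhood majority vote: a block whose label disagrees with most of its neighbours is relabelled. The 3×3 neighbourhood, the tie rule, and the function name are hypothetical.

```python
def smooth_labels(grid):
    """Relabel each block by majority vote over its 3x3 neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            votes = [grid[i][j]
                     for i in range(max(0, r - 1), min(rows, r + 2))
                     for j in range(max(0, c - 1), min(cols, c + 2))]
            text_votes = votes.count("text")
            if text_votes * 2 == len(votes):
                new_row.append(grid[r][c])  # tie: keep the original label
            elif text_votes * 2 > len(votes):
                new_row.append("text")
            else:
                new_row.append("natural")
        out.append(new_row)
    return out


# A single "natural" block surrounded by "text" blocks is treated as a misclassification.
grid = [["text", "text", "text"],
        ["text", "natural", "text"],
        ["text", "text", "text"]]
smoothed = smooth_labels(grid)
print(smoothed)
```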
-
33. The method according to claim 30, wherein said energy measure features comprise the variance of said coefficients over said region of interest for each block.
-
34. The method according to claim 33, wherein energy measure features are derived based on two or more scales of resolution of said coefficients in said region of interest.
-
35. An apparatus for classifying segments of a digital image for display on display means, wherein said digital image is processed as a plurality of blocks each having a predetermined number of pixels, said apparatus comprising the steps of:
-
means for extracting a set of features from each block to generate a feature vector for said block;
means for classifying said block using a set of fuzzy rules as either a text-type image or a natural-type image dependent on said feature vector for said block, said rules being generated by applying different combinations of said features to a text-like learning image and to a non-text-like learning image.
-
means for extracting said N features from each block of said training image data;
means for assigning a respective label to each of said N features dependent upon the value of said each of said N features;
means for determining Q fuzzy rules dependent on labels of said N possible features, wherein each of said Q fuzzy rules has a corresponding amount of support based on said blocks of said training image data;
means for selecting P fuzzy rules of said Q possible fuzzy rules as said set of fuzzy rules, where P and Q are integers with P≦M, dependent upon the corresponding amount of support of each of said P fuzzy rules exceeding a predetermined threshold value.
-
-
39. The apparatus according to claim 35, wherein said set of features comprise energy measure features extracted from coefficients in a region of interest for each block.
-
40. The apparatus according to claim 39, wherein said coefficients are obtained by wavelet transforming each block at least once.
-
41. The apparatus according to claim 40, further comprising means for tile integrating classified blocks so as to reduce the number of misclassified blocks.
-
42. The apparatus according to claim 39, wherein said energy measure features comprise the variance of said coefficients over said region of interest for each block.
-
43. The apparatus according to claim 42, wherein said energy measure features are derived based on two or more scales of resolution of said coefficients in said region of interest.
-
44. A computer program product including a computer readable medium having recorded thereon a computer program for detecting in an image text-like portions and non-text-like portions, the computer program comprising:
-
(a) establishment steps for establishing a set of fuzzy detection rules for distinguishing text-like portions of said image from said non-text-like portions of said image;
said establishing steps comprising;
(aa) identifying a plurality of image features that distinguish different portions of an image;
(ab) generating a plurality of fuzzy detection rules by applying different combinations of said features to a text-like learning image and to a non-text-like learning image;
(b) dividing steps for dividing the image into a plurality of segments; and
(c) application steps for applying said set of fuzzy detection rules to each segment of said image to thereby classify each said segment as being one of a text-like portion and a non-text-like portion.
-
(ac) minimizing said plurality of fuzzy detection rules to exclude rules that are not supported by a predetermined amount of said learning images, and allocating the non-excluded rules to said set.
-
-
46. The computer program product as recited in claim 45, wherein said image features comprise spatial domain features.
-
47. A computer program product as recited in claim 45, wherein said image features are selected from the group consisting of:
-
(i) mean gray level in a region;
(ii) gray-level variance (or standard deviation) in a region;
(iii) absolute value of the gradient;
(iv) mean absolute value of the non-zero gradient in a region;
(v) maximum absolute value of the non-zero gradient in a region;
(vi) standard deviation of the absolute value of the non-zero gradient in a region;
(vii) absolute value of local contrast;
(viii) mean of the absolute value of non-zero local contrast;
(ix) maximum absolute value of the non-zero local contrast in a region;
(x) standard deviation of the absolute value of the non-zero contrast in a region;
(xi) contrast of a darker pixel against its background;
(xii) dominant local orientation;
(xiii) number of gray levels within a region;
(xiv) number of pixels in the block with maximum gray level in a region;
(xv) number of pixels in the block with gray level larger than mean gray level in a region;
(xvi) number of pixels in block with gray level smaller than mean gray level in a region;
(xvii) directional gradients;
(xviii) transform domain features; and
(xix) x, y direction projections.
-
-
48. The computer program product as recited in claim 44, wherein said generating step (ab) comprises the sub-steps of:
-
(aba) normalizing each image feature to have a value in the range 0 to 1;
(abb) partitioning each input feature space into a plurality of equally spaced regions;
(abc) assigning each input feature to a label of one of said regions to maximize a membership value of said label in said one region; and
(abd) selecting for each said region the maximized label for each said feature to thus form a respective fuzzy rule.
-
-
49. The computer program product as recited in claim 48, wherein adjacent ones of said equally spaced regions overlap.
-
50. The computer program product as recited in claim 48, wherein each said fuzzy rule comprises a logical ANDed combination of said image features.
-
51. The computer program product as recited in claim 48, wherein step (abd) comprises determining an output value $O_p$ for a pth input pattern:
-
where K is the number of rules, $O_i$ is the class generated by rule i, and $D_{ip}$ measures how the pth pattern fits an IF condition of the ith rule, wherein $D_{ip}$ is given by the product of membership values of the feature vector for the labels used in the ith rule, such that $D_{ip} = \prod_{j=1}^{n} m_{ji}$, where n is the number of features, and $m_{ji}$ is the membership value of feature j for the labels that the ith rule uses.
-
-
52. The computer program product as recited in claim 48, wherein said regions correspond to said segments of said test image.
-
53. The computer program product as recited in claim 44, wherein said image features are dependent upon frequency characteristic information of a portion of said image contained in each segment.
-
54. The computer program product as recited in claim 53, wherein said image features comprise energy features obtained by decomposing said each segment.
-
55. The computer program product as recited in claim 54, wherein decomposing said each segment is carried out by applying a wavelet transformation at least once to said each segment.
-
56. The computer program product as recited in claim 48, wherein said segments form a regular array over said image and adjacent ones of segments overlap.
-
57. The computer program product as recited in claim 48, wherein said segments comprise blocks and are sized in the range of 4×4 pixels to 32×32 pixels, and preferably 9×9 pixels.
-
58. A computer program product including a computer readable medium having recorded thereon a computer program for zone segmenting a digital image for display on display means, wherein said digital image is processed as a plurality of blocks each having a predetermined number of pixels, said computer program comprising:
-
extracting steps for extracting a set of features from each block to generate a feature vector of said block; and
classifying steps for classifying said block using a set of fuzzy rules as either a text-type image or a natural-type image dependent on said feature vector for said block, said rules being generated by applying different combinations of said features to a text-like learning image and to a non-text-like learning image.
-
selecting N features of M possible features, where N and M are integers with N≦M.
-
-
61. The computer program product according to claim 60, further comprising, to generate said fuzzy rules using training image data;
-
extracting steps for extracting said N features from each block of said training image data;
assigning steps for assigning a respective label to each of said N features dependent upon the value of each of said N features;
determining steps for determining Q fuzzy rules dependent on labels of said N possible features, wherein each of said Q fuzzy rules has a corresponding amount of support based on said blocks of said training image data; and
selecting steps for selecting P fuzzy rules of said Q possible fuzzy rules as said set of fuzzy rules, where P and Q are integers with P≦M, dependent upon the corresponding amount of support of each of said fuzzy rules exceeding a predetermined threshold value.
-
-
62. The computer program product according to claim 58, wherein said set of features comprise energy measure features extracted from coefficients in a region of interest for each block.
-
63. The computer program product according to claim 62, wherein said coefficients are obtained by wavelet transforming each block at least once.
-
64. The computer program product according to claim 63, further comprising the step of tile integrating classified blocks so as to reduce the number of misclassified blocks.
-
65. The computer program product according to claim 62, wherein said energy measure features comprise the variance of said coefficients over said region of interest for each block.
-
66. The computer program product according to claim 65, wherein energy features are derived based on two or more scales of resolution of said coefficients in said region of interest.
Specification