Document image processing apparatus, document image processing method, and computer-readable recording medium having recorded document image processing program
Abstract
Each small region positioned just before a large region according to a reading order is determined as a first candidate, and an evaluating process to evaluate whether each first candidate is an index or not is performed based on a difference in feature from the related large region with respect to each first candidate. Each small region positioned just before a first index according to the reading order is determined as a second candidate, and an evaluating process to evaluate whether each second candidate is the index or not is performed based on a difference in feature from the related first index with respect to each second candidate. Small regions determined as the first index and the second index are extracted as index regions.
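The two-stage flow summarized in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Region` type, the `font_size` feature, and the `differs` callback are assumptions introduced here, and the callback stands in for the evaluating process that the claims describe in more detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    text: str
    is_small: bool     # outcome of the small/large classification (assumed given)
    font_size: float   # hypothetical style feature used only for this demo


def extract_index_regions(
    regions: List[Region],
    differs: Callable[[Region, Region], bool],
) -> List[int]:
    """Return positions of regions judged to be indexes.

    `regions` must already be sorted in reading order. `differs(candidate,
    related)` is a stand-in for the evaluating process.
    """
    # First determining/evaluating process: a small region immediately
    # before a large region is a first candidate.
    first = set()
    for i in range(1, len(regions)):
        candidate, related = regions[i - 1], regions[i]
        if candidate.is_small and not related.is_small:
            if differs(candidate, related):
                first.add(i - 1)

    # Second determining/evaluating process: a small region immediately
    # before a first index is a second candidate.
    second = set()
    for i in sorted(first):
        j = i - 1
        if j >= 0 and regions[j].is_small:
            if differs(regions[j], regions[i]):
                second.add(j)

    return sorted(first | second)


demo_regions = [
    Region("Chapter 1", True, 18.0),
    Region("1.1 Overview", True, 14.0),
    Region("Body paragraph one", False, 10.0),
    Region("short last line", True, 10.0),
    Region("Body paragraph two", False, 10.0),
]


def font_differs(candidate: Region, related: Region) -> bool:
    # Assumption: a plain font-size comparison replaces the real evaluation.
    return candidate.font_size != related.font_size


print(extract_index_regions(demo_regions, font_differs))
```

In this toy input, "1.1 Overview" is found in the first pass because it precedes a large region with a different font size, and "Chapter 1" is found in the second pass because it precedes that first index. The small region "short last line" is rejected because its style matches the body text that follows it.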
19 Citations
12 Claims
1. A document image processing apparatus comprising:
- a memory for storing a document image; and
- a controller for controlling extraction of an index region from said document image, wherein said controller is configured to
  i) classify a plurality of character string element regions constituting said document image into small regions and large regions,
  ii) determine each small region positioned just before said large region according to a reading order as a first candidate, as a first determining process,
  iii) determine at least one part of said first candidates as a first index, by performing an evaluating process to evaluate whether or not each said first candidate is an index, based on a difference in feature from the related large region, with respect to each said first candidate, as a first evaluating process,
  iv) determine each small region positioned just before said first index according to the reading order as a second candidate, as a second determining process,
  v) determine at least one part of said second candidates as a second index, by performing an evaluating process to evaluate whether or not said second candidate is the index, based on a difference in feature from the related first index, with respect to each said second candidate, as a second evaluating process, and
  vi) extract the small regions determined as said first index and said second index, as said index region
wherein in said first evaluating process, said controller
- sets a first feature section for each said first candidate as for a style type different in feature from a corresponding related large region that represents said related large region corresponding to the intended first candidate among a plurality of style types, said first feature section including a feature of said intended first candidate region but not including a feature of said corresponding related large region,
- groups into region groups at least one or both of the related large regions and the first candidates having the feature included in said set first feature section,
- calculates a first index evaluation degree, based on a number of members of each region group with respect to each said first candidate, and
- determines whether or not a logical element of each said first candidate is the index, based on said calculated first index evaluation degree, and
in said second evaluating process, the controller
- sets a second feature section for each said second candidate as for a style type different in feature from a corresponding related first index that represents said related first index corresponding to the intended second candidate among said plurality of style types, said second feature section including a feature of said intended second candidate region but not including a feature of said corresponding related first index,
- groups into region groups at least one or both of the related first indexes and the second candidates having the feature included in said set second feature section,
- calculates a second index evaluation degree, based on a number of members of each region group with respect to each said second candidate, and
- determines whether or not a logical element of each said second candidate is the index, based on said calculated second index evaluation degree.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A document image processing method executed by a document image processing apparatus comprising a memory storing a document image, to extract an index region from said document image stored in said memory, and comprising the steps of:
- classifying a plurality of character string element regions constituting said document image into small regions and large regions;
- determining each small region positioned just before said large region according to a reading order as a first candidate;
- determining at least one part of said first candidates as a first index, by performing a first evaluating process to evaluate whether or not each said first candidate is an index, based on a difference in feature from the related large region, with respect to each said first candidate;
- determining each small region positioned just before said first index according to the reading order as a second candidate;
- determining at least one part of said second candidates as a second index, by performing a second evaluating process to evaluate whether or not said second candidate is the index, based on a difference in feature from the related first index, with respect to each said second candidate; and
- extracting the small regions determined as said first index and said second index, as said index region,
wherein said first evaluating process comprises
- setting a first feature section for each said first candidate as for a style type different in feature from a corresponding related large region that represents said related large region corresponding to the intended first candidate among a plurality of style types, said first feature section including a feature of said intended first candidate region but not including a feature of said corresponding related large region,
- grouping into region groups at least one or both of the related large regions and the first candidates having the feature included in said set first feature section,
- calculating a first index evaluation degree, based on a number of members of each region group with respect to each said first candidate, and
- determining whether or not a logical element of each said first candidate is the index, based on said calculated first index evaluation degree, and
said second evaluating process comprises
- setting a second feature section for each said second candidate as for a style type different in feature from a corresponding related first index that represents said related first index corresponding to the intended second candidate among said plurality of style types, said second feature section including a feature of said intended second candidate region but not including a feature of said corresponding related first index,
- grouping into region groups at least one or both of the related first indexes and the second candidates having the feature included in said set second feature section,
- calculating a second index evaluation degree, based on a number of members of each region group with respect to each said second candidate, and
- determining whether or not a logical element of each said second candidate is the index, based on said calculated second index evaluation degree.
12. A computer-readable non-transitory recording medium having a recorded document image processing program comprising the steps of:
- classifying a plurality of character string element regions constituting a document image into small regions and large regions;
- determining each small region positioned just before said large region according to a reading order as a first candidate;
- determining at least one part of said first candidates as a first index, by performing a first evaluating process to evaluate whether or not each said first candidate is an index, based on a difference in feature from the related large region, with respect to each said first candidate;
- determining each small region positioned just before said first index according to the reading order as a second candidate;
- determining at least one part of said second candidates as a second index, by performing a second evaluating process to evaluate whether or not said second candidate is the index, based on a difference in feature from the related first index, with respect to each said second candidate; and
- extracting the small regions determined as said first index and said second index, as said index region
wherein said first evaluating process comprises
- setting a first feature section for each said first candidate as for a style type different in feature from a corresponding related large region that represents said related large region corresponding to the intended first candidate among a plurality of style types, said first feature section including a feature of said intended first candidate region but not including a feature of said corresponding related large region,
- grouping into region groups at least one or both of the related large regions and the first candidates having the feature included in said set first feature section,
- calculating a first index evaluation degree, based on a number of members of each region group with respect to each said first candidate, and
- determining whether or not a logical element of each said first candidate is the index, based on said calculated first index evaluation degree, and
said second evaluating process comprises
- setting a second feature section for each said second candidate as for a style type different in feature from a corresponding related first index that represents said related first index corresponding to the intended second candidate among said plurality of style types, said second feature section including a feature of said intended second candidate region but not including a feature of said corresponding related first index,
- grouping into region groups at least one or both of the related first indexes and the second candidates having the feature included in said set second feature section,
- calculating a second index evaluation degree, based on a number of members of each region group with respect to each said second candidate, and
- determining whether or not a logical element of each said second candidate is the index, based on said calculated second index evaluation degree.
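The evaluating process recited in the independent claims (set a feature section, group regions by it, compute an evaluation degree from group sizes, then decide) might be sketched as below. Everything concrete here is an assumption made for illustration, since the claims do not fix these details: style features are encoded as numbers, the feature section is taken as an interval centred on the candidate's value and half as wide as its gap to the related region, the evaluation degree is the share of candidates among all regions falling in that section, and the threshold of 0.8 is arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: style type name -> numeric feature value.
Features = Dict[str, float]


@dataclass
class Pair:
    candidate: Features   # a first (or second) candidate
    related: Features     # its related large region (or related first index)


def feature_section(pair: Pair, style: str) -> Optional[Tuple[float, float]]:
    """Open interval holding the candidate's value but not the related one's.

    Returns None when the two regions do not differ in this style type.
    """
    c, r = pair.candidate[style], pair.related[style]
    if c == r:
        return None
    half = abs(c - r) / 2.0   # assumption: half the gap on each side
    return (c - half, c + half)


def evaluation_degree(pair: Pair, pairs: List[Pair], style: str) -> float:
    """Share of candidates among all regions whose feature lies in the section."""
    section = feature_section(pair, style)
    if section is None:
        return 0.0
    lo, hi = section
    # Region groups: candidates and related regions falling in the section.
    cand_group = sum(1 for p in pairs if lo < p.candidate[style] < hi)
    related_group = sum(1 for p in pairs if lo < p.related[style] < hi)
    # cand_group >= 1 when `pair` is in `pairs`, so the division is safe.
    return cand_group / (cand_group + related_group)


def is_index(pair: Pair, pairs: List[Pair], styles: List[str],
             threshold: float = 0.8) -> bool:
    # Assumption: one sufficiently distinctive style type is enough.
    return any(evaluation_degree(pair, pairs, s) >= threshold for s in styles)


STYLES = ["font_size", "indent"]
p1 = Pair({"font_size": 14, "indent": 0}, {"font_size": 10, "indent": 0})
p2 = Pair({"font_size": 14, "indent": 0}, {"font_size": 10, "indent": 0})
p3 = Pair({"font_size": 10, "indent": 2}, {"font_size": 10, "indent": 2})
p4 = Pair({"font_size": 10, "indent": 2}, {"font_size": 10, "indent": 0})
demo_pairs = [p1, p2, p3, p4]

for name, p in [("p1", p1), ("p3", p3), ("p4", p4)]:
    print(name, is_index(p, demo_pairs, STYLES))
```

In this toy data, `p1` is accepted because only candidates share its larger font size. `p3` is rejected because it does not differ from its related region in any style type. `p4` is rejected because its indent also appears in a related (body) region, which pulls its degree below the threshold.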
Specification