Document image processing method and system having function of determining body text region reading order
First Claim
1. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into:
(b1) in-order reading regions of text which are to be successively read in a predetermined order and
(b2) different-attribute regions of text;
c) detect a construction of said in-order reading regions but not of said different-attribute regions; and
d) determine the reading order, in which said in-order reading regions are to be read, using said construction.
Abstract
An extracting step extracts text regions from an input document image. A classifying step classifies the text regions into in-order reading regions to be successively read in the predetermined order and different-attribute regions. A detecting step detects the construction of the in-order reading regions. A determining step determines the reading order, in which the in-order reading regions are to be read, using the construction. The detecting step detects the construction in a manner that is the same whether the input document image comprises a vertically typeset document or a horizontally typeset document. The detecting step further includes a tree graph formation step c-1) forming a tree graph representing the construction including nodes respectively representing the in-order reading regions.
73 Citations
33 Claims
1. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into:
(b1) in-order reading regions of text which are to be successively read in a predetermined order and
(b2) different-attribute regions of text;
c) detect a construction of said in-order reading regions but not of said different-attribute regions; and
d) determine the reading order, in which said in-order reading regions are to be read, using said construction.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into in-order reading regions which are to be successively read in a predetermined order and into different-attribute regions;
c) detect a construction of said in-order reading regions;
d) determine the reading order, in which said in-order reading regions are to be read, using said construction;
e) check whether said reading order is correct or incorrect; and
f) re-determine the reading order if a result of incorrect is obtained;
wherein said check causes said processor to:
e1) provide reference points to the respective in-order reading regions;
e2) connect said reference points in accordance with a relevant reading order; and
e3) determine said reading order to be incorrect if lines formed as the result of the connection intersect.
16. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into in-order reading regions which are to be successively read in a predetermined order and into different-attribute regions;
c) detect a construction of said in-order reading regions;
d) determine the reading order, in which said in-order reading regions are to be read, using said construction;
e) check whether said reading order is correct or incorrect; and
f) re-determine the reading order if a result of incorrect is obtained;
wherein said check causes said processor to:
e1) provide reference points to the respective in-order reading regions;
e2) connect said reference points in accordance with a relevant reading order; and
e3) determine said reading order to be incorrect if a number of intersections of the lines formed as a result of the connection exceeds a predetermined value.
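The check recited in steps e1) to e3) of claims 15 and 16 can be illustrated with a short Python sketch. This is only one possible reading, not the patented implementation: it assumes each in-order reading region is reduced to a single 2-D reference point (for example its center, which the claims do not specify), and that "intersect" means a proper crossing of two non-adjacent connecting segments. The names `count_crossings`, `order_is_incorrect`, and `max_crossings` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is left of the line a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 properly cross.

    Assumption: touching at an endpoint or overlapping collinearly
    is not counted as an intersection.
    """
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(points: List[Point]) -> int:
    """Connect the reference points in reading order (step e2) and
    count crossings between the resulting line segments."""
    segments = list(zip(points, points[1:]))
    crossings = 0
    for i, j in combinations(range(len(segments)), 2):
        if j - i < 2:
            continue  # consecutive segments share an endpoint; skip them
        if segments_cross(*segments[i], *segments[j]):
            crossings += 1
    return crossings


def order_is_incorrect(points: List[Point], max_crossings: int = 0) -> bool:
    """Step e3: with max_crossings=0 any crossing marks the order incorrect
    (claim 15 style); a larger value tolerates some crossings (claim 16 style)."""
    return count_crossings(points) > max_crossings


# Hypothetical reference points for a two-column page, read column by column.
two_columns = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# A hypothetical order whose connecting lines cross once.
zigzag = [(0, 0), (1, 1), (1, 0), (0, 1)]

print(order_is_incorrect(two_columns))              # False
print(order_is_incorrect(zigzag))                   # True
print(order_is_incorrect(zigzag, max_crossings=1))  # False
```

The pairwise comparison is quadratic in the number of regions, which is usually acceptable for the handful of text regions on a page; a sweep-line method would be an alternative for very large inputs.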
17. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into:
(b1) in-order reading regions of text which are to be read in a predetermined order and
(b2) different-attribute regions of text;
c) detect a construction of said in-order reading regions but not of said different-attribute regions; and
d) determine the reading order, in which said in-order reading regions are to be read, using said construction.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into in-order reading regions which are to be read in a predetermined order and into different-attribute regions;
c) detect a construction of said in-order reading regions;
d) determine the reading order, in which said in-order reading regions are to be read, using said construction;
e) check whether said reading order is correct or incorrect; and
f) re-determine the reading order of said in-order reading regions using another predetermined procedure if a result of incorrect is obtained;
wherein said check causes said processor to:
e1) provide reference points to the respective in-order reading regions;
e2) connect said reference points in accordance with a relevant reading order; and
e3) determine said reading order to be incorrect if lines formed as the result of the connection intersect.
32. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into in-order reading regions which are to be read in a predetermined order and into different-attribute regions;
c) detect a construction of said in-order reading regions;
d) determine the reading order, in which said in-order reading regions are to be read, using said construction;
e) check whether said reading order is correct or incorrect; and
f) re-determine the reading order of said in-order reading regions using another predetermined procedure if a result of incorrect is obtained;
wherein said check causes said processor to:
e1) provide reference points to the respective in-order reading regions;
e2) connect said reference points in accordance with a relevant reading order; and
e3) determine said reading order to be incorrect if intersections of lines formed as the result of the connection exceeds a predetermined threshold value.
33. A computer-readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to:
a) extract text regions from an input document image;
b) classify said text regions into:
(b1) body regions of text which are to be successively read in a predetermined order and
(b2) different-attribute regions of text;
c) detect a construction of said body regions but not of said different-attribute regions; and
d) determine the reading order, in which said body regions are to be read, using said construction.
Specification