Apparatus and method of analyzing layout of document, and computer product
Abstract
In an apparatus for analyzing a layout of a document, a character candidate element generator generates character candidate elements from black pixel linkage components of a document image. A horizontally oriented line rectangle generator sets a plurality of character candidate elements as a line candidate rectangle, among character candidate elements aligned in the horizontal line orientation, when the amount of displacement of each of the set character candidate elements in the vertical orientation, perpendicular to the horizontal line orientation, is smaller than or equal to a threshold value. A horizontally oriented paragraph-box generator sets a plurality of line candidate elements having approximately the same length as each other in the vertical orientation as a paragraph candidate element.
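The line-grouping rule in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Box` type, the `group_into_lines` function, the `max_offset` parameter, the use of box centres to measure vertical displacement, and the greedy left-to-right strategy are all assumptions made for illustration. Paragraph grouping is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of a character candidate (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2.0


def group_into_lines(boxes: List[Box], max_offset: float) -> List[List[Box]]:
    """Greedily group boxes into horizontally oriented line candidates.

    A box joins an existing line when its vertical centre differs from the
    centre of the last box in that line by at most max_offset (the
    threshold). Measuring displacement between neighbouring centres is an
    assumption of this sketch.
    """
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.left):
        for line in lines:
            if abs(box.center_y - line[-1].center_y) <= max_offset:
                line.append(box)
                break
        else:
            # No existing line is close enough vertically: start a new one.
            lines.append([box])
    # A line candidate needs a plurality of elements, so drop singletons.
    return [line for line in lines if len(line) >= 2]
```

Calling `group_into_lines(boxes, max_offset=3)` on boxes from two rows of text returns one list per row, each ordered left to right; isolated boxes are discarded.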
13 Citations
33 Claims
1. A program for analyzing a layout of a document for allowing a computer to function as:
- a black pixel linkage component extracting unit that extracts continuous black pixels as black pixel linkage components based on data for an image of the document;
- a character element extracting unit that extracts character elements from the black pixel linkage components; and
- a line element extracting unit that extracts a plurality of character elements as a line element, among character elements aligned in line orientation, each amount of displacement of the extracted character elements in orientation perpendicular to the line orientation being smaller than or equal to a threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus for analyzing a layout of a document comprising:
- a black pixel linkage component extracting unit that extracts continuous black pixels as black pixel linkage components based on data for an image of the document;
- a character element extracting unit that extracts character elements from the black pixel linkage components; and
- a line element extracting unit that extracts a plurality of character elements as a line element, among character elements aligned in line orientation, each amount of displacement of the extracted character elements in orientation perpendicular to the line orientation being smaller than or equal to a threshold value.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A method of analyzing a layout of a document comprising steps of:
- extracting continuous black pixels as black pixel linkage components based on data for an image of the document;
- extracting character elements from the black pixel linkage components; and
- extracting a plurality of character elements as a line element, among character elements aligned in line orientation, each amount of displacement of the extracted character elements in orientation perpendicular to the line orientation being smaller than or equal to a threshold value.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
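The first two steps of the method claim can be sketched as connected-component extraction followed by a size filter. This is an illustration under stated assumptions, not the method the specification describes: the 8-connectivity rule, the `extract_components` and `filter_character_candidates` names, the binary list-of-lists image format (1 for black), and the `max_side` size criterion are all hypothetical.

```python
from collections import deque
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def extract_components(image: List[List[int]]) -> List[BBox]:
    """Return bounding boxes of 8-connected groups of black pixels (value 1)."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[BBox] = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited black pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def filter_character_candidates(boxes: List[BBox], max_side: int) -> List[BBox]:
    """Keep components whose width and height do not exceed max_side.

    The size criterion is an assumption of this sketch; the claims do not
    say how character elements are selected.
    """
    return [
        (l, t, r, b) for (l, t, r, b) in boxes
        if (r - l + 1) <= max_side and (b - t + 1) <= max_side
    ]
```

The resulting boxes could then be passed to a line-grouping step that applies the perpendicular-displacement threshold named in the third step.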
Specification