Apparatus and method for extracting management information from image
Abstract
A management information extraction apparatus learns the structure of ruled lines of a document and the position of user-specified management information, such as a title, during a form learning process, and stores them in a layout dictionary. During operation, the structure of the ruled lines extracted from an image of an input document is matched against that of each document in the layout dictionary. The position information in the layout dictionary is then referred to, and the management information is extracted from the input document.
22 Claims
1. A management information extraction apparatus comprising:
a computation device computing a position of management information contained in an arbitrary input image according to relative position information representing a difference between a position of a ruled line, which is included in a table area contained in the input image and encompasses the management information, and a position of an outline point in an outline portion of the table area; and
an extraction device extracting the management information from the input image based on the position computed by said computation device.
2. The management information extraction apparatus according to claim 1, wherein
said computation device obtains, as information about the outline portion of the table area, at least one of a reference size of the table area and a position of a reference point around an outline of the table area.
3. The management information extraction apparatus according to claim 1, wherein
said computation device obtains, as information about the outline portion of the table area, positions of two or more reference points around an outline of the table area, and computes the position of the management information according to position information relative to the two or more reference points.
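One possible reading of claim 3 is interpolation between two reference points on the table outline. The sketch below is illustrative only and is not taken from the specification: the function names `relative_to` and `locate`, and the choice of the top-left and bottom-right outline corners as the two reference points, are assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]

def relative_to(p: Point, ref_a: Point, ref_b: Point) -> Point:
    """Express p as fractions of the span between two reference points.

    Assumes ref_a and ref_b differ in both x and y (e.g. opposite corners).
    """
    return ((p[0] - ref_a[0]) / (ref_b[0] - ref_a[0]),
            (p[1] - ref_a[1]) / (ref_b[1] - ref_a[1]))

def locate(rel: Point, ref_a: Point, ref_b: Point) -> Point:
    """Map stored fractions back to pixel coordinates in a new image."""
    return (ref_a[0] + rel[0] * (ref_b[0] - ref_a[0]),
            ref_a[1] + rel[1] * (ref_b[1] - ref_a[1]))

# Learned form (hypothetical numbers): outline corners (100, 100) and
# (500, 300); a corner of the title cell lies at (200, 150).
rel = relative_to((200.0, 150.0), (100.0, 100.0), (500.0, 300.0))

# Input image: the same form, shifted and scanned at twice the scale.
title_corner = locate(rel, (50.0, 40.0), (850.0, 440.0))
```

Because the stored position is a pair of fractions rather than pixel offsets, the same dictionary entry applies to an input image that is both shifted and scaled.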
4. The management information extraction apparatus according to claim 1, wherein
said computation device computes the position of the management information using as a feature of a structure of ruled lines at least one or more pieces of position information about an intersection between two ruled lines, a state of the intersection between two ruled lines, the number of intersections contained in the input image, and a frequency of a rectangular cell of a specific form encompassed by ruled lines.
5. The management information extraction apparatus according to claim 4, wherein
said computation device obtains the feature of the structure of the ruled lines after distinguishing a case in which a ruled line is a solid line from a case in which a ruled line is a broken line.
6. The management information extraction apparatus according to claim 1, wherein
said computation device computes the position of the management information using reliability in extracting the ruled line as a feature of a structure of ruled lines.
7. The management information extraction apparatus according to claim 1, wherein
said computation device computes the position of the management information using, as a feature of a structure of ruled lines, a ratio of two or more distances between a plurality of intersections arranged on the ruled line.
8. The management information extraction apparatus according to claim 7, wherein
said computation device extracts a sequence of the plurality of intersections on ruled lines from around an outline of the table area, obtains a feature vector using the ratio of the distances as an element corresponding to each of the ruled lines, and represents a feature of a form of the outline of the table area using the feature vector.
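The distance-ratio feature of claims 7 and 8 can be illustrated with a short sketch. It assumes each ruled line is represented by the 1-D coordinates of its intersections (at least three, all distinct); the names `distance_ratios` and `outline_feature` are hypothetical, and the claims do not prescribe this exact ratio definition.

```python
from typing import List

def distance_ratios(intersections: List[float]) -> List[float]:
    """Ratios of consecutive gaps between intersections on one ruled line.

    Assumes the coordinates are distinct, so no gap is zero.
    """
    xs = sorted(intersections)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return [later / earlier for earlier, later in zip(gaps, gaps[1:])]

def outline_feature(ruled_lines: List[List[float]]) -> List[List[float]]:
    """One ratio list per ruled line taken from around the table outline."""
    return [distance_ratios(line) for line in ruled_lines]

# Intersections on the top ruled line of a learned form (hypothetical data).
learned = distance_ratios([0.0, 100.0, 300.0, 400.0])

# The same line in an input image scaled by 1.5 and shifted by 30 pixels.
scanned = distance_ratios([30.0, 180.0, 480.0, 630.0])
```

Both calls yield the same ratios, which is why a ratio of distances is a convenient feature: it does not change when the image is translated or uniformly scaled.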
9. The management information extraction apparatus according to claim 1, wherein
said computation device obtains a feature of a form of an outline of the table area in at least one of four directions, that is, a right, left, upward, and downward directions from outside the input image, and computes the position of the management information using the feature of the form of the outline.
10. The management information extraction apparatus according to claim 1, further comprising:
a dictionary device storing a feature of a structure of ruled lines of one or more table forms, and position information of management information in each table form; and
a comparison device comparing a feature of a structure of ruled lines of the input image with the feature of the structure of ruled lines stored in said dictionary device, wherein said computation device refers to the position information of the management information stored in said dictionary device based on a comparison result from said comparison device and computes the position of the management information of the input image.
11. The management information extraction apparatus according to claim 10, wherein
said comparison device limits candidates of table forms to be compared using the feature of the structure of ruled lines for rough classification, makes a comparison using the feature of the structure of ruled lines for detailed identification, and determines a table form corresponding to the input image.
12. The management information extraction apparatus according to claim 11, wherein
said comparison device determines the table form corresponding to the input image by a dynamic programming matching process.
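Claim 12 names dynamic programming matching without fixing a cost model. The sketch below uses a generic edit-distance-style alignment over numeric feature sequences as a stand-in; the gap penalty, the absolute-difference cost, the function names, and the form names in the example dictionary are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

def dp_distance(a: Sequence[float], b: Sequence[float], gap: float = 1.0) -> float:
    """Align two feature sequences; a skipped element costs `gap`."""
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match
                cost[i - 1][j] + gap,                           # skip a[i-1]
                cost[i][j - 1] + gap,                           # skip b[j-1]
            )
    return cost[n][m]

def best_form(features: Sequence[float], dictionary: Dict[str, List[float]]) -> str:
    """Return the dictionary form whose features align most cheaply."""
    return min(dictionary, key=lambda name: dp_distance(features, dictionary[name]))

# Hypothetical dictionary of learned forms and an input whose last
# feature was lost (for example, a ruled line that was not extracted).
forms = {"invoice": [2.0, 0.5, 1.0], "order": [1.0, 1.0, 3.0]}
match = best_form([2.0, 0.5], forms)
```

An alignment of this kind tolerates missing or spurious ruled lines, since an unmatched element only adds a bounded penalty instead of shifting every later comparison.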
13. The management information extraction apparatus according to claim 10, wherein
said dictionary device stores position information of a rectangular cell encompassing the management information as the position information of the management information in each table form.
14. The management information extraction apparatus according to claim 13, wherein
said dictionary device stores one or more difference vectors between one or more vertexes of the rectangular cell and one or more vertexes of a table containing the rectangular cell as the position information of the rectangular cell.
15. The management information extraction apparatus according to claim 14, wherein
said computation device obtains a stable vertex of the table area of the input image according to the comparison result, and computes the position of the management information of the input image using a difference vector from the stable vertex.
16. The management information extraction apparatus according to claim 15, wherein
said dictionary device further stores a size of the rectangular cell; and
said computation device computes the position of the management information of the input image from a rectangular cell which has a size corresponding to the size of the rectangular cell and is located near a position specified by the difference vector.
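Claims 14 to 16 can be read together as: predict a position from a stable vertex plus a stored difference vector, then pick the nearby cell whose size matches the stored size. The following is a minimal sketch under that reading; the `Cell` type, the `find_cell` function, and the 20% size tolerance are assumptions, not details from the claims.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

@dataclass
class Cell:
    x: float  # top-left corner, x
    y: float  # top-left corner, y
    w: float  # width
    h: float  # height

def find_cell(stable_vertex: Tuple[float, float],
              diff_vector: Tuple[float, float],
              ref_size: Tuple[float, float],
              cells: List[Cell],
              size_tol: float = 0.2) -> Optional[Cell]:
    """Pick the size-compatible cell nearest to the predicted position."""
    px = stable_vertex[0] + diff_vector[0]
    py = stable_vertex[1] + diff_vector[1]
    candidates = [c for c in cells
                  if abs(c.w - ref_size[0]) <= size_tol * ref_size[0]
                  and abs(c.h - ref_size[1]) <= size_tol * ref_size[1]]
    if not candidates:
        return None
    return min(candidates, key=lambda c: hypot(c.x - px, c.y - py))

# Hypothetical cells extracted from an input image. The second cell is
# closer to the predicted point (124, 56) but has the wrong width.
cells = [Cell(120.0, 60.0, 300.0, 40.0),
         Cell(124.0, 58.0, 80.0, 40.0),
         Cell(200.0, 104.0, 220.0, 40.0)]
found = find_cell((20.0, 10.0), (104.0, 46.0), (300.0, 40.0), cells)
```

Filtering by size first means a small error in the predicted position does not select an adjacent cell of a different shape.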
17. The management information extraction apparatus according to claim 13, wherein
said dictionary device further stores a size of each table of the table forms; and
said computation device computes a size ratio from a size of the table area of the input image and a size of a corresponding table in the dictionary device, and computes the position of the management information of the input image based on the size ratio.
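The size-ratio computation of claim 17 can be sketched as a scaling step. This example assumes that table width serves as the "size", that one ratio applies to both axes, and that the stored position is a difference vector from a table vertex (as in claim 14); the function name `scaled_position` is hypothetical.

```python
from typing import Tuple

def scaled_position(input_vertex: Tuple[float, float],
                    input_width: float,
                    dict_width: float,
                    dict_diff_vector: Tuple[float, float]) -> Tuple[float, float]:
    """Scale the stored difference vector by the input/dictionary size ratio."""
    ratio = input_width / dict_width
    return (input_vertex[0] + ratio * dict_diff_vector[0],
            input_vertex[1] + ratio * dict_diff_vector[1])

# Dictionary table is 400 wide; the input table is 600 wide (ratio 1.5).
position = scaled_position((30.0, 20.0), 600.0, 400.0, (100.0, 50.0))
```

Scaling the offset lets a dictionary entry learned at one resolution locate the cell in an input image captured at another.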
18. The management information extraction apparatus according to claim 10, wherein
said comparison device obtains a plurality of possible combinations of ruled lines extracted from the input image and corresponding ruled lines contained in information of said dictionary device, extracts a group of two or more compatible combinations among the plurality of combinations, and compares the form of the input image with each table form according to information about the combinations in the group.
19. A computer-readable storage medium used to direct a computer to perform:
computing a position of management information contained in an arbitrary input image according to relative position information representing a difference between a position of a ruled line, which is included in a table area contained in the input image and encompasses the management information, and a position of an outline point in an outline portion of the table area; and
extracting the management information from the input image based on the computed position.
20. A management information extracting method, comprising:
computing a position of management information contained in an arbitrary input image according to relative position information representing a difference between a position of a ruled line, which is included in a table area contained in the input image and encompasses the management information, and a position of an outline point in an outline portion of the table area; and
extracting the management information from the input image based on the computed position.
21. A method for extracting management information from a document, comprising:
storing feature data of a structure of ruled lines of a table area contained in a document;
specifying position information of management information of the document, the position information representing a difference between positions of the ruled lines and positions of outline points in outline portions of the table area;
storing the position information of the management information of the document;
comparing feature data of a structure of ruled lines of a table area contained in an input document with the stored feature data, wherein the input document written in table form is identified;
retrieving the stored position information based on the comparing of feature data; and
extracting automatically management information from the input document based on the retrieved specified position.
22. A management information extraction apparatus comprising:
a dictionary device storing relative position information representing a difference between a position of a ruled line, which is included in a table area contained in a table form, and a position of an outline point in an outline portion of the table area, and storing information of a size of the table area;
a computation device computing a size ratio from a size of a table area contained in an input image and the size of the table area stored in said dictionary device, and computing a position of management information in the input image according to the size ratio and the relative position information stored in said dictionary device; and
an extraction device extracting the management information from the input image based on the position computed by said computation device.
Specification