FORM RECOGNITION APPARATUS, METHOD, DATABASE GENERATION APPARATUS, METHOD, AND STORAGE MEDIUM
Abstract
One system to which the present invention is applied obtains a digitized image of a form, recognizes the character strings in the obtained form image, extracts headline wordings (predetermined character strings) from the recognized character strings, determines the table structure in the form image on the basis of the extracted headline wordings and their arrangement in the form image, and uses the determination result to specify the correspondence relationship between each headline wording and the recognized character strings other than headline wordings.
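The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Token` type, the `HEADLINES` set, and the rule "headlines spread mostly horizontally means data sits below them" are all assumptions chosen for illustration, and character recognition is assumed to have already produced strings with positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    text: str   # recognized character string
    x: float    # left edge in the form image (assumed coordinate system)
    y: float    # top edge in the form image

# Hypothetical set of predetermined headline wordings.
HEADLINES = {"Name", "Quantity", "Price"}

def determine_structure(headlines):
    """Guess the layout from how the headlines are arranged.

    'columns': headlines lie along a row, data is expected below them.
    'rows':    headlines lie along a column, data is expected to their right.
    """
    xs = [h.x for h in headlines]
    ys = [h.y for h in headlines]
    return "columns" if max(xs) - min(xs) >= max(ys) - min(ys) else "rows"

def correspond(tokens):
    """Map each headline wording to the non-headline strings assigned to it."""
    headlines = [t for t in tokens if t.text in HEADLINES]
    if not headlines:
        return {}
    structure = determine_structure(headlines)
    result = {h.text: [] for h in headlines}
    for t in tokens:
        if t.text in HEADLINES:
            continue
        if structure == "columns":
            # Only headlines above the string qualify; pick the closest in x.
            candidates = [h for h in headlines if h.y < t.y]
            nearest = min(candidates, key=lambda h: abs(h.x - t.x), default=None)
        else:
            # Only headlines left of the string qualify; pick the closest in y.
            candidates = [h for h in headlines if h.x < t.x]
            nearest = min(candidates, key=lambda h: abs(h.y - t.y), default=None)
        if nearest is not None:
            result[nearest.text].append(t.text)
    return result

FORM = [
    Token("Name", 0, 0), Token("Quantity", 100, 0), Token("Price", 200, 0),
    Token("Widget", 2, 20), Token("3", 101, 20), Token("9.50", 198, 20),
]
```

Calling `correspond(FORM)` pairs each data string with the headline above it. Swapping the x and y values of every token exercises the row-oriented branch instead.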
27 Citations
33 Claims
1. A form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form, comprising:
an image acquisition unit capable of obtaining a digitized form image of the form;
a character string recognition unit capable of recognizing a character string existing in the form image obtained by the image acquisition unit;
a character string extraction unit capable of extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition unit;
a table structure determination unit capable of determining a table structure existing in the form image, on the basis of the headline wording extracted by the character string extraction unit and arrangement of the headline wordings in the form image; and
a correspondence relationship specification unit capable of specifying a correspondence relationship between the headline wording and a character string other than the headline wording, recognized by the character string recognition unit, using a determination result of the table structure by the table structure determination unit.
Dependent claims: 2, 3, 4, 5, 6

7. A form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form, comprising:
an image acquisition unit capable of obtaining a digitized form image of the form;
a character string recognition unit capable of recognizing a character string existing in the form image obtained by the image acquisition unit;
a character string extraction unit capable of extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition unit;
a position specification unit capable of specifying a position in the form image, in which a headline wording not recognized by the character string recognition unit exists, on the basis of a result extracted by the character string extraction unit;
a phrase creation unit capable of creating a headline wording to exist in a position specified by the position specification unit; and
a correspondence relationship specification unit capable of specifying a correspondence relationship between the headline wording, including a headline wording created by the phrase creation unit, and a character string other than the headline wording, recognized by the character string recognition unit.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17

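One way to picture the position specification and phrase creation steps of claim 7 is the Python sketch below. It rests on assumptions the claim does not state: that an expected left-to-right order of headline wordings is known (`expected`), that columns are evenly spaced, and that only horizontal position matters. The function name `fill_missing_headlines` is hypothetical.

```python
def fill_missing_headlines(expected, recognized):
    """Infer positions for headline wordings that recognition missed.

    expected:   headline wordings in their assumed left-to-right order
    recognized: dict mapping each recognized wording to its x position
    Returns a new dict that also holds interpolated positions for wordings
    lying between two recognized neighbours.
    """
    filled = dict(recognized)
    known = [i for i, w in enumerate(expected) if w in recognized]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap < 2:
            continue  # nothing missing between these two headlines
        x0 = recognized[expected[left]]
        x1 = recognized[expected[right]]
        step = (x1 - x0) / gap  # assumes evenly spaced columns
        for k in range(1, gap):
            # "Create" the missing wording at the interpolated position.
            filled[expected[left + k]] = x0 + step * k
    return filled
```

For example, with `expected = ["Name", "Quantity", "Price"]` and only "Name" (x = 0) and "Price" (x = 200) recognized, the sketch places "Quantity" at x = 100. Wordings missing before the first or after the last recognized headline are left unfilled, since interpolation has no second anchor there.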
18. A form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form, comprising:
an image acquisition unit capable of obtaining a digitized form image of the form;
a storage unit capable of storing a database in which headline wordings that may appear in a unit table structure being a unit for entering one or more pieces of related data into the form are defined in a hierarchical structure for each unit table structure;
a character string recognition unit capable of recognizing a character string existing in the form image obtained by the image acquisition unit;
a character string extraction unit capable of extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition unit, referring to a database stored in the storage unit;
a headline addition unit capable of extracting a range of the unit table structure existing in the form image, on the basis of an extraction result by the character string extraction unit and adding a headline wording corresponding to data, focusing on at least one of the headline wording recognized in the extracted range and a character string recognized as data in the unit table structure; and
a correspondence relationship specification unit capable of specifying a correspondence relationship between the headline wording, including a headline wording added by the headline addition unit, and a character string other than the headline wording, recognized by the character string recognition unit.
Dependent claims: 19, 20

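The headline addition step of claim 18 can be sketched in Python as a lookup against a per-unit-table database. The `UNIT_TABLES` contents, the overlap-count matching rule, and both function names are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical database: unit table structure -> headline wordings it may contain.
UNIT_TABLES = {
    "payee": ["Bank", "Branch", "Account No."],
    "item": ["Name", "Quantity", "Price"],
}

def identify_unit_table(recognized):
    """Pick the unit table sharing the most wordings with the recognized ones."""
    best, best_overlap = None, 0
    for name, wordings in UNIT_TABLES.items():
        overlap = len(set(wordings) & set(recognized))
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best

def add_missing_headlines(recognized):
    """Return the matched unit table and the wordings that should be added."""
    name = identify_unit_table(recognized)
    if name is None:
        return None, []
    missing = [w for w in UNIT_TABLES[name] if w not in recognized]
    return name, missing
```

Given the recognized wordings `["Bank", "Account No."]`, the sketch matches the "payee" unit table and reports "Branch" as the wording to add. A production system would presumably also use the recognized data strings, as the claim allows, but that is left out here.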
21. A form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form, comprising:
an image acquisition unit capable of obtaining a digitized form image of the form;
a character string recognition unit capable of recognizing a character string existing in the form image obtained by the image acquisition unit;
a character string extraction unit capable of extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition unit;
a table structure determination unit capable of determining a table structure existing in the form image, on the basis of a headline wording extracted by the character string extraction unit and arrangement of the headline wordings in the form image;
a position specification unit capable of specifying a position in the form image, in which a headline wording not recognized by the character string recognition unit exists, on the basis of a result extracted by the character string extraction unit;
a phrase creation unit capable of creating a headline wording to exist in a position specified by the position specification unit;
a headline addition unit capable of extracting a range of the unit table structure existing in the form image, on the basis of an extraction result by the character string extraction unit and adding a headline wording corresponding to data, focusing on at least one of the headline wording recognized in the extracted range and a character string recognized as data in the unit table structure; and
a correspondence relationship specification unit capable of specifying a correspondence relationship between the headline wording, including a headline wording added by the headline addition unit, and a character string other than the headline wording, recognized by the character string recognition unit.
Dependent claims: 22

23. A form recognition method for recognizing a character string existing in an arbitrary table structure in a form, comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition process;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition process;
determining a table structure existing in the form image, on the basis of a headline wording extracted by the character string extraction process and arrangement of the headline wordings in the form image; and
specifying a correspondence relationship between the headline wording and a character string other than the headline wording, recognized by the character string recognition process, using a determination result of the table structure by the table structure determination process.

24. A form recognition method for recognizing a character string existing in an arbitrary table structure in a form, comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition process;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition process;
specifying a position in the form image, in which a headline wording not recognized by the character string recognition process exists, on the basis of a result extracted by the character string extraction process;
creating a headline wording to exist in a position specified by the position specification process; and
specifying a correspondence relationship between the headline wording, including a headline wording created by the phrase creation process, and a character string other than the headline wording, recognized by the character string recognition process.

25. A form recognition method for recognizing a character string existing in an arbitrary table structure in a form, comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition process;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition process, referring to a database in which headline wordings that may appear in a unit table structure are defined for each unit table structure, the unit table structure being a table structure that is a unit for entering one or more pieces of related data in the form;
extracting a range of the unit table structure existing in the form image, on the basis of an extraction result by the character string extraction process and adding a headline wording corresponding to data, focusing on at least one of the headline wording recognized in the extracted range and a character string recognized as data in the unit table structure; and
specifying a correspondence relationship between the headline wording, including a headline wording added by the headline addition process, and a character string other than the headline wording, recognized by the character string recognition process.

26. A database creation support apparatus for supporting creation of a database that can be used to recognize a character string in a form by a form recognition apparatus, comprising:
a phrase input unit capable of inputting the headline wordings; and
a hierarchical structure creation unit capable of creating a hierarchical structure among headline wordings inputted by the phrase input unit.
Dependent claims: 27, 28

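The hierarchical structure creation of claim 26 could, for instance, be represented as a nested dictionary built from input paths of headline wordings. This is a sketch under assumptions: the path-tuple input format and the names `build_hierarchy` and `leaf_wordings` are hypothetical and not taken from the patent.

```python
def build_hierarchy(paths):
    """Build a nested dict from tuples of headline wordings, outermost first."""
    root = {}
    for path in paths:
        node = root
        for wording in path:
            # Reuse the existing child if this wording was already entered.
            node = node.setdefault(wording, {})
    return root

def leaf_wordings(tree, prefix=()):
    """Yield the full path to every lowest-level headline wording."""
    for wording, children in tree.items():
        path = prefix + (wording,)
        if children:
            yield from leaf_wordings(children, path)
        else:
            yield path

PATHS = [("Payee", "Name"), ("Payee", "Bank", "Branch"), ("Amount",)]
TREE = build_hierarchy(PATHS)
```

Here `TREE` nests "Name" and "Bank" under "Payee", and "Branch" under "Bank", while "Amount" stays at the top level. Walking it with `leaf_wordings` recovers the lowest-level headlines together with their parents, which is the kind of lookup a recognizer would need when a child headline such as "Name" appears under several parents.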
29. A database creation support method for supporting creation of a database that can be used to recognize a character string in the form by the form recognition apparatus comprising:
inputting headline wordings; and
creating a hierarchical structure among headline wordings inputted by the phrase input process.

30. A computer-accessible storage medium on which is recorded a program for enabling a computer used as a form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form to realize functions, said functions comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition function;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition function;
determining a table structure existing in the form image, on the basis of a headline wording extracted by the character string extraction function and arrangement of the headline wordings in the form image; and
specifying a correspondence relationship between the headline wording and a character string other than the headline wording, recognized by the character string recognition function, using a determination result of the table structure by the table structure determination function.

31. A computer-accessible storage medium on which is recorded a program for enabling a computer used as a form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form to realize functions, said functions comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition function;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition function;
specifying a position in the form image, in which a headline wording not recognized by the character string recognition function exists, on the basis of a result extracted by the character string extraction function; and
specifying a correspondence relationship between the headline wording and a character string other than the headline wording, recognized by the character string recognition function, using a determination result of the table structure by the table structure determination function.

32. A computer-accessible storage medium on which is recorded a program for enabling a computer used as a form recognition apparatus for recognizing a character string existing in an arbitrary table structure in a form to realize functions, said functions comprising:
obtaining a digitized form image of the form;
recognizing a character string existing in the form image obtained by the image acquisition function;
extracting a headline wording being a predetermined character string from character strings recognized by the character string recognition function;
extracting a range of the unit table structure existing in the form image, on the basis of an extraction result of the character string extraction function and adding a headline wording corresponding to the data, focusing on at least one of the headline wording recognized in the extracted range and a character string recognized as data in the unit table structure; and
specifying a correspondence relationship between the headline wording, including a headline wording added by the headline addition function, and a character string other than the headline wording, recognized by the character string recognition function.

33. A computer-accessible storage medium on which is recorded a program for enabling a computer used as a database creation support apparatus that can be used to recognize a character string in a form by a form recognition apparatus to realize functions, said functions comprising:
inputting the headline wordings; and
creating a hierarchical structure among headline wordings inputted by the phrase input function.

Specification