IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD
Abstract
An image processing apparatus successively designates each page of an input document as a processing target, detects an anchor expression constituted by a specific character string, and associates a highlight position corresponding to the anchor expression with a link identifier. When registering the anchor expression and the link identifier in a link configuration management table, the apparatus checks whether the same anchor expression is already registered; if it is, the apparatus updates the table so that the link identifiers of that anchor expression are mutually associated. For each processing target page image, the apparatus generates page data of an electronic document based on the link identifiers relating to that page and their highlight positions, and transmits the generated page data. After completing the processing for all pages, the apparatus generates, based on the link configuration management table, information usable to link the relevant link identifiers, and transmits the generated information.
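The two-phase output described in the abstract can be illustrated with a minimal Python sketch. It assumes that region segmentation and character recognition have already produced a list of text strings per page, and it uses a deliberately crude, hypothetical pattern (`ANCHOR_RE`) for figure references. The function name `process_document`, the `"linkN"` identifier scheme, and the character-offset `span` standing in for a highlight position are illustrative assumptions, not details from the source. What the sketch shows is the ordering: each page's data is emitted as soon as that page is done, while the link information is emitted once, after the last page.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical anchor pattern for "Fig. 1", "Figure 2", "FIG 3" (an assumption).
ANCHOR_RE = re.compile(r"\b(?:fig(?:ure)?)\.?\s*(\d+)", re.IGNORECASE)


def process_document(pages: List[List[str]]) -> Iterator[Tuple[str, object]]:
    """Yield ("page", data) once per page, then ("links", info) once at the end.

    Each page is a list of already-recognized text strings, standing in for
    the region segmentation and character recognition steps.
    """
    table: Dict[str, List[str]] = defaultdict(list)  # anchor -> link identifiers
    next_id = 0
    for page_no, regions in enumerate(pages):
        links = []
        for text in regions:
            for match in ANCHOR_RE.finditer(text):
                anchor = f"fig{match.group(1)}"  # normalized anchor expression
                link_id = f"link{next_id}"
                next_id += 1
                # Character offsets stand in for a highlight position on the page.
                links.append({"id": link_id, "span": match.span()})
                table[anchor].append(link_id)  # same anchor -> ids grouped together
        # Page data can be transmitted now; it does not depend on later pages.
        yield "page", {"page": page_no, "links": links}
    # Only after the last page is the table complete; anchors seen once link nothing.
    yield "links", {a: ids for a, ids in table.items() if len(ids) > 1}
```

For example, with `pages = [["See Fig. 1 for details."], ["Figure 1 shows the device."]]`, the generator yields two page records and then `("links", {"fig1": ["link0", "link1"]})`. Emitting pages early lets a receiver start displaying the document before processing finishes, at the cost of applying the cross-page links in a separate, final step.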
11 Claims
1. An image processing apparatus comprising:
an input unit configured to input a document including a plurality of page images;
a region segmentation unit configured to divide each page image input by the input unit into attribute regions;
a character recognition unit configured to execute character recognition processing on the regions divided by the region segmentation unit;
a first detection unit configured to detect a first anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit on a text attribute region in the page image;
a first identifier allocation unit configured to allocate a first link identifier to the first anchor expression detected by the first detection unit;
a first graphic data generation unit configured to generate graphic data to be used to identify the first anchor expression detected by the first detection unit and associate the generated graphic data with the first link identifier allocated by the first identifier allocation unit;
a first table updating unit configured to register the first link identifier and the first anchor expression in a link configuration management table while associating them with each other and, if an anchor expression similar to the first anchor expression is already registered in the link configuration management table, configured to update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
a second detection unit configured to detect a second anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit on a caption region accompanying an object in the page image;
a second identifier allocation unit configured to allocate a second link identifier to the object accompanied by the caption region where the second anchor expression is detected;
a second graphic data generation unit configured to generate graphic data to be used to identify the object accompanied by the caption region where the second anchor expression is detected and associate the generated graphic data with the second link identifier allocated by the second identifier allocation unit;
a second table updating unit configured to register the second link identifier and the second anchor expression in the link configuration management table while associating them with each other and, if an anchor expression similar to the second anchor expression is already registered in the link configuration management table, configured to update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
a page data generation unit configured to generate page data of an electronic document for the page image, using the first link identifier, the first graphic data, the second link identifier, and the second graphic data;
a first transmission unit configured to transmit the page data of the electronic document generated by the page data generation unit;
a control unit configured to successively designate each page of the page image input by the input unit as a processing target and control processing repetitively executed by the region segmentation unit, the character recognition unit, the first detection unit, the first identifier allocation unit, the first graphic data generation unit, the first table updating unit, the second detection unit, the second identifier allocation unit, the second graphic data generation unit, the second table updating unit, the page data generation unit, and the first transmission unit; and
a second transmission unit configured to generate link configuration information to be used to link the first link identifier with the second link identifier included in the electronic document based on the link configuration management table updated by the first table updating unit and the second table updating unit, and configured to transmit the generated link configuration information.
Dependent Claims (2, 3, 4, 5, 6)
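Claim 1 distinguishes an anchor expression found in body text (the first anchor expression) from one found in a caption accompanying an object (the second), and the link configuration information links the two kinds of identifiers. One way to picture the link configuration management table is the hypothetical Python class below. `LinkTable`, `register`, and `link_configuration` are illustrative names, and pairing every text identifier with every object identifier registered under the same key is an assumption about what "mutually associate" produces, not the patent's stated data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LinkEntry:
    """Link identifiers registered under one (normalized) anchor expression."""
    text_ids: List[str] = field(default_factory=list)    # first anchor expressions (body text)
    object_ids: List[str] = field(default_factory=list)  # second anchor expressions (captions)


class LinkTable:
    """Hypothetical link configuration management table (illustrative only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, LinkEntry] = {}

    def register(self, anchor: str, link_id: str, from_caption: bool) -> None:
        # Registering under an existing key is what groups ("mutually
        # associates") link identifiers that share the same anchor expression.
        entry = self.entries.setdefault(anchor, LinkEntry())
        (entry.object_ids if from_caption else entry.text_ids).append(link_id)

    def link_configuration(self) -> List[Tuple[str, str]]:
        """Pair every body-text link with every object link of the same anchor."""
        pairs = []
        for entry in self.entries.values():
            for text_id in entry.text_ids:
                for object_id in entry.object_ids:
                    pairs.append((text_id, object_id))
        return pairs
```

For example, registering `("fig1", "text_01", from_caption=False)`, then `("fig1", "image_01", from_caption=True)`, then `("fig2", "text_02", from_caption=False)` makes `link_configuration()` return `[("text_01", "image_01")]`; "fig2" has no caption yet and produces no pair. Because a reference such as "Fig. 1" on page 1 may point to a figure whose caption appears only on page 3, the pairs can be computed reliably only after every page has been registered, which is consistent with the claim's separate, final transmission of the link configuration information.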
7. An image processing apparatus comprising:
an input unit configured to input a document including a plurality of page images;
a region segmentation unit configured to divide each page image input by the input unit into attribute regions;
a character recognition unit configured to execute character recognition processing on the regions divided by the region segmentation unit;
a detection unit configured to detect an anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit;
an identifier allocation unit configured to allocate a link identifier to the anchor expression detected by the detection unit;
a generation unit configured to generate data that associates a highlight position to be determined based on the anchor expression with the link identifier;
a table updating unit configured to register the anchor expression and the link identifier in a link configuration management table while associating them with each other and, if an anchor expression similar to the anchor expression is already registered in the link configuration management table, configured to update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
a first transmission unit configured to generate page data of an electronic document for the page image, based on the link identifier and the highlight position, and transmit the generated page data;
a control unit configured to successively designate each page of the page image input by the input unit as a processing target and control processing repetitively executed by the region segmentation unit, the character recognition unit, the detection unit, the identifier allocation unit, the generation unit, the table updating unit, and the first transmission unit; and
a second transmission unit configured to generate link configuration information to be used to link the link identifiers included in the electronic document based on the link configuration management table updated by the table updating unit, and configured to transmit the generated link configuration information.
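Claim 7 does not say how the highlight position is derived from the anchor expression. Assuming the character recognition step reports one bounding box per recognized character, aligned index-for-index with the recognized string (an assumption; the claims do not state this), a plausible highlight position is the union of the boxes covering the anchor's characters, padded by a small margin. `Box` and `highlight_box` are hypothetical names used only for this sketch.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def highlight_box(char_boxes: List[Box], start: int, end: int, margin: int = 2) -> Box:
    """Union of the character boxes for text[start:end], grown by `margin` pixels.

    Assumes one OCR bounding box per recognized character (hypothetical input).
    The left/top edges are clamped at 0 so the box stays on the page.
    """
    boxes = char_boxes[start:end]
    if not boxes:
        raise ValueError("empty anchor span")
    return Box(
        left=max(0, min(b.left for b in boxes) - margin),
        top=max(0, min(b.top for b in boxes) - margin),
        right=max(b.right for b in boxes) + margin,
        bottom=max(b.bottom for b in boxes) + margin,
    )
```

For example, with `boxes = [Box(100 + 10 * i, 50, 110 + 10 * i, 64) for i in range(5)]` (five 10-pixel-wide characters on one text line), `highlight_box(boxes, 0, 5)` returns `Box(left=98, top=48, right=152, bottom=66)`. Paired with its link identifier, such a rectangle is the kind of record the page data could carry. An anchor expression that wraps onto a second text line would need one rectangle per line; this sketch does not handle that case.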
8. An image processing method comprising:
inputting a document including a plurality of page images;
dividing each input page image into attribute regions;
executing character recognition processing on the divided regions;
detecting a first anchor expression constituted by a specific character string from a result of the character recognition processing executed on a text attribute region in the page image;
allocating a first link identifier to the detected first anchor expression;
generating graphic data to be used to identify the detected first anchor expression and associating the generated graphic data with the allocated first link identifier;
registering the first link identifier and the first anchor expression in a link configuration management table while associating them with each other and, if an anchor expression similar to the first anchor expression is already registered in the link configuration management table, updating the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
detecting a second anchor expression constituted by a specific character string from a result of the character recognition processing executed on a caption region accompanying an object in the page image;
allocating a second link identifier to the object accompanied by the caption region where the second anchor expression is detected;
generating graphic data to be used to identify the object accompanied by the caption region where the second anchor expression is detected and associating the generated graphic data with the allocated second link identifier;
registering the second link identifier and the second anchor expression in the link configuration management table while associating them with each other and, if an anchor expression similar to the second anchor expression is already registered in the link configuration management table, updating the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
generating page data of an electronic document for the page image, using the first link identifier, the first graphic data, the second link identifier, and the second graphic data;
transmitting the generated page data of the electronic document;
successively designating each page of the input page image as a processing target and controlling the region division processing, the character recognition processing, the first anchor expression detection processing, the first link identifier allocation processing, the first graphic data generation processing, the first table updating processing, the second anchor expression detection processing, the second link identifier allocation processing, the second graphic data generation processing, the second table updating processing, the page data generation processing, and the page data transmission processing, which are repetitively executed; and
generating link configuration information to be used to link the first link identifier with the second link identifier included in the electronic document based on the updated link configuration management table, and transmitting the generated link configuration information.
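The claims register a new entry against an existing one when the anchor expressions are "similar", but this excerpt does not define the criterion. One simple interpretation, assumed here, is equality after normalization: Unicode NFKC folding (which maps full-width digits such as "１" to "1"), case folding, and mapping prefix variants to one canonical key. `normalize_anchor` and its prefix vocabulary, including the Japanese 図 (figure) and 表 (table), are hypothetical choices for this sketch.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical prefix vocabulary mapped to canonical keys (an assumption).
_PREFIXES = {"figure": "fig", "fig": "fig", "図": "fig",
             "table": "table", "tab": "table", "表": "table"}
# Longer alternatives first; the whole string must be a single anchor expression.
_ANCHOR_RE = re.compile(r"^\s*(figure|fig|table|tab|図|表)\s*\.?\s*(\d+)\s*$",
                        re.IGNORECASE)


def normalize_anchor(text: str) -> Optional[str]:
    """Map variants such as "Fig. 1", "FIGURE 1" or "図１" to one key, e.g. "fig:1".

    Returns None when the text is not recognized as an anchor expression.
    """
    folded = unicodedata.normalize("NFKC", text)  # full-width forms -> ASCII
    match = _ANCHOR_RE.match(folded)
    if match is None:
        return None
    prefix = _PREFIXES[match.group(1).lower()]
    return f"{prefix}:{int(match.group(2))}"  # int() drops leading zeros
```

With this function, "Fig. 1", "FIGURE 1", and "図１" all map to `"fig:1"` and would therefore be registered under one table entry, while "Section 1" returns `None` and is not treated as an anchor expression. A fuzzier criterion (for example, tolerating OCR misreads) would need a different comparison than exact key equality.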
9. An image processing method comprising:
inputting a document including a plurality of page images;
dividing each page image input by the input unit into attribute regions;
executing character recognition processing on the divided regions;
detecting an anchor expression constituted by a specific character string from a result of the executed character recognition processing;
allocating a link identifier to the detected anchor expression;
generating data that associates a highlight position to be determined based on the anchor expression with the link identifier;
registering the anchor expression and the link identifier in a link configuration management table while associating them with each other and, if an anchor expression similar to the anchor expression is already registered in the link configuration management table, updating the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
generating page data of an electronic document for the page image, based on the link identifier and the highlight position, and transmitting the generated page data;
successively designating each input page of the page image as a processing target and controlling the region division processing, the character recognition processing, the anchor expression detection processing, the identifier allocation processing, the generation processing, the table updating processing, and the page data transmission processing, which are repetitively executed; and
generating link configuration information to be used to link the link identifiers included in the electronic document based on the updated link configuration management table, and transmitting the generated link configuration information.
10. A non-transitory computer-readable storage medium that stores a computer program, in which the computer program comprises:
computer-executable instructions for causing an input unit to input a document including a plurality of page images;
computer-executable instructions for causing a region segmentation unit to divide each page image input by the input unit into attribute regions;
computer-executable instructions for causing a character recognition unit to execute character recognition processing on the regions divided by the region segmentation unit;
computer-executable instructions for causing a first detection unit to detect a first anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit on a text attribute region in the page image;
computer-executable instructions for causing a first identifier allocation unit to allocate a first link identifier to the first anchor expression detected by the first detection unit;
computer-executable instructions for causing a first graphic data generation unit to generate graphic data to be used to identify the first anchor expression detected by the first detection unit and associate the generated graphic data with the first link identifier allocated by the first identifier allocation unit;
computer-executable instructions for causing a first table updating unit to register the first link identifier and the first anchor expression in a link configuration management table while associating them with each other and, if an anchor expression similar to the first anchor expression is already registered in the link configuration management table, update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
computer-executable instructions for causing a second detection unit to detect a second anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit on a caption region accompanying an object in the page image;
computer-executable instructions for causing a second identifier allocation unit to allocate a second link identifier to the object accompanied by the caption region where the second anchor expression is detected;
computer-executable instructions for causing a second graphic data generation unit to generate graphic data to be used to identify the object accompanied by the caption region where the second anchor expression is detected and associate the generated graphic data with the second link identifier allocated by the second identifier allocation unit;
computer-executable instructions for causing a second table updating unit to register the second link identifier and the second anchor expression in the link configuration management table while associating them with each other and, if an anchor expression similar to the second anchor expression is already registered in the link configuration management table, update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
computer-executable instructions for causing a page data generation unit to generate page data of an electronic document for the page image, using the first link identifier, the first graphic data, the second link identifier, and the second graphic data;
computer-executable instructions for causing a first transmission unit to transmit the page data of the electronic document generated by the page data generation unit;
computer-executable instructions for causing a control unit to successively designate each page of the page image input by the input unit as a processing target and control processing repetitively executed by the region segmentation unit, the character recognition unit, the first detection unit, the first identifier allocation unit, the first graphic data generation unit, the first table updating unit, the second detection unit, the second identifier allocation unit, the second graphic data generation unit, the second table updating unit, the page data generation unit, and the first transmission unit; and
computer-executable instructions for causing a second transmission unit to generate link configuration information to be used to link the first link identifier with the second link identifier included in the electronic document based on the link configuration management table updated by the first table updating unit and the second table updating unit, and transmit the generated link configuration information.
11. A non-transitory computer-readable storage medium that stores a computer program, in which the computer program comprises:
computer-executable instructions for causing an input unit to input a document including a plurality of page images;
computer-executable instructions for causing a region segmentation unit to divide each page image input by the input unit into attribute regions;
computer-executable instructions for causing a character recognition unit to execute character recognition processing on the regions divided by the region segmentation unit;
computer-executable instructions for causing a detection unit to detect an anchor expression constituted by a specific character string from a result of the character recognition processing executed by the character recognition unit;
computer-executable instructions for causing an identifier allocation unit to allocate a link identifier to the anchor expression detected by the detection unit;
computer-executable instructions for causing a generation unit to generate data that associates a highlight position to be determined based on the anchor expression with the link identifier;
computer-executable instructions for causing a table updating unit to register the anchor expression and the link identifier in a link configuration management table while associating them with each other and, if an anchor expression similar to the anchor expression is already registered in the link configuration management table, update the link configuration management table in such a way as to mutually associate the link identifiers of the same anchor expression;
computer-executable instructions for causing a first transmission unit to generate page data of an electronic document for the page image, based on the link identifier and the highlight position, and transmit the generated page data;
computer-executable instructions for causing a control unit to successively designate each page of the page image input by the input unit as a processing target and control processing repetitively executed by the region segmentation unit, the character recognition unit, the detection unit, the identifier allocation unit, the generation unit, the table updating unit, and the first transmission unit; and
computer-executable instructions for causing a second transmission unit to generate link configuration information to be used to link the link identifiers included in the electronic document based on the link configuration management table updated by the table updating unit, and transmit the generated link configuration information.
Specification