DOCUMENT PROCESSING APPARATUS, DOCUMENT PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT
Abstract
In a document processing apparatus, a first character information extracting unit extracts, for a first area that is an area determined to be a character extractable area among divided areas of document information, first character information from the area; a second character information extracting unit extracts, for a second area that is an area not determined to be the character extractable area among the divided areas, a character code as second character information by performing character recognition processing on a document image generated from the document information; and a storing unit stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
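The routing described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Area` and `StoredRecord` types, the `embedded_text` field, and the `ocr` callable are all hypothetical names introduced here, and the sketch assumes that an area is "character extractable" exactly when it carries its own character codes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Area:
    app: str                      # application that produced the area (hypothetical field)
    bbox: tuple                   # (left, top, right, bottom) on the document image
    embedded_text: Optional[str]  # character codes carried by the area, or None


@dataclass
class StoredRecord:
    # Keeps the extracted text and the source document associated with each other.
    document_info: object
    first_char_info: list = field(default_factory=list)
    second_char_info: list = field(default_factory=list)


def is_character_extractable(area: Area) -> bool:
    # Assumption: an area is extractable when it exposes its character codes.
    return area.embedded_text is not None


def process(document_info, areas, document_image, ocr: Callable) -> StoredRecord:
    record = StoredRecord(document_info=document_info)
    for area in areas:
        if is_character_extractable(area):
            # First area: take the character codes directly.
            record.first_char_info.append(area.embedded_text)
        else:
            # Second area: run character recognition on the document image.
            record.second_char_info.append(ocr(document_image, area.bbox))
    return record


# Usage with a stand-in "image" and a stand-in recognizer (both hypothetical).
areas = [
    Area("word_processor", (0, 0, 100, 50), "Quarterly report"),
    Area("drawing_tool", (0, 50, 100, 100), None),
]
fake_image = {(0, 50, 100, 100): "Figure 1"}
record = process("doc-1", areas, fake_image, lambda image, bbox: image[bbox])
```

Passing the recognizer in as a callable keeps the sketch independent of any particular OCR engine; a real system would substitute its own recognition routine and persistent storage.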
16 Claims
1. A document processing apparatus comprising:
a document information obtaining unit that obtains document information created using at least two applications;
an image generating unit that generates a document image based on the document information;
an area dividing unit that divides the document information into areas for each of the applications;
a determining unit that determines whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
a first character information extracting unit that extracts, for a first area that is an area determined to be the character extractable area, first character information from the area;
a second character information extracting unit that extracts, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
a storing unit that stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A method of processing a document comprising:
obtaining document information created using at least two applications;
generating a document image based on the document information;
dividing the document information into areas for each of the applications;
determining whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
first extracting including extracting, for a first area that is an area determined to be the character extractable area, first character information from the area;
second extracting including extracting, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
storing therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A computer program product comprising a computer-usable medium having computer-readable program codes embodied in the medium that when executed cause a computer to execute:
obtaining document information created using at least two applications;
generating a document image based on the document information;
dividing the document information into areas for each of the applications;
determining whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
first extracting including extracting, for a first area that is an area determined to be the character extractable area, first character information from the area;
second extracting including extracting, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
storing therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
(Dependent claim: 16)
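The dividing step recited in the independent claims ("dividing the document information into areas for each of the applications") could look roughly like the sketch below. The claims do not say how document information is represented, so this assumes a flat, ordered list of objects, each tagged with the application that created it and a bounding box; `DocObject` and `divide_by_application` are hypothetical names, and merging consecutive same-application objects into one rectangle is an illustrative choice only.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class DocObject:
    app: str     # creating application (hypothetical tag)
    bbox: tuple  # (left, top, right, bottom)


def divide_by_application(objects):
    """Merge each run of consecutive same-application objects into one area."""
    areas = []
    for app, run in groupby(objects, key=lambda o: o.app):
        boxes = [o.bbox for o in run]
        # The area is the smallest rectangle enclosing every object in the run.
        merged = (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
        areas.append((app, merged))
    return areas


objects = [
    DocObject("word_processor", (0, 0, 100, 20)),
    DocObject("word_processor", (0, 20, 100, 40)),
    DocObject("spreadsheet", (10, 40, 90, 80)),
]
areas_by_app = divide_by_application(objects)
```

Each resulting area can then be passed to the determining step, which decides whether character codes can be read directly or whether character recognition on the document image is needed.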
Specification