Document processing apparatus, document processing method, and computer program product
First Claim
1. A document processing apparatus comprising:
a document information obtaining unit that obtains document information created using at least two applications;
an image generating unit that generates a document image based on the document information;
an area dividing unit that divides the document information into areas for each of the applications;
a determining unit that determines whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
a first character information extracting unit that extracts, for a first area that is an area determined to be the character extractable area, first character information from the area;
a second character information extracting unit that extracts, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
a storing unit that stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
Abstract
In a document processing apparatus, a first character information extracting unit extracts, for a first area that is an area determined to be a character extractable area in divided areas of a document information, first character information from the area; a second character information extracting unit extracts, for a second area that is an area not determined to be the character extractable area in the divided areas, a character code by performing a character recognition processing on a document image generated from the document information as second character information; and a storing unit stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
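The flow summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Area`, `StoredRecord`, `is_character_extractable`, and `process_document` are hypothetical, the test for "character extractable" is assumed to be simply whether the area carries embedded text, and the character recognition step is passed in as a callable rather than tied to any real OCR library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom) on the document image


@dataclass
class Area:
    """One divided area of the document, attributed to a single application."""
    application: str                      # hypothetical label for the creating application
    bbox: BBox                            # where the area sits on the rendered image
    embedded_text: Optional[str] = None   # character codes, if the area exposes them


@dataclass
class StoredRecord:
    """Keeps both kinds of character information associated with the document image."""
    document_id: str
    document_image: Any
    first_character_info: List[str] = field(default_factory=list)
    second_character_info: List[str] = field(default_factory=list)


def is_character_extractable(area: Area) -> bool:
    # Assumption: an area is extractable when it carries embedded character codes.
    return area.embedded_text is not None


def process_document(
    document_id: str,
    areas: List[Area],
    document_image: Any,
    ocr: Callable[[Any, BBox], str],
) -> StoredRecord:
    record = StoredRecord(document_id, document_image)
    for area in areas:
        if is_character_extractable(area):
            # First extraction: take the character codes directly from the area.
            record.first_character_info.append(area.embedded_text)
        else:
            # Second extraction: run character recognition on the rendered image.
            record.second_character_info.append(ocr(document_image, area.bbox))
    return record
```

For instance, calling `process_document` with one area that has `embedded_text` set and one that does not, plus a stand-in `ocr` callable, yields a record whose `first_character_info` holds the embedded text and whose `second_character_info` holds whatever the callable returned for the second area's bounding box.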
16 Claims
1. A document processing apparatus comprising:
a document information obtaining unit that obtains document information created using at least two applications;
an image generating unit that generates a document image based on the document information;
an area dividing unit that divides the document information into areas for each of the applications;
a determining unit that determines whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
a first character information extracting unit that extracts, for a first area that is an area determined to be the character extractable area, first character information from the area;
a second character information extracting unit that extracts, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
a storing unit that stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method of processing a document comprising:
obtaining document information created using at least two applications;
generating a document image based on the document information;
dividing the document information into areas for each of the applications;
determining whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
first extracting including extracting, for a first area that is an area determined to be the character extractable area, first character information from the area;
second extracting including extracting, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
storing the first character information, the second character information, and at least one of the document information and the document image in association with each other.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product comprising a non-transitory computer-usable medium having computer-readable program codes embodied in the medium that when executed cause a computer to execute:
obtaining document information created using at least two applications;
generating a document image based on the document information;
dividing the document information into areas for each of the applications;
determining whether a divided area is a character extractable area from which a character code can be extracted, for each of the areas;
first extracting including extracting, for a first area that is an area determined to be the character extractable area, first character information from the area;
second extracting including extracting, for a second area that is an area not determined to be the character extractable area, a character code by performing a character recognition processing on the document image as second character information; and
storing the first character information, the second character information, and at least one of the document information and the document image in association with each other.
Dependent claims: 16
Specification