Document image generation apparatus, document image generation method and recording medium
Abstract
Characters are recognized in an original document image that is obtained, for example, by an image reading apparatus, and natural language processing is performed on the document composed of the recognized characters. A translation (supplementary annotation) for each word or phrase in the document is thereby obtained. A supplementary annotation added document image is then generated by superimposing a supplementary annotation text layer on an original document image layer formed from the original document image. In the supplementary annotation text layer, each translation is placed at a position corresponding to the interline space near its word or phrase. For a discontinuous phrase, an underline is placed in addition to the translation.
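The interline placement described in the abstract can be illustrated with a small geometric sketch. The `Box` type, the `interline_box` function and its `fill` parameter are hypothetical names introduced here for illustration and do not come from the patent. The sketch assumes pixel coordinates with y growing downward and an annotation placed in the gap directly below the word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image pixels; y grows downward."""
    left: float
    top: float
    width: float
    height: float

    @property
    def bottom(self) -> float:
        return self.top + self.height


def interline_box(word: Box, next_line_top: float, fill: float = 0.8) -> Box:
    """Return a box for the annotation in the gap below `word`.

    `fill` (a hypothetical parameter) is the fraction of the gap height
    that the annotation may occupy; the box is centred vertically.
    """
    gap = next_line_top - word.bottom
    if gap <= 0:
        raise ValueError("no interline space below the word")
    height = gap * fill
    top = word.bottom + (gap - height) / 2
    return Box(word.left, top, word.width, height)


word = Box(left=100, top=50, width=80, height=20)  # word ends at y = 70
note = interline_box(word, next_line_top=80)       # the gap is 10 px high
print(note)
```

Deriving the annotation height from the gap, rather than using a fixed font size, keeps the annotation from overlapping either neighbouring line. That is one possible design choice, not something the abstract prescribes.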
16 Claims
1. A document image generation apparatus that generates, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, comprising:
an original document image obtaining component configured to obtain an original document image representing a document, wherein the original document image obtaining component is configured to obtain the original document image from a scanner, and wherein further the document is a text document;
a character recognizing component including a memory and processor configured to recognize a character included in the original document image obtained by the original document image obtaining component and identifies a position of the character in the original document image;
a supplementary annotation obtaining component including a memory and processor configured to determine a meaning of a word or a phrase included in the document constructed of a plurality of the recognized characters by the character recognizing component through a natural language processing performed on the document, and obtains a supplementary annotation corresponding to the meaning of each word or phrase;
a position determining component including a memory and processor configured to determine, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or a phrase in an original document image on the basis of a position of the character recognized by the character recognizing component, wherein the position determining component further comprises,
a phrase judging component configured to judge whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
an annotation arrangement position determining component configured to determine, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase; and
an image generator including a memory and processor configured to generate an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to a position determined in the original document image by the position determining component.
Dependent claims: 2, 3, 4, 5, 6, 7
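The phrase judging and annotation arrangement steps recited in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation. The function names and the `strategy` values are hypothetical, words are identified by their index in the token sequence, and "head word" is assumed to mean the first word of the phrase in reading order.

```python
from typing import List, Sequence


def is_discontinuous(indices: Sequence[int]) -> bool:
    """True if the phrase's word indices do not form one unbroken run."""
    ordered = sorted(indices)
    return any(b - a > 1 for a, b in zip(ordered, ordered[1:]))


def longest_run(indices: Sequence[int]) -> List[int]:
    """Longest run of consecutive indices (the first run wins on ties)."""
    best: List[int] = []
    current: List[int] = []
    for i in sorted(indices):
        if current and i == current[-1] + 1:
            current.append(i)
        else:
            current = [i]
        if len(current) > len(best):
            best = list(current)
    return best


def anchor_indices(tokens: Sequence[str], indices: Sequence[int],
                   strategy: str) -> List[int]:
    """Pick the word or words that the annotation is placed near."""
    ordered = sorted(indices)
    if not is_discontinuous(ordered):
        return ordered                       # continuous phrase: use it whole
    if strategy == "head":                   # assumed: first word of the phrase
        return [ordered[0]]
    if strategy == "run":                    # continuous word string
        return longest_run(ordered)
    if strategy == "longest":                # word with the most characters
        return [max(ordered, key=lambda i: len(tokens[i]))]
    raise ValueError(f"unknown strategy: {strategy}")


tokens = "She would rather walk home than wait".split()
phrase = [1, 2, 5]                           # "would rather ... than"
for strategy in ("head", "run", "longest"):
    print(strategy, [tokens[i] for i in anchor_indices(tokens, phrase, strategy)])
```

The claim recites "any one of" the three anchors, so the sketch exposes the choice as a parameter instead of fixing one.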
8. A document image generation apparatus that generates, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, comprising:
a controller configured to obtain an original document image representing a document, wherein the controller is configured to obtain the original document image from a scanner, and wherein further the document is a text document, wherein the controller is further configured to:
recognize a character included in the obtained document image and identifying a position of the character in the original document image;
determine a meaning of a word or a phrase included in the document constructed of a plurality of the recognized characters through a natural language processing performed on the document, and obtain a supplementary annotation corresponding to the meaning of each word or phrase;
determine, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or a phrase in an original document image on the basis of a position of the recognized character;
generate an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the determined position in the original document image;
judge whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
determine, as a position at which a supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
Dependent claims: 9, 10, 11, 12, 13, 14
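Claim 8 bases annotation placement on the positions of the recognized characters. One way to get from per-character positions to word positions is to take the union of the character boxes in each word, as sketched below. The `Char` type, the `group_words` function and the sample recognizer output are assumptions made for illustration; the claim does not specify how words are delimited.

```python
from dataclasses import dataclass
from typing import List, Tuple

BoxT = Tuple[int, int, int, int]             # left, top, right, bottom


@dataclass(frozen=True)
class Char:
    """One recognized character and its bounding box."""
    text: str
    box: BoxT


def group_words(chars: List[Char]) -> List[Tuple[str, BoxT]]:
    """Split one line of characters at whitespace; each word gets a union box."""
    words: List[Tuple[str, BoxT]] = []
    current: List[Char] = []
    for ch in chars + [Char(" ", (0, 0, 0, 0))]:  # sentinel flushes the last word
        if ch.text.isspace():
            if current:
                text = "".join(c.text for c in current)
                box = (min(c.box[0] for c in current),
                       min(c.box[1] for c in current),
                       max(c.box[2] for c in current),
                       max(c.box[3] for c in current))
                words.append((text, box))
                current = []
        else:
            current.append(ch)
    return words


# Hypothetical recognizer output for the line "an ox", 10 px per character.
line = [Char(c, (10 * i, 100, 10 * i + 8, 112)) for i, c in enumerate("an ox")]
words = group_words(line)
print(words)
```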
15. A document image generation method for generating an image representing a supplementary annotation added document on the basis of an image representing a document including plural lines, in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, the method comprising:
obtaining, from a scanner, an original document image representing a document, wherein the document is a text document;
recognizing a character included in the obtained document image and identifying a position of the character in the original document image;
determining a meaning of word or a phrase included in the document through a natural language processing performed on the document composed of a plurality of the recognized characters, and obtaining a supplementary annotation corresponding to the meaning of each word or phrase;
determining, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or phrase in an original document image on the basis of a position of the recognized character;
generating an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the position determined in the original document image at the step of determining a position;
judging whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
determining, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
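The order of the method steps in claim 15 can be sketched as a small pipeline. Several parts are stand-ins and should be read as assumptions. The recognizer output is supplied directly instead of being computed from pixels, a toy `GLOSSARY` lookup replaces the natural language processing step, and the names `Word`, `Placement`, `annotate`, `place` and `generate` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

BoxT = Tuple[int, int, int, int]             # left, top, right, bottom


class Word(NamedTuple):
    text: str
    box: BoxT


class Placement(NamedTuple):
    text: str
    x: int
    y: int


# Toy glossary standing in for the natural language processing step.
GLOSSARY: Dict[str, str] = {"apparatus": "device", "interline": "between lines"}


def annotate(words: List[Word]) -> List[Tuple[Word, str]]:
    """Pair each word that has a glossary entry with its annotation."""
    return [(w, GLOSSARY[w.text.lower()])
            for w in words if w.text.lower() in GLOSSARY]


def place(pairs: List[Tuple[Word, str]], line_gap: int) -> List[Placement]:
    """Put each annotation at the word's left edge, halfway down the gap."""
    return [Placement(note, w.box[0], w.box[3] + line_gap // 2)
            for w, note in pairs]


def generate(image_id: str, placements: List[Placement]) -> dict:
    """Bundle the untouched image layer with the annotation text layer."""
    return {"image_layer": image_id, "text_layer": placements}


# Assumed recognizer output for one line of a scanned page.
words = [Word("The", (0, 0, 30, 20)), Word("apparatus", (40, 0, 130, 20))]
page = generate("scan-001", place(annotate(words), line_gap=12))
print(page)
```

Keeping the image layer as an untouched reference and the annotations as a separate list mirrors the two-layer output recited in the claim.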
16. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for performing a method of processing for generating, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, the method comprising:
obtaining an original document image representing a document;
recognizing a character included in the obtained document image and identifying a position of the character in the original document image;
determining a meaning of a word or phrase included in the document through a natural language processing performed on the document composed of a plurality of the recognized characters, and obtaining a supplementary annotation corresponding to the meaning of each word or phrase;
determining a position in an interline space near a word or phrase in an original document image on the basis of a position of the recognized character, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document;
generating an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the position in the original document image determined at the step of determining a position;
judging whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
determining, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
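The superimposing step shared by the independent claims can be illustrated with a minimal raster sketch. It assumes greyscale pixels stored as nested lists and uses `None` to mark transparent pixels in the annotation layer. The `superimpose` function is a hypothetical name, and a real implementation could equally keep the layers separate inside a layered file format.

```python
from typing import List, Optional

Pixel = int                                  # 0 = black ... 255 = white
Layer = List[List[Optional[Pixel]]]          # None marks a transparent pixel


def superimpose(base: List[List[Pixel]], overlay: Layer) -> List[List[Pixel]]:
    """Return a new image: overlay pixels where opaque, base pixels elsewhere."""
    if len(base) != len(overlay) or any(
            len(b) != len(o) for b, o in zip(base, overlay)):
        raise ValueError("layers must have the same dimensions")
    return [[b if o is None else o for b, o in zip(brow, orow)]
            for brow, orow in zip(base, overlay)]


original = [[255, 255, 0],
            [255, 255, 255]]                 # scanned page, left untouched
notes = [[None, None, None],
         [None, 0, None]]                    # annotation ink in the interline row
merged = superimpose(original, notes)
print(merged)
```

Because the function returns a new image and never writes into `original`, the original document image layer remains available unchanged.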
Specification