Document image generation apparatus, document image generation method and recording medium

US 8,503,786 B2
Filed: 11/05/2010
Issued: 08/06/2013
Est. Priority Date: 11/06/2009
Status: Active Grant

Abstract

A character is recognized from an original document image obtained, for example, by an image reading apparatus, and natural language processing is performed on the document composed of the recognized characters. A translation (supplementary annotation) for a word or a phrase in the document is thereby obtained. A supplementary annotation added document image is then generated by superimposing a supplementary annotation text layer on an original document image layer formed from the original document image. In the supplementary annotation text layer, the translation is placed at a position corresponding to a position in the interline space near the word or the phrase. For a discontinuous phrase, an underline is placed in addition to the translation.
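
As a rough illustration of "a position in an interline space near the word", the sketch below computes the strip between a word's bounding box and the top of the following line. The `Box` type, the `interline_position` helper, the pixel coordinates, and the choice to use the space below the word (rather than above it) are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image pixels; y grows downward (assumed convention)."""
    left: int
    top: int
    right: int
    bottom: int


def interline_position(word: Box, next_line_top: int) -> Box:
    """Return the strip between the word's bottom edge and the next line's top edge."""
    if next_line_top <= word.bottom:
        raise ValueError("no interline space below the word")
    return Box(word.left, word.bottom, word.right, next_line_top)


# Hypothetical word box as it might come from character recognition.
word = Box(left=120, top=40, right=200, bottom=60)
slot = interline_position(word, next_line_top=72)
print(slot)
```

The returned box spans the word horizontally, so an annotation drawn inside it stays visually attached to that word.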

7 Citations

16 Claims

1. A document image generation apparatus that generates, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, comprising:
- an original document image obtaining component configured to obtain an original document image representing a document, wherein the original document image obtaining component is configured to obtain the original document image from a scanner, and wherein further the document is a text document;
  
  a character recognizing component including a memory and processor configured to recognize a character included in the original document image obtained by the original document image obtaining component and identifies a position of the character in the original document image;
  
  a supplementary annotation obtaining component including a memory and processor configured to determine a meaning of a word or a phrase included in the document constructed of a plurality of the recognized characters by the character recognizing component through a natural language processing performed on the document, and obtains a supplementary annotation corresponding to the meaning of each word or phrase;
  
  a position determining component including a memory and processor configured to determine, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or a phrase in an original document image on the basis of a position of the character recognized by the character recognizing component, wherein the position determining component further comprises, a phrase judging component configured to judge whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
  
  an annotation arrangement position determining component configured to determine, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase; and
  
  an image generator including a memory and processor configured to generate an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to a position determined in the original document image by the position determining component.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The document image generation apparatus according to claim 1, wherein the image generator further superimposes a marked image layer on the original document image layer, and the marked image layer is configured from an image in which a mark indicating a discontinuous phrase is placed at a position corresponding to the position of a discontinuous phrase in the original document image.
  - 3. The document image generation apparatus according to claim 1, further comprising:
    - an annotation addition object selector configured to select a word or a phrase at which a supplementary annotation should be placed, from words or phrases included in the document, wherein the position determining component further comprises:
      
      an annotation length judging component configured to judge whether or not a length of a supplementary annotation is longer than a length of the word or phrase corresponding to the supplementary annotation;
      
      an object front-back annotation addition judging component configured to judge whether or not another supplementary annotation should be placed at a word in front of or behind the word or phrase, in the case that the length of the supplementary annotation is longer than the length of the word or phrase corresponding to the supplementary annotation;
      
      an annotation expended position determining component configured to determine, as a position at which the supplementary annotation should be placed in the document, a position including a position in an interline space near the word or the phrase corresponding to the supplementary annotation and a position in an interline space near the word, either of the words in front of or behind the word or the phrase corresponding to the supplementary annotation, at which another supplementary annotation should not be placed, in the case that another supplementary annotation should not be placed at either one or both of the words in front of and behind the word or the phrase;
      
      a front-back annotation length judging component configured to judge whether or not a length of said another supplementary annotation is shorter than a length which is obtained by subtraction of a predetermined length from a length of a word, either of the words in front of and behind the word or the phrase corresponding to the supplementary annotation, at which said another supplementary annotation should be placed, in the case that said another supplementary annotation should be placed at either one or both of the words in front of and behind the word or the phrase corresponding to the supplementary annotation;
      
      an annotation partially expended position determining component configured to determine, as a position at which the supplementary annotation should be placed in the document, a position including a position in an interline space near the word or the phrase corresponding to the supplementary annotation and a part of a position in the interline space near a word, either of the words in front of and behind the word and the phrase, at which another supplementary annotation should be placed and whose length minus a predetermined length is longer than the length of said another supplementary annotation, in the case that the length of said another supplementary annotation is shorter than a length which is obtained by subtraction of the predetermined length from the length of the word at which said another supplementary annotation should be placed; and
      
      an annotation reduction rate calculator configured to calculate a reduction rate for reducing a length of a supplementary annotation which is longer than a length of a character string that can be placed at a position determined to be a position for placing the supplementary annotation in the documents.
  - 4. The document image generation apparatus according to claim 1, wherein the image generator is configured to superimpose an original document text layer on the original document image layer, and in the original document text layer, text data indicating each character in the original document image is placed in a transparent state at a position corresponding to each character in the original document image.
  - 5. The document image generation apparatus according to claim 1, wherein the supplementary annotation obtaining component is configured to obtain a translation for a word or a phrase, a reading for the word or the phrase or an annotation for the word or the phrase as the supplementary annotation.
  - 6. The document image generation apparatus according to claim 1, wherein the supplementary annotation obtaining component is configured to perform a natural language processing on contents of a document configured from character strings of each line connected with one another in order of lines, the character strings are configured from characters recognized by the character recognizing component.
  - 7. The document image generation apparatus according to claim 1, further comprising:
    - a receiver configured to receive data for a web page sent from an external apparatus; and
      
      a display configured to display a web page based on the data received by the receiver, wherein the original document image obtaining component comprises a web-page obtaining component configured to obtain a web page as an original document image.
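
The discontinuous-phrase judgement and the three anchor choices named in claim 1 (head word, continuous word string, longest word) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Word` type, the use of word indices to detect gaps, the `strategy` names, and the choice of the longest continuous run when several exist are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    index: int  # position of the word in the document's word sequence (assumed)
    text: str


def is_discontinuous(phrase: List[Word]) -> bool:
    """A phrase is discontinuous if its words are not adjacent in the document."""
    idx = [w.index for w in phrase]
    return any(b - a != 1 for a, b in zip(idx, idx[1:]))


def continuous_runs(phrase: List[Word]) -> List[List[Word]]:
    """Split a non-empty phrase, sorted by index, into runs of adjacent words."""
    runs = [[phrase[0]]]
    for w in phrase[1:]:
        if w.index == runs[-1][-1].index + 1:
            runs[-1].append(w)
        else:
            runs.append([w])
    return runs


def anchor_words(phrase: List[Word], strategy: str = "head") -> List[Word]:
    """Pick the word(s) near which the annotation is placed."""
    if not is_discontinuous(phrase):
        return list(phrase)
    if strategy == "head":
        return [phrase[0]]
    if strategy == "run":
        return max(continuous_runs(phrase), key=len)
    if strategy == "longest":
        return [max(phrase, key=lambda w: len(w.text))]
    raise ValueError(f"unknown strategy: {strategy}")


# "take ... into account" with other words in between (hypothetical indices).
phrase = [Word(3, "take"), Word(7, "into"), Word(8, "account")]
for s in ("head", "run", "longest"):
    print(s, [w.text for w in anchor_words(phrase, s)])
```

The claim lists the three anchors as alternatives ("any one of"), which is why the sketch exposes them as a selectable strategy rather than fixing one.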

8. A document image generation apparatus that generates, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, comprising:
- a controller configured to obtain an original document image representing a document, wherein the controller is configured to obtain the original document image from a scanner, and wherein further the document is a text document, wherein the controller is further configured to:
  
  recognize a character included in the obtained document image and identify a position of the character in the original document image;
  
  determine a meaning of a word or a phrase included in the document constructed of a plurality of the recognized characters through a natural language processing performed on the document, and obtain a supplementary annotation corresponding to the meaning of each word or phrase;
  
  determine, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or a phrase in an original document image on the basis of a position of the recognized character;
  
  generate an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the determined position in the original document image;
  
  judge whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
  
  determine, as a position at which a supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
- View Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The document image generation apparatus according to claim 8, wherein the controller is further configured to superimpose a marked image layer on the original document image layer, and the marked image layer is configured from an image in which a mark indicating a discontinuous phrase is placed at a position corresponding to the position of a discontinuous phrase in the original document image.
  - 10. The document image generation apparatus according to claim 8, wherein the controller is further configured to:
    - select a word or a phrase at which a supplementary annotation should be placed, from words or phrases included in the document;
      
      judge whether or not a length of a supplementary annotation is longer than a length of the word or phrase corresponding to the supplementary annotation;
      
      judge whether or not another supplementary annotation should be placed at a word in front of or behind the word or phrase, in the case that the length of the supplementary annotation is longer than the length of the word or phrase corresponding to the supplementary annotation;
      
      determine, as a position at which the supplementary annotation should be placed in the document, a position including a position in an interline space near the word or the phrase corresponding to the supplementary annotation and a position in an interline space near either one of the words in front of and behind the word or the phrase corresponding to the supplementary annotation at which another supplementary annotation should not be placed, in the case that another supplementary annotation should not be placed at either one or both of the words in front of and behind the word or the phrase;
      
      judge whether or not a length of said another supplementary annotation is shorter than a length which is obtained by subtraction of a predetermined length from a length of a word, either of the words in front of and behind the word or the phrase corresponding to the supplementary annotation, at which said another supplementary annotation should be placed, in the case that said another supplementary annotation should be placed at either one or both of the words in front of and behind the word or the phrase corresponding to the supplementary annotation;
      
      determine, as a position at which the supplementary annotation should be placed in the document, a position including a position in an interline space near the word or the phrase corresponding to the supplementary annotation and a part of a position in an interline space near a word, either of the words in front of and behind the word and the phrase, at which another supplementary annotation should be placed and whose length minus the predetermined length is longer than the length of said another supplementary annotation, in the case that the length of said another supplementary annotation is shorter than a length which is obtained by subtraction of a predetermined length from the length of the word at which another supplementary annotation should be placed; and
      
      calculate a reduction rate for reducing a length of a supplementary annotation which is longer than a length of a character string that can be placed at a position determined to be a position for placing the supplementary annotation in the documents.
  - 11. The document image generation apparatus according to claim 8, wherein the controller is further configured to superimpose an original document text layer on the original document image layer, and in the original document text layer, text data indicating each character in the original document image is placed in a transparent state at a position corresponding to each character in the original document image.
  - 12. The document image generation apparatus according to claim 8, wherein the controller is further configured to obtain a translation for a word or a phrase, a reading for the word or the phrase or an annotation for the word or the phrase as the supplementary annotation.
  - 13. The document image generation apparatus according to claim 8, wherein the controller is further configured to perform a natural language processing on contents of a document configured from character strings of each line connected with one another in order of lines, and the character strings are configured from the recognized characters.
  - 14. The document image generation apparatus according to claim 8, further comprising:
    - a receiver configured to receive data for a web page sent from an external apparatus; and
      
      a display configured to display a web page based on the data received by the receiver, wherein the controller is further configured to obtain a web page as an original document image.
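
The width logic recited in claims 3 and 10 can be read roughly as: when an annotation is longer than its word, borrow the interline space over neighbouring words that carry no annotation, borrow only the spare part of a neighbour whose own annotation is shorter than the neighbour's length minus a predetermined length, and shrink the annotation if it still does not fit. The sketch below follows that reading. The `Neighbor` type, the function names, the exact amount of "part" that is borrowed, and the formula for the reduction rate are assumptions; the claims do not give them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Neighbor:
    width: float                               # length of the neighbouring word
    annotation_width: Optional[float] = None   # None: no annotation is placed there


def spare_width(n: Optional[Neighbor], margin: float) -> float:
    """Width that can be borrowed from a neighbouring word's interline space."""
    if n is None:
        return 0.0
    if n.annotation_width is None:
        return n.width  # whole neighbour is free
    # Partial borrowing: only what remains after the neighbour's own
    # annotation and the predetermined length (margin) are reserved.
    return max(n.width - margin - n.annotation_width, 0.0)


def available_width(word_width: float, annotation_width: float,
                    front: Optional[Neighbor], back: Optional[Neighbor],
                    margin: float) -> float:
    if annotation_width <= word_width:
        return word_width  # fits already, no expansion needed
    return word_width + spare_width(front, margin) + spare_width(back, margin)


def reduction_rate(annotation_width: float, available: float) -> float:
    """Scale factor (at most 1.0) so a positive-width annotation fits."""
    return min(1.0, available / annotation_width)


# Hypothetical lengths in pixels.
available = available_width(
    word_width=40, annotation_width=100,
    front=Neighbor(width=30),                       # no annotation: fully usable
    back=Neighbor(width=50, annotation_width=20),   # 50 - 10 - 20 = 20 usable
    margin=10,
)
rate = reduction_rate(100, available)
print(available, rate)
```

Here the annotation is allotted 90 pixels against a natural length of 100, so it would be drawn at roughly 90% of its normal width under these assumptions.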

15. A document image generation method for generating an image representing a supplementary annotation added document on the basis of an image representing a document including plural lines, in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, the method comprising:
- obtaining, from a scanner, an original document image representing a document, wherein the document is a text document;
  
  recognizing a character included in the obtained document image and identifying a position of the character in the original document image;
  
  determining a meaning of a word or a phrase included in the document through a natural language processing performed on the document composed of a plurality of the recognized characters, and obtaining a supplementary annotation corresponding to the meaning of each word or phrase;
  
  determining, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document, a position in an interline space near a word or phrase in an original document image on the basis of a position of the recognized character;
  
  generating an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the position determined in the original document image at the step of determining a position;
  
  judging whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
  
  determining, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
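
The method recognizes characters together with their positions and then runs natural language processing on the document composed of those characters, so the text offsets used by the language analysis have to be traceable back to image positions. One possible way to keep that link is sketched below: the recognized lines are concatenated in order and a parallel list maps each text offset to its source character. The `Char` type, the `join_lines` and `make_line` helpers, and the single-space line separator are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Char:
    text: str                        # a single recognized character
    box: Tuple[int, int, int, int]   # (left, top, right, bottom) in the image


def join_lines(lines: List[List[Char]], sep: str = " "):
    """Concatenate lines in order; return the text and an offset-to-Char map."""
    parts: List[str] = []
    offset_to_char: List[Optional[Char]] = []
    for i, line in enumerate(lines):
        if i > 0:
            parts.append(sep)
            offset_to_char.extend([None] * len(sep))  # separator has no box
        for ch in line:
            parts.append(ch.text)
            offset_to_char.append(ch)
    return "".join(parts), offset_to_char


def make_line(s: str, top: int) -> List[Char]:
    """Build fake recognition output: 8x12 px glyphs on a 10 px pitch."""
    return [Char(c, (10 * i, top, 10 * i + 8, top + 12)) for i, c in enumerate(s)]


lines = [make_line("look", 0), make_line("up", 20)]
text, index = join_lines(lines)
print(text)          # the phrase now reads across the line break
print(index[5].box)  # box of the "u" that starts the second line
```

Joining the lines before analysis lets a phrase that wraps across a line break be recognized as one unit, while the offset map still yields the box of every character in it.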

16. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for performing a method of processing for generating, on the basis of an image representing a document including plural lines, an image representing a supplementary annotation added document in which a supplementary annotation corresponding to a word or a phrase composed of plural words included in the document is added, the method comprising:
- obtaining an original document image representing a document;
  
  recognizing a character included in the obtained document image and identifying a position of the character in the original document image;
  
  determining a meaning of a word or phrase included in the document through a natural language processing performed on the document composed of a plurality of the recognized characters, and obtaining a supplementary annotation corresponding to the meaning of each word or phrase;
  
  determining a position in an interline space near a word or phrase in an original document image on the basis of a position of the recognized character, as a position at which the obtained supplementary annotation corresponding to each word or phrase should be placed in a document;
  
  generating an image representing a supplementary annotation added document by superimposing a supplementary annotation text layer on an original document image layer configured from an original document image, the supplementary annotation text layer including each supplementary annotation placed at a position corresponding to the position in the original document image determined at the step of determining a position;
  
  judging whether a phrase for which a supplementary annotation is obtained is a discontinuous phrase in which plural words included in the phrase are discontinuously placed in the document; and
  
  determining, as a position at which the supplementary annotation should be placed in a document, a position in an interline space in the original document image near any one of a head word in a discontinuous phrase, a continuous word string included in the discontinuous phrase and the longest word in the discontinuous phrase, in the case that the phrase for which a supplementary annotation is obtained is the discontinuous phrase.
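
The final generating step superimposes a supplementary annotation text layer on an original document image layer instead of redrawing the page. A minimal data-structure sketch of that layering is shown below; the `TextItem`, `Layer`, and `LayeredPage` types, the `superimpose` helper, the layer name, and the file name are hypothetical, and no particular output file format is implied.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class TextItem:
    text: str
    x: int
    y: int


@dataclass
class Layer:
    name: str
    items: List[TextItem] = field(default_factory=list)


@dataclass
class LayeredPage:
    image_file: str                                   # original document image layer
    overlays: List[Layer] = field(default_factory=list)


def superimpose(image_file: str,
                placements: List[Tuple[str, Tuple[int, int]]]) -> LayeredPage:
    """Put each annotation at its determined position on a separate text layer."""
    layer = Layer(
        "supplementary-annotation-text",
        [TextItem(text, x, y) for text, (x, y) in placements],
    )
    return LayeredPage(image_file, [layer])


# Hypothetical placement: a Japanese translation placed in the interline space.
page = superimpose("scan.png", [("調べる", (120, 60))])
print(page.overlays[0].items[0])
```

Because the scanned image stays untouched in its own layer, further layers (for example the marked image layer of claim 2 or the transparent text layer of claim 4) could be appended to `overlays` in the same way.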

Current Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Original Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Inventors: Sata, Ichiko; Kutsumi, Takeshi
Primary Examiner(s): Park, Edward
Application Number: US12/940,818
Publication Number: US 20110110599A1
Time in Patent Office: 1,005 Days
Field of Search: 382/181, 382/182, 382/189
US Class Current: 382/181
CPC Class Codes:
G06F 40/169   Annotation, e.g. comment da...
G06F 40/58   Use of machine translation,...
G06V 30/10   Character recognition
G06V 30/1444   Selective acquisition, loca...
G06V 30/262   using context analysis, e.g...
