Apparatus and method of analyzing layout of document, and computer product

  • US 7,257,253 B2
  • Filed: 01/24/2003
  • Issued: 08/14/2007
  • Est. Priority Date: 06/28/2002
  • Status: Expired due to Fees
First Claim

1. A computer-readable medium storing instructions for analyzing a layout of a document, which, when executed by a computer, causes the computer to perform operations comprising:

    extracting continuous black pixels as black pixel linkage components based on data for an image of the document;

    setting a circumscribed rectangle for each of the black pixel components, the circumscribed rectangle being circumscribed to each of the black pixel components and used as a character candidate;

    classifying character sizes for each of the circumscribed rectangles into three categories of large, standard and small based on a value of a long side thereof;

    integrating a first circumscribed rectangle having a small character size with a second circumscribed rectangle having a different character size from the first circumscribed rectangle when the first circumscribed rectangle and the second circumscribed rectangle overlap each other, and when a circumscribed rectangle formed from the first circumscribed rectangle and the second circumscribed rectangle is determined to be approximately square;

    selecting two circumscribed rectangles having a shortest Euclidian distance between barycenters of the two circumscribed rectangles from a group of circumscribed rectangles;

    setting integration of the two selected circumscribed rectangles as a character candidate element when the integration of the two circumscribed rectangles is determined to be approximately square;

    extracting character candidate elements from the black pixel linkage components;

    extracting a plurality of the character candidate elements as a line element, among character candidate elements aligned in line orientation, each amount of displacement of the extracted character candidate elements in orientation perpendicular to the line orientation being smaller than or equal to a threshold value;

    generating a line rectangle as a line candidate in the line orientation based on the extracted character candidate elements; and

    segmenting the line rectangle into two line rectangles, in response to the line rectangle overlapping another line rectangle, before and after the another line rectangle.
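The character-candidate steps of claim 1 (linkage components, circumscribed rectangles, three-way size classification, integration of small overlapping rectangles, and the nearest-barycenter merge) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes an image given as rows of 0/1 values, 8-connectivity for "continuous" black pixels, the rectangle's geometric center as its barycenter, and "approximately square" meaning an aspect ratio of at most `tol`. The thresholds `small_max`, `large_min`, and `tol` and all function names are hypothetical, because the claim gives no values.

```python
from collections import deque
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Rect:
    """Circumscribed rectangle with inclusive pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def size(self):
        """(width, height) in pixels."""
        return self.right - self.left + 1, self.bottom - self.top + 1

    @property
    def center(self):
        # Assumption: "barycenter" is taken as the rectangle's geometric center.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def overlaps(self, other):
        return not (self.right < other.left or other.right < self.left
                    or self.bottom < other.top or other.bottom < self.top)

    def union(self, other):
        return Rect(min(self.left, other.left), min(self.top, other.top),
                    max(self.right, other.right), max(self.bottom, other.bottom))


def black_pixel_components(image):
    """Circumscribed rectangles of 8-connected runs of black (1) pixels."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top = bottom = y
            left = right = x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append(Rect(left, top, right, bottom))
    return rects


def size_class(r, small_max=4, large_min=12):
    """Classify by the long side; both thresholds are hypothetical placeholders."""
    long_side = max(r.size)
    if long_side <= small_max:
        return "small"
    if long_side >= large_min:
        return "large"
    return "standard"


def is_approx_square(r, tol=1.5):
    """Hypothetical test: long side / short side is at most `tol`."""
    w, h = r.size
    return max(w, h) / min(w, h) <= tol


def integrate_small(rects, small_max=4, large_min=12, tol=1.5):
    """Merge a small rectangle with an overlapping rectangle of a different
    size class when their union is approximately square; repeat until stable."""
    rects = list(rects)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(rects, 2):
            ca, cb = size_class(a, small_max, large_min), size_class(b, small_max, large_min)
            if "small" in (ca, cb) and ca != cb and a.overlaps(b):
                u = a.union(b)
                if is_approx_square(u, tol):
                    rects = [r for r in rects if r is not a and r is not b] + [u]
                    changed = True
                    break  # restart the scan over the updated list
    return rects


def merge_nearest_pair(rects, tol=1.5):
    """Pick the pair with the shortest Euclidean distance between centers and
    replace it by its union if that union is approximately square."""
    if len(rects) < 2:
        return list(rects)
    a, b = min(combinations(rects, 2),
               key=lambda p: math.dist(p[0].center, p[1].center))
    u = a.union(b)
    if not is_approx_square(u, tol):
        return list(rects)
    return [r for r in rects if r is not a and r is not b] + [u]
```

A typical call order mirroring the claim would be `merge_nearest_pair(integrate_small(black_pixel_components(image)))`, with the resulting rectangles serving as character candidate elements for line extraction.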

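The line-forming steps of claim 1 (grouping character candidate elements whose perpendicular displacement is within a threshold, generating a line rectangle, and segmenting a line rectangle that overlaps another) can be sketched as follows. This is a hedged illustration, not the patented method. It assumes horizontal lines, measures each candidate's vertical displacement against the last element already in a line, and splits an overlapped line rectangle geometrically around the other rectangle; the claim leaves all three choices open. `Rect`, `extract_lines`, `segment`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with inclusive pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def cy(self):
        """Vertical center, i.e. position perpendicular to a horizontal line."""
        return (self.top + self.bottom) / 2

    def overlaps(self, other):
        return not (self.right < other.left or other.right < self.left
                    or self.bottom < other.top or other.bottom < self.top)


def bounding(rects):
    """Line rectangle: the smallest rectangle enclosing all elements."""
    return Rect(min(r.left for r in rects), min(r.top for r in rects),
                max(r.right for r in rects), max(r.bottom for r in rects))


def extract_lines(chars, threshold):
    """Group character candidates into horizontal line rectangles.

    Candidates are visited left to right; each joins the first line whose last
    element's vertical center differs from its own by <= threshold, otherwise
    it starts a new line (an assumed grouping rule).
    """
    lines = []
    for c in sorted(chars, key=lambda r: (r.left, r.top)):
        for line in lines:
            if abs(line[-1].cy - c.cy) <= threshold:
                line.append(c)
                break
        else:  # no existing line accepted the candidate
            lines.append([c])
    return [bounding(line) for line in lines]


def segment(line, other):
    """Split `line` into the parts before and after `other` when they overlap."""
    if not line.overlaps(other):
        return [line]
    pieces = []
    if other.left > line.left:
        pieces.append(Rect(line.left, line.top, other.left - 1, line.bottom))
    if other.right < line.right:
        pieces.append(Rect(other.right + 1, line.top, line.right, line.bottom))
    return pieces
```

For example, a vertical ruled line or a column rectangle crossing a long horizontal line rectangle would, under these assumptions, leave two line rectangles: one ending just before it and one starting just after it.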