Detection and reconstruction of East Asian layout features in a fixed format document
3 Assignments
0 Petitions
Abstract
Detection and reconstruction of East Asian layout features are provided. Vertically written text in a fixed format document is detected and rotated for layout analysis. After layout analysis, the rotated text is rotated back and restructured in a flow format document. When a plurality of characters is written horizontally within a vertical line of text, the vertically overlapping text runs are detected, designated as horizontal-in-vertical text, and restructured as horizontal-in-vertical text in the flow format document. Lines of text are analyzed for the attributes of a ruby line; such text is designated as ruby text, associated with the corresponding text in the ruby base line, and restructured as ruby text in the flow format document. Text in the fixed format document is also analyzed to detect the particular East Asian language it is written in, so that a font for that language can be designated in the flow format document.
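The last step of the abstract, choosing a font from the detected language, can be illustrated with a small sketch. It assumes a simple script-based heuristic that is not taken from the source: any Hangul means Korean, otherwise any kana (Hiragana or Katakana) means Japanese, otherwise Han ideographs alone mean Chinese. The function names and the font table (common Windows fonts) are hypothetical examples, not the fonts the method designates.

```python
from typing import Optional

# Unicode block ranges used by this heuristic (inclusive).
HANGUL = [(0x1100, 0x11FF), (0xAC00, 0xD7AF)]  # Hangul Jamo, Hangul Syllables
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]    # Hiragana, Katakana
HAN = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)]     # CJK Unified Ideographs (+ Ext. A)

# Hypothetical example fonts per language; not specified by the source.
EXAMPLE_FONTS = {"ko": "Batang", "ja": "MS Mincho", "zh": "SimSun"}


def _has_any(text: str, ranges) -> bool:
    """True if any character of text falls inside one of the ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


def detect_east_asian_language(text: str) -> Optional[str]:
    """Guess 'ko', 'ja', or 'zh' from the scripts present (heuristic)."""
    if _has_any(text, HANGUL):
        return "ko"  # Korean text may also contain Han characters (Hanja)
    if _has_any(text, KANA):
        return "ja"  # Japanese mixes kana with Han characters (kanji)
    if _has_any(text, HAN):
        return "zh"  # Han characters only: assume Chinese
    return None


def font_for_text(text: str) -> Optional[str]:
    """Map the detected language to an example font, or None."""
    lang = detect_east_asian_language(text)
    return EXAMPLE_FONTS.get(lang) if lang else None


print(detect_east_asian_language("縦書きのテキスト"))  # ja
print(font_for_text("中文文本"))  # SimSun
```

A script heuristic like this cannot tell Simplified from Traditional Chinese; a fuller implementation would need additional signals for that.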
20 Claims
1. A method for detecting Chinese, Japanese, or Korean text in a fixed format document, the method comprising:
receiving a fixed format document, the fixed format document comprising one or more text runs on one or more pages;
analyzing the one or more text runs on a page for finding at least one Chinese, Japanese, or Korean character;
if at least one Chinese, Japanese, or Korean character is found on the page, analyzing the one or more text runs on the page for determining a text direction for the page, comprising:
  analyzing the one or more text runs in a horizontal line and in a vertical line;
  for each text run, determining if the text run fits a horizontal or a vertical sequence of text runs;
  counting a number of characters in each horizontal text run and each vertical text run; and
  if more characters are in the vertical text runs than in the horizontal text runs, determining the page comprises vertical text;
if the page comprises vertical text, rotating the page 90° counterclockwise for layout analysis for reconstruction in a flow format document; and
reconstructing the fixed format document to a flow format document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
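The direction test of claim 1 can be sketched as follows. The claim does not say how a run is judged to fit a horizontal or a vertical sequence, so that rule here is a hypothetical heuristic: a multi-character run is classified by its bounding-box shape, and a single glyph by whether its nearest neighbour lies in the same column or the same row. The Unicode ranges, the `TextRun` type, and the y-down coordinates are also assumptions; the final comparison (more characters in vertical runs means a vertical page) follows the claim.

```python
from dataclasses import dataclass

# Unicode ranges treated as Chinese, Japanese, or Korean (inclusive).
CJK_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


@dataclass
class TextRun:
    """A text run with its bounding box; y grows downward (assumed)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap of [a0, a1] and [b0, b1] (negative if disjoint)."""
    return min(a1, b1) - max(a0, b0)


def run_orientation(run: TextRun, runs: list) -> str:
    """Hypothetical rule: does this run fit a horizontal or a vertical sequence?"""
    w, h = run.x1 - run.x0, run.y1 - run.y0
    if len(run.text) > 1:
        return "horizontal" if w >= h else "vertical"
    # Single glyph: compare the nearest neighbour in its row and in its column.
    nearest_in_row = nearest_in_col = float("inf")
    for other in runs:
        if other is run:
            continue
        if _overlap(run.y0, run.y1, other.y0, other.y1) > 0.5 * h:
            gap = max(other.x0 - run.x1, run.x0 - other.x1)
            nearest_in_row = min(nearest_in_row, gap)
        if _overlap(run.x0, run.x1, other.x0, other.x1) > 0.5 * w:
            gap = max(other.y0 - run.y1, run.y0 - other.y1)
            nearest_in_col = min(nearest_in_col, gap)
    return "vertical" if nearest_in_col < nearest_in_row else "horizontal"


def page_is_vertical(runs: list) -> bool:
    """Vertical if the runs classified as vertical hold more characters."""
    if not any(is_cjk(ch) for run in runs for ch in run.text):
        return False  # no CJK character on the page: skip direction analysis
    counts = {"horizontal": 0, "vertical": 0}
    for run in runs:
        counts[run_orientation(run, runs)] += len(run.text)
    return counts["vertical"] > counts["horizontal"]


# Two columns of single glyphs, each read top to bottom.
col_right = [TextRun(c, 500, y, 510, y + 10) for c, y in zip("縦書き", (100, 112, 124))]
col_left = [TextRun(c, 480, y, 490, y + 10) for c, y in zip("日本語", (100, 112, 124))]
print(page_is_vertical(col_right + col_left))  # True
```

The nearest-neighbour comparison matters because, in vertical text, glyphs in adjacent columns also share a row; only the spacing distinguishes a column from a line.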
10. A method for detecting Chinese, Japanese, or Korean text in a fixed format document, the method comprising:
receiving a fixed format document, the fixed format document comprising one or more text runs on one or more pages;
analyzing the one or more text runs on a page for finding at least one Chinese, Japanese, or Korean character, wherein, prior to analyzing the one or more text runs on a page for finding at least one Chinese, Japanese, or Korean character, separating a header or footer from a document body, the document body comprising the one or more text runs;
if at least one Chinese, Japanese, or Korean character is found on the page, analyzing the one or more text runs on the page for determining a text direction for the page;
if the page comprises vertical text, rotating the page 90° counterclockwise for layout analysis for reconstruction in a flow format document; and
reconstructing the fixed format document to a flow format document.
Dependent claims: 11, 12, 13.
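Claim 10 adds a preprocessing step: a header or footer is separated from the document body before the body's text runs are searched for CJK characters. The claim does not say how the separation is done, so the sketch below assumes a fixed margin band at the top and bottom of the page; the 8% default, the function name, and the y-down coordinates are hypothetical. Only the returned body would then be passed on to the CJK and text-direction checks.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    y0: float  # top edge; y grows downward from the top of the page (assumed)
    y1: float  # bottom edge


def split_header_footer(runs, page_height, margin_ratio=0.08):
    """Split runs into (header, body, footer) by position on the page.

    Hypothetical rule: a run lying entirely within the top or bottom
    margin_ratio of the page height is treated as header or footer.
    """
    top_limit = page_height * margin_ratio
    bottom_limit = page_height * (1 - margin_ratio)
    header, body, footer = [], [], []
    for run in runs:
        if run.y1 <= top_limit:
            header.append(run)
        elif run.y0 >= bottom_limit:
            footer.append(run)
        else:
            body.append(run)
    return header, body, footer


runs = [
    TextRun("Chapter 3", 20, 35),      # near the top edge
    TextRun("縦書きの本文", 100, 700),  # body text
    TextRun("- 12 -", 760, 775),       # page number near the bottom edge
]
header, body, footer = split_header_footer(runs, page_height=800)
print([r.text for r in body])  # ['縦書きの本文']
```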
14. A method for detecting Chinese, Japanese, or Korean text in a fixed format document, the method comprising:
receiving a fixed format document, the fixed format document comprising one or more text runs on one or more pages;
analyzing the one or more text runs on a page for finding at least one Chinese, Japanese, or Korean character;
if at least one Chinese, Japanese, or Korean character is found on the page, analyzing the one or more text runs on the page for determining a text direction for the page;
if the page comprises vertical text, rotating the page 90° counterclockwise for layout analysis for reconstruction in a flow format document;
after layout analysis is performed, translating the previously rotated text runs up along a vertical axis by page height, and rotating the previously rotated text runs 90° clockwise; and
reconstructing the fixed format document to a flow format document.
Dependent claims: 15, 16, 17, 18, 19, 20.
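The round trip in claim 14 can be sketched on text-run bounding boxes. The coordinate convention is an assumption, not taken from the source: the origin is the top-left corner with y growing downward, and the forward step rotates 90° counterclockwise as seen on screen, (x, y) → (y, -x), then shifts down by the page height so coordinates stay positive. Under that convention, the inverse in the claim's order (translate up by the page height, then rotate 90° clockwise, (x, y) → (-y, x)) restores the original boxes exactly. The forward rotation also shows why rotating helps layout analysis: a top-to-bottom column becomes a left-to-right line, and columns ordered right to left become lines stacked top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a text run; origin top-left, y grows downward (assumed)."""
    x0: float
    y0: float
    x1: float
    y1: float


def _corners(b: Box):
    return [(b.x0, b.y0), (b.x1, b.y0), (b.x0, b.y1), (b.x1, b.y1)]


def _bbox(points) -> Box:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def rotate_for_analysis(b: Box, page_height: float) -> Box:
    """Hypothetical forward step: rotate 90 deg counterclockwise on screen,
    (x, y) -> (y, -x), then shift down by page_height."""
    return _bbox([(y, page_height - x) for x, y in _corners(b)])


def restore_after_analysis(b: Box, page_height: float) -> Box:
    """Claim 14 order: translate up by page_height, then rotate
    90 deg clockwise on screen, (x, y) -> (-y, x)."""
    shifted = [(x, y - page_height) for x, y in _corners(b)]
    return _bbox([(-y, x) for x, y in shifted])


# A tall, narrow run from a vertical column near the right edge of the page.
column = Box(500.0, 100.0, 520.0, 400.0)
rotated = rotate_for_analysis(column, page_height=792.0)
print(rotated)  # Box(x0=100.0, y0=272.0, x1=400.0, y1=292.0)
print(restore_after_analysis(rotated, page_height=792.0) == column)  # True
```

Because both transforms only swap and negate axes, the bounding box of the transformed corners is exactly the transformed box, so no precision is lost beyond ordinary floating-point subtraction.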
Specification