Method and apparatus for determining logical document structure
Abstract
Methods are disclosed for recovering or determining logical structure of a document by assessing different combinations of vertical and horizontal cuts across a block of the document. The block is segmented using a scoring function that discards horizontal cuts in favor of vertical cuts shared among neighboring sub-blocks. The order in which the blocks and sub-blocks are segmented is then used to define the logical structure of the document, such as its reading order.
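The idea in the abstract, dropping a horizontal cut when neighboring sub-blocks share a vertical cut, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: layout objects are modeled as bounding boxes `(x0, y0, x1, y1)`, the helper names `interior_gaps`, `shared_gap`, and `reading_order` are hypothetical, and the scoring function is replaced by a placeholder rule (merge whenever the vertical gaps of the two sub-blocks overlap). The abstract does not specify the actual scoring function.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward
Span = Tuple[float, float]


def interior_gaps(spans: List[Span]) -> List[Span]:
    """Gaps strictly between the given 1-D spans (outer margins are ignored)."""
    if not spans:
        return []
    spans = sorted(spans)
    found, cursor = [], spans[0][1]
    for start, end in spans[1:]:
        if start > cursor:
            found.append((cursor, start))
        cursor = max(cursor, end)
    return found


def shared_gap(a: List[Span], b: List[Span]) -> Optional[Span]:
    """First overlap between two gap lists, or None if they never overlap."""
    for a0, a1 in a:
        for b0, b1 in b:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:
                return (lo, hi)
    return None


def reading_order(upper: List[Box], lower: List[Box]) -> List[Box]:
    """Order the objects of two vertically stacked sub-blocks.

    Placeholder rule (an assumption, not the patent's scoring function):
    if both sub-blocks have a vertical gap at overlapping x positions, the
    horizontal cut between them is dropped and the objects are read column
    by column; otherwise the upper sub-block is read before the lower one.
    """
    gap = shared_gap(
        interior_gaps([(b[0], b[2]) for b in upper]),
        interior_gaps([(b[0], b[2]) for b in lower]),
    )

    def top_down(boxes: List[Box]) -> List[Box]:
        return sorted(boxes, key=lambda b: (b[1], b[0]))

    if gap is None:
        return top_down(upper) + top_down(lower)
    x_cut = (gap[0] + gap[1]) / 2
    merged = upper + lower
    left = [b for b in merged if b[2] <= x_cut]
    right = [b for b in merged if b[0] >= x_cut]
    return top_down(left) + top_down(right)


# Two rows that share a column gap between x=40 and x=60.
A, B = (0, 0, 40, 10), (60, 0, 100, 10)
C, D = (0, 20, 40, 30), (60, 20, 100, 30)
TITLE = (0, 0, 100, 10)

two_columns = reading_order([A, B], [C, D])    # columns: A, C, then B, D
title_first = reading_order([TITLE], [C, D])   # no shared gap: rows kept
```

With the shared gap, the left column is read in full before the right column; with a full-width title on top, no vertical gap is shared and the horizontal cut is kept.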
23 Claims
1. A method for determining a logical order of a document, comprising:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that span the width of the block;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction by removing one cut in the first set of hypothetical cuts and combining two cuts in the second set of hypothetical cuts; and
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Dependent claims: 2-15
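Steps (a) through (c) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: layout objects are modeled as bounding boxes `(x0, y0, x1, y1)`, the first direction is taken to be horizontal, each cut is placed at the midpoint of an empty gap, and the function names are hypothetical. Steps (d) and (e) are omitted because the arrangement criteria are not defined in the claim text.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward
Span = Tuple[float, float]


def interior_gaps(spans: List[Span]) -> List[Span]:
    """Gaps strictly between the given 1-D spans (outer margins are ignored)."""
    if not spans:
        return []
    spans = sorted(spans)
    found, cursor = [], spans[0][1]
    for start, end in spans[1:]:
        if start > cursor:
            found.append((cursor, start))
        cursor = max(cursor, end)
    return found


def horizontal_cuts(boxes: List[Box]) -> List[float]:
    """Step (b): y positions where no object is crossed across the full width."""
    return [(lo + hi) / 2 for lo, hi in interior_gaps([(b[1], b[3]) for b in boxes])]


def split_into_sub_blocks(boxes: List[Box], cuts: List[float]) -> List[List[Box]]:
    """Group objects into the sub-blocks delimited by the horizontal cuts."""
    bounds = [float("-inf")] + cuts + [float("inf")]
    return [
        [b for b in boxes if lo <= b[1] and b[3] <= hi]
        for lo, hi in zip(bounds, bounds[1:])
    ]


def vertical_cuts(sub_block: List[Box]) -> List[float]:
    """Step (c): x positions where no object is crossed along the sub-block length."""
    return [(lo + hi) / 2 for lo, hi in interior_gaps([(b[0], b[2]) for b in sub_block])]


def hypothetical_cuts(page: List[Box]) -> Tuple[List[float], List[List[float]]]:
    """Steps (a) to (c): treat the page as one block and collect both cut sets."""
    first_set = horizontal_cuts(page)
    sub_blocks = split_into_sub_blocks(page, first_set)
    second_set = [vertical_cuts(sb) for sb in sub_blocks]
    return first_set, second_set


# A full-width title above two columns.
page = [(0, 0, 100, 10), (0, 20, 40, 30), (60, 20, 100, 30)]
first_set, second_set = hypothetical_cuts(page)
```

For this page the first set holds one horizontal cut between the title and the columns, and the second set holds no cut for the title sub-block and one vertical cut for the two-column sub-block.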
16. An apparatus for determining a logical order of a document, comprising:
a memory for storing processing instructions of the apparatus; and
a processor coupled to the memory for executing the processing instructions of the apparatus;
the processor in executing the processing instructions:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that only span the entire width of the block, wherein any hypothetical cut in the first direction that is less than the entire width of the block is excluded;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction;
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Dependent claims: 17-22
23. An article of manufacture for determining a logical order of a document, the article of manufacture comprising computer usable media including computer readable instructions embedded therein that causes a computer to perform a method wherein the method comprises:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that span the width of the block;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction by removing one cut in the first set of hypothetical cuts and combining two cuts in the second set of hypothetical cuts; and
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Specification