Method and apparatus for determining logical document structure
Abstract
Methods are disclosed for recovering or determining logical structure of a document by assessing different combinations of vertical and horizontal cuts across a block of the document. The block is segmented using a scoring function that discards horizontal cuts in favor of vertical cuts shared among neighboring sub-blocks. The order in which the blocks and sub-blocks are segmented is then used to define the logical structure of the document, such as its reading order.
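As a rough illustration of how the order of segmentation can yield a reading order, the sketch below applies a plain recursive XY-cut: it cuts a block horizontally where an empty gap spans its width, then vertically, and reads the resulting sub-blocks top to bottom and left to right. This is a generic baseline, not the disclosed scoring function. The names `Box`, `split_by_gaps`, and `reading_order` are hypothetical, and the bounding-box representation with y growing downward is an assumption.

```python
from typing import List, Tuple

# Assumed representation: (x0, y0, x1, y1) with y growing downward.
Box = Tuple[float, float, float, float]


def split_by_gaps(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes separated by empty gaps along one axis (0 = x, 1 = y)."""
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups, current = [], [ordered[0]]
    reach = ordered[0][axis + 2]        # farthest extent covered so far
    for box in ordered[1:]:
        if box[axis] >= reach:          # empty gap: a cut fits here
            groups.append(current)
            current = [box]
        else:
            current.append(box)
        reach = max(reach, box[axis + 2])
    groups.append(current)
    return groups


def reading_order(boxes: List[Box]) -> List[Box]:
    """Order boxes by recursively cutting the block they occupy."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try horizontal cuts (split along y) first, then vertical cuts.
    for axis in (1, 0):
        groups = split_by_gaps(boxes, axis)
        if len(groups) > 1:
            return [b for g in groups for b in reading_order(g)]
    # No cut separates the boxes: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

For a page with a full-width title above two columns whose paragraphs are vertically staggered, this baseline reads the title, then the whole left column, then the right column. When a horizontal gap happens to cross both columns, the baseline cuts there and interleaves the columns, which is the situation the abstract addresses by discarding such horizontal cuts in favor of shared vertical cuts.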
24 Claims
1. A method for determining a logical order of a document, comprising:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that span the width of the block;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction;
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
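Steps (b) through (f) of claim 1 can be sketched as follows, under stated assumptions. The claim does not say here how the arrangement criteria are computed, so this sketch substitutes a deliberately simple stand-in: two neighboring sub-blocks are merged (the horizontal cut between them is discarded) when their vertical gaps overlap. All names (`gaps`, `split`, `merged_bands`, `logical_order`) are hypothetical, as is the bounding-box representation with y growing downward.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # assumed: (x0, y0, x1, y1)
Gap = Tuple[float, float]                 # empty interval (start, end)


def gaps(boxes: List[Box], axis: int) -> List[Gap]:
    """Empty intervals between box projections on an axis (0 = x, 1 = y)."""
    ordered = sorted(boxes, key=lambda b: b[axis])
    found, reach = [], ordered[0][axis + 2]
    for box in ordered[1:]:
        if box[axis] > reach:
            found.append((reach, box[axis]))   # a hypothetical cut fits here
        reach = max(reach, box[axis + 2])
    return found


def split(boxes: List[Box], axis: int, cut_gaps: List[Gap]) -> List[List[Box]]:
    """Partition boxes into groups separated by the given gaps."""
    groups: List[List[Box]] = [[] for _ in range(len(cut_gaps) + 1)]
    for box in boxes:
        index = sum(1 for _, end in cut_gaps if box[axis] >= end)
        groups[index].append(box)
    return groups


def overlaps(a: Gap, b: Gap) -> bool:
    return max(a[0], b[0]) < min(a[1], b[1])


def merged_bands(boxes: List[Box]) -> List[List[Box]]:
    """Steps (b), (c), (e): cut into bands, then merge bands sharing a vertical gap."""
    bands = split(boxes, 1, gaps(boxes, 1))        # first set of cuts
    merged = [bands[0]]
    for band in bands[1:]:
        # Stand-in criterion (an assumption): some vertical gap is shared.
        shared = any(overlaps(g, h)
                     for g in gaps(merged[-1], 0)
                     for h in gaps(band, 0))
        if shared:
            merged[-1] = merged[-1] + band         # discard the horizontal cut
        else:
            merged.append(band)
    return merged


def logical_order(boxes: List[Box]) -> List[Box]:
    """Step (f): read the remaining bands top to bottom, columns left to right."""
    order: List[Box] = []
    for band in merged_bands(boxes):
        for column in split(band, 0, gaps(band, 0)):
            order.extend(sorted(column, key=lambda b: b[1]))
    return order
```

With a full-width title above two aligned columns that are crossed by a common horizontal gap, the horizontal cut between the two column bands is dropped, so each column is read in full before the next. A real implementation would replace the overlap test with whatever arrangement criteria the specification defines.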
17. An apparatus for determining a logical order of a document, comprising:
a memory for storing processing instructions of the apparatus; and
a processor coupled to the memory for executing the processing instructions of the apparatus;
the processor in executing the processing instructions:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that span the width of the block;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction;
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Dependent claims: 18, 19, 20, 21, 22, 23
24. An article of manufacture for determining a logical order of a document, the article of manufacture comprising computer usable media including computer readable instructions embedded therein that cause a computer to perform a method, wherein the method comprises:
(a) assigning a page of the document to be a block having a width along a first direction and a length along a second direction perpendicular to the first direction;
the block having a plurality of layout objects arranged therein;
(b) identifying a first set of hypothetical cuts, substantially between layout object boundaries, that span the width of the block;
the first set of hypothetical cuts defining a set of sub-blocks with each sub-block having a width along the first direction and a length along the second direction;
(c) identifying a second set of hypothetical cuts, substantially between layout object boundaries, that span the length of sub-blocks in the set of sub-blocks;
(d) computing arrangement criteria of layout objects ordered according to the first and the second sets of hypothetical cuts;
(e) modifying cuts in the first and second sets of hypothetical cuts, using the computed arrangement criteria, to merge cuts that span two or more sub-blocks along the second direction;
(f) determining the logical order of the document using cuts between layout objects in the block remaining in the first and second sets of hypothetical cuts after performing (e).
Specification