Structuring document based on table of contents
First Claim
1. A method for reconstructing a table of contents hierarchy, the method comprising:
clustering nodes associated with a table of contents into a plurality of flat clusters that are not arranged hierarchically based on a similarity criterion;
assigning the nodes of each flat cluster to a level of the table of contents corresponding to that flat cluster;
validating the assigned levels of the nodes; and
taking a remedial action responsive to a validation failure;
wherein at least the clustering, assigning, and validating are performed by a processor.
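The four steps of this claim (cluster, assign, validate, remediate) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the `TocNode` fields, the similarity criterion (identical font size and indent), the rule that larger fonts map to higher levels, the validation rule (a node may be at most one level deeper than the node before it), and the remedial action (clamping the offending level).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TocNode:
    # Hypothetical node features; the claim does not name any.
    text: str
    font_size: float
    indent: int
    level: Optional[int] = None


def cluster_flat(nodes):
    """Group nodes into flat (non-hierarchical) clusters.
    Assumed similarity criterion: identical (font_size, indent)."""
    clusters = defaultdict(list)
    for node in nodes:
        clusters[(node.font_size, node.indent)].append(node)
    return list(clusters.values())


def assign_levels(clusters):
    """Give every node of a cluster the level of that cluster.
    Assumed ordering: larger font, then smaller indent, is a higher level."""
    ordered = sorted(clusters, key=lambda c: (-c[0].font_size, c[0].indent))
    for level, cluster in enumerate(ordered, start=1):
        for node in cluster:
            node.level = level


def validate(nodes):
    """Return indices of nodes that sit more than one level below
    the preceding node (an assumed validity rule)."""
    failures, prev = [], 0
    for i, node in enumerate(nodes):
        if node.level > prev + 1:
            failures.append(i)
        prev = node.level
    return failures


def remediate(nodes):
    """Assumed remedial action: clamp a skipped level to prev + 1."""
    prev = 0
    for node in nodes:
        if node.level > prev + 1:
            node.level = prev + 1
        prev = node.level


toc = [
    TocNode("Chapter 1", 14, 0),
    TocNode("1.1.1 Detail", 10, 2),
    TocNode("1.1 Section", 12, 1),
    TocNode("Chapter 2", 14, 0),
]
assign_levels(cluster_flat(toc))
if validate(toc):
    remediate(toc)
```

In this example the second entry is first assigned level 3 directly under a level 1 entry, which fails the assumed validation rule, so the assumed remedial step lowers it to level 2.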
Abstract
A document is organized as a plurality of nodes associated with a table of contents. The nodes are clustered into a plurality of clusters based on a similarity criterion. One of the clusters is identified as corresponding to a highest or lowest level of the table of contents based on a selection criterion. The highest or lowest level is assigned to the nodes belonging to the identified cluster. The identifying and assigning are repeated to assign levels to the nodes belonging to each next highest or lowest level of the table of contents. The repeated identifying is based on the selection criteria applied disregarding nodes that have already been assigned a level. The document is structured based at least in part on the levels assigned to the table of contents nodes.
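The repeated identify-and-assign loop described in the abstract might look like the following Python sketch. The selection criterion used here is an assumption chosen only to make the loop concrete: among nodes that have no level yet, the cluster containing the earliest node in document order is taken as the next highest level. The function name and the input shape (`cluster_of`, one cluster label per node) are likewise hypothetical.

```python
def assign_levels_iteratively(cluster_of):
    """cluster_of[i] is the cluster label of the i-th table-of-contents
    node in document order. Returns one level per node (1 = highest)."""
    levels = [None] * len(cluster_of)
    level = 1
    while any(lv is None for lv in levels):
        # Assumed selection criterion: look only at unassigned nodes
        # and pick the cluster of the earliest one.
        first = next(i for i, lv in enumerate(levels) if lv is None)
        chosen = cluster_of[first]
        # Assign the current level to every node of the chosen cluster.
        for i, label in enumerate(cluster_of):
            if label == chosen:
                levels[i] = level
        level += 1  # the next pass identifies the next highest level
    return levels


levels = assign_levels_iteratively(["a", "b", "c", "b", "a", "b"])
```

Because already-assigned nodes are skipped when the criterion is reapplied, each pass sees a smaller problem, and the loop ends once every cluster has a level.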
19 Claims
1. A method for reconstructing a table of contents hierarchy, the method comprising:
clustering nodes associated with a table of contents into a plurality of flat clusters that are not arranged hierarchically based on a similarity criterion;
assigning the nodes of each flat cluster to a level of the table of contents corresponding to that flat cluster;
validating the assigned levels of the nodes; and
taking a remedial action responsive to a validation failure;
wherein at least the clustering, assigning, and validating are performed by a processor.
Dependent claims: 2, 3, 4, 5
6. A method for structuring a document, the method comprising:
clustering nodes of the document associated with a table of contents into a plurality of clusters based on a similarity criterion, the plurality of clusters not being arranged hierarchically;
identifying one of the clusters as corresponding to a highest or lowest level of the table of contents based on a selection criterion;
assigning the highest or lowest level to the nodes belonging to the identified cluster;
iteratively applying the identifying and assigning to assign levels to the nodes belonging to each next highest or lowest level of the table of contents, the iterating being based on the selection criteria applied disregarding nodes that have already been assigned a level; and
structuring the document based at least in part on the levels assigned to the table of contents nodes;
wherein the clustering, identifying, assigning, iterating, and structuring are performed by an apparatus including a processor.
Dependent claims: 7, 8, 9, 10, 11, 12
13. A method for structuring a document organized as a plurality of nodes associated with a table of contents, the method comprising:
clustering the nodes into a plurality of clusters based on a similarity criterion, the clustering producing flat clusters that are not arranged hierarchically;
identifying one of the clusters as corresponding to a highest or lowest level of the table of contents based on a selection criterion;
assigning the highest or lowest level to the nodes belonging to the identified cluster;
repeating the identifying and assigning to assign levels to the nodes belonging to each next highest or lowest level of the table of contents, the repeated identifying being based on the selection criteria applied disregarding nodes that have already been assigned a level; and
structuring the document based at least in part on the levels assigned to the table of contents nodes;
wherein the clustering, identifying, assigning, repeating, and structuring are performed by a processor.
Dependent claims: 14, 15, 16, 17, 18, 19
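The final step of claims 6 and 13, structuring the document from the assigned levels, is not spelled out in the claim text. One common way to turn a flat list of levelled entries into a nested outline is a stack-based pass, sketched below in Python. The function name, the `(title, level)` input shape, and the dictionary output format are assumptions for illustration only.

```python
def build_outline(entries):
    """entries: (title, level) pairs in document order, level 1 = highest.
    Returns a nested list of {"title": ..., "children": [...]} dicts."""
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node); the root sits at level 0
    for title, level in entries:
        # Pop back to the nearest open entry with a strictly higher level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": title, "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


outline = build_outline([
    ("Chapter 1", 1),
    ("1.1 Section", 2),
    ("1.2 Section", 2),
    ("Chapter 2", 1),
])
```

Each entry becomes a child of the most recent entry at a higher level, so sibling sections group under their chapter without any lookahead.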
Specification