Structuring document based on table of contents
Abstract
A document is organized as a plurality of nodes associated with a table of contents. The nodes are clustered into a plurality of clusters based on a similarity criterion. One of the clusters is identified as corresponding to a highest or lowest level of the table of contents based on a selection criterion. The highest or lowest level is assigned to the nodes belonging to the identified cluster. The identifying and assigning are repeated to assign levels to the nodes belonging to each next highest or lowest level of the table of contents. The repeated identifying is based on the selection criteria applied disregarding nodes that have already been assigned a level. The document is structured based at least in part on the levels assigned to the table of contents nodes.
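The final step, structuring the document from the assigned levels, can be illustrated with a minimal sketch. It assumes the levels are already known, that level 1 is the highest level, and that entries arrive in document order; the `Section` class and `build_tree` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    children: list = field(default_factory=list)


def build_tree(entries):
    """Nest (title, level) pairs, given in document order, into a tree.

    Assumption of this sketch: level 1 is the highest level and every
    level is >= 1, so the virtual root (level 0) is never popped.
    """
    root = Section("ROOT", 0)
    stack = [root]  # stack[-1] is the most recently opened section
    for title, level in entries:
        # Close every open section at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = Section(title, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

Each entry is attached to the nearest preceding entry with a smaller level number, which is one straightforward reading of "structured based on the levels".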
20 Claims
1. A method for structuring a document organized as a plurality of nodes associated with a table of contents, the method comprising:
clustering the nodes into a plurality of clusters based on a similarity criterion;
identifying one of the clusters as corresponding to a highest or lowest level of the table of contents based on a selection criterion;
assigning the highest or lowest level to the nodes belonging to the identified cluster;
repeating the identifying and assigning to assign levels to the nodes belonging to each next highest or lowest level of the table of contents, the repeated identifying being based on the selection criteria applied disregarding nodes that have already been assigned a level; and
structuring the document based at least in part on the levels assigned to the table of contents nodes.
Dependent claims: 2, 3, 4, 5, 6, 7
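The clustering, identifying, assigning, and repeating steps of claim 1 can be sketched as follows. The claim does not fix a similarity criterion or a selection criterion, so both are assumptions here: nodes are clustered by an identical (font size, indent) pair, and the cluster containing the earliest not-yet-assigned node is taken to be the highest remaining level. The dictionary keys and function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_style(nodes):
    """Group node indices by style (assumed similarity criterion)."""
    clusters = defaultdict(list)
    for i, node in enumerate(nodes):
        clusters[(node["font_size"], node["indent"])].append(i)
    return list(clusters.values())


def assign_levels(nodes):
    """Return one level per node, with 1 as the highest level."""
    remaining = cluster_by_style(nodes)
    levels = [None] * len(nodes)
    level = 1
    while remaining:
        # Assumed selection criterion: the cluster holding the earliest
        # node is the highest remaining level. Clusters that already
        # received a level were removed, so their nodes are disregarded.
        top = min(remaining, key=lambda cluster: cluster[0])
        for i in top:
            levels[i] = level
        remaining.remove(top)
        level += 1
    return levels
```

For a table of contents such as "Chapter 1", "1.1", "1.1.1", "1.2", "Chapter 2", where each depth has its own style, this sketch assigns levels 1, 2, 3, 2, 1.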
8. A method for reconstructing a table of contents hierarchy, the method comprising:
clustering nodes associated with a table of contents into a plurality of clusters based on a similarity criterion;
assigning the nodes of each cluster to a level of the table of contents corresponding to that cluster;
validating the assigned levels of the nodes; and
taking a remedial action responsive to a validation failure.
Dependent claims: 9, 10, 11, 12
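Claim 8 leaves the validation test and the remedial action open, so the following is only one possible reading. It assumes level 1 is the highest level, treats an entry as invalid when it sits more than one level deeper than the entry before it, and repairs the sequence by clamping such entries. Both function names are hypothetical.

```python
def find_violations(levels):
    """Return indices of entries that skip a level relative to the previous entry."""
    bad = []
    prev = 0  # virtual root above the first entry
    for i, level in enumerate(levels):
        if level > prev + 1:
            bad.append(i)
        prev = level
    return bad


def repair(levels):
    """Assumed remedial action: clamp each entry to at most one level deeper
    than the (already repaired) entry before it."""
    fixed = []
    prev = 0
    for level in levels:
        level = min(level, prev + 1)
        fixed.append(level)
        prev = level
    return fixed
```

Other remedial actions, such as re-clustering with a different similarity criterion, would fit the claim language equally well; clamping is chosen here only because it is short.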
13. An apparatus for structuring a document, the apparatus comprising:
a nodes clustering module for clustering nodes of the document associated with a table of contents into a plurality of clusters based on a similarity criterion;
a terminal level cluster identifier identifying one of the clusters as corresponding to a highest or lowest level of the table of contents based on a selection criterion;
an assignor assigning the highest or lowest level to the nodes belonging to the identified cluster; and
an iterator for iteratively applying the terminal level cluster identifier and the assignor to assign levels to the nodes belonging to each next highest or lowest level of the table of contents, the iterating being based on the selection criteria applied disregarding nodes that have already been assigned a level.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
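The module split in claim 13 can be pictured as an iterator that is handed a pluggable identifier, which also makes it easy to work from the lowest level upward instead of from the highest level downward. This is a sketch under stated assumptions: clusters are lists of node indices produced by some clustering module, and `LevelIterator` and `latest_first_node` are hypothetical names, with "the cluster that starts last is the lowest level" as an assumed selection criterion.

```python
from dataclasses import dataclass
from typing import Callable, List

Cluster = List[int]  # indices of the nodes in one cluster


@dataclass
class LevelIterator:
    identify: Callable[[List[Cluster]], Cluster]  # terminal level cluster identifier
    step: int  # +1 to work from the highest level down, -1 from the lowest up

    def run(self, clusters: List[Cluster], n_nodes: int) -> list:
        levels = [None] * n_nodes
        remaining = list(clusters)
        level = 1 if self.step > 0 else len(clusters)
        while remaining:
            chosen = self.identify(remaining)
            for i in chosen:  # the assignor
                levels[i] = level
            remaining.remove(chosen)  # assigned nodes are disregarded from here on
            level += self.step
        return levels


def latest_first_node(clusters):
    """Assumed criterion for the lowest level: the cluster that starts last."""
    return max(clusters, key=lambda cluster: cluster[0])
```

With clusters `[[0, 4], [1, 3], [2]]`, running `LevelIterator(latest_first_node, -1)` assigns level 3 first, then 2, then 1, and arrives at the same labelling as a top-down pass.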
Specification