Method for organizing large numbers of documents
First Claim
1. A non-transitory computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of said data structure and for displaying selected data on a display unit, said data structure comprising:
- a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node;
wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, and wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree.
Abstract
A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and for displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All the documents that are associated with a given node have identical normalized body text, and all documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendent of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
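The grouping rule in the abstract (documents with identical normalized body text share one node) can be illustrated with a minimal Python sketch. The names `normalize_body`, `fingerprint`, and `group_into_nodes` are hypothetical, and the specific normalization steps (stripping reply-quote markers, collapsing whitespace, lowercasing) and the use of SHA-256 as the fingerprint are assumptions; the source only says that formatting and added characters are removed and that the fingerprint represents the normalized body text.

```python
import hashlib
import re
from collections import defaultdict


def normalize_body(body: str) -> str:
    """Assumed normalization: drop leading '>' quote markers, collapse whitespace, lowercase."""
    lines = [re.sub(r"^\s*(>\s*)+", "", line) for line in body.splitlines()]
    return re.sub(r"\s+", " ", " ".join(lines)).strip().lower()


def fingerprint(body: str) -> str:
    """Assumed fingerprint: SHA-256 digest of the normalized body text."""
    return hashlib.sha256(normalize_body(body).encode("utf-8")).hexdigest()


def group_into_nodes(documents):
    """Map each fingerprint to the list of (header, body) documents that share it.

    Treating equal digests as identical normalized text assumes the hash is
    collision-free in practice.
    """
    nodes = defaultdict(list)
    for header, body in documents:
        nodes[fingerprint(body)].append((header, body))
    return dict(nodes)


docs = [
    ("From: a", "Hello  World"),
    ("From: b", "> hello world"),
    ("From: c", "Other text"),
]
nodes = group_into_nodes(docs)
```

In this example the first two documents differ only in formatting and quote markers, so they land on the same node, while the third gets a node of its own.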
48 Citations
18 Claims
1. A non-transitory computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of said data structure and for displaying selected data on a display unit, said data structure comprising:
- a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node; wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, and wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree.
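To make the tree arrangement recited in claim 1 concrete, the following Python sketch builds a forest from a set of normalized body texts. It is only an illustration under stated assumptions: "inclusive of" is modeled as plain substring containment, each node is attached to the longest included text as its parent, and the names `Node`, `build_forest`, `roots`, and `leaves` are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    text: str  # normalized body text shared by all documents linked to this node
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


def build_forest(texts):
    """Create one node per distinct text and link each node to the node
    holding the longest text it contains (assumption: substring containment)."""
    nodes = [Node(t) for t in sorted(set(texts), key=len)]
    for i, node in enumerate(nodes):
        # Earlier nodes are never longer; scan from longest to shortest.
        for candidate in reversed(nodes[:i]):
            if candidate.text in node.text:
                node.parent = candidate
                candidate.children.append(node)
                break
    return nodes


def roots(nodes):
    """Nodes that are not descendants of any other node."""
    return [n for n in nodes if n.parent is None]


def leaves(nodes):
    """Nodes that have no descendant nodes."""
    return [n for n in nodes if not n.children]


forest = build_forest(["hi", "hi there", "hi there bob", "unrelated"])
```

Here the forest has two trees: one rooted at "hi" with the chain "hi", "hi there", "hi there bob", and a single-node tree "unrelated" that is both a root and a leaf, which the claim explicitly permits.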
12. A system comprising:
- a processor and associated display communicating with a data structure, the data structure comprising: a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node; wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree, wherein said plurality of nodes are arranged in terms of at least a first tree and a second tree that contain a link to one another, and wherein said link is indicative of the fact that said first tree contains a node that is directly linked to a document that is a near-duplicate of a document that is directly linked to a node in said second tree, the processor and associated display configured to manipulate data of said data structure and display selected data on a display unit, wherein said processor is further configured to display said first tree and a node from said second tree in close proximity on said display unit.
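The link between trees in claim 12 depends on a near-duplicate test that the claim does not define. As a rough sketch only, the Python below assumes a word-level similarity ratio from the standard `difflib` module with an arbitrary threshold of 0.8; the functions `is_near_duplicate` and `link_trees`, the threshold, and the representation of a tree as a list of normalized node texts are all hypothetical.

```python
import difflib
from itertools import combinations


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Assumed test: texts differ, but their word-level similarity ratio meets the threshold."""
    matcher = difflib.SequenceMatcher(a=text_a.split(), b=text_b.split(), autojunk=False)
    return text_a != text_b and matcher.ratio() >= threshold


def link_trees(trees, threshold: float = 0.8):
    """trees maps a tree id to the normalized texts of its nodes.

    Returns the set of tree-id pairs in which some node text of one tree
    is a near-duplicate of some node text of the other.
    """
    links = set()
    for (id_a, texts_a), (id_b, texts_b) in combinations(trees.items(), 2):
        if any(is_near_duplicate(a, b, threshold) for a in texts_a for b in texts_b):
            links.add((id_a, id_b))
    return links


trees = {
    "t1": ["the meeting is at noon on friday"],
    "t2": ["the meeting is at noon on monday"],
    "t3": ["quarterly budget attached"],
}
links = link_trees(trees)
```

With these inputs only the first two trees are linked, since their texts differ by a single word; a display layer could then place a node of the second tree near the first tree, as the claim describes.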
14. A system comprising:
- a processor and associated display communicating with a data structure, the data structure comprising: a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node; wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, and wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree, the processor and associated display configured to manipulate said data structure and display selected data on a display unit, wherein said processor is further configured to compare text of two documents, each document having a fingerprint, said documents being directly linked to different nodes.
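Claim 14 recites comparing the text of two documents linked to different nodes but does not say how. One possible sketch, assuming a word-level diff from Python's standard `difflib` module, is shown below; the function name `compare_texts` and the choice of a word-level comparison are assumptions made for illustration.

```python
import difflib


def compare_texts(text_a: str, text_b: str):
    """Return the word-level differences as (operation, words_in_a, words_in_b) tuples."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans that differ
    ]


differences = compare_texts("see you at noon", "see you at noon tomorrow")
# differences == [("insert", "", "tomorrow")]
```

When one document's text includes the other's, as for a descendant node and its ancestor, the result consists only of insertions, which makes the added material easy to highlight.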
15. A system comprising:
- a processor and associated display communicating with a data structure, the data structure comprising: a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node; wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, and wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree, the processor and associated display configured to manipulate said data structure and display selected data on a display unit, wherein said processor is further configured to display the header and the body text of a document, said document having a fingerprint and being directly linked to one of the nodes.
18. A system comprising:
- a processor and associated display communicating with a data structure, the data structure comprising: a plurality of directionally interlinked nodes, each node being directly linked to one or more documents, at least one of the nodes being directly linked to more than one document, each document having at least a header and a body text, each document having a fingerprint, all documents that are directly linked to a same node having a same fingerprint and identical normalized body text, and all documents that have a same fingerprint and identical normalized body text being directly linked to the same node; wherein for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document that is directly linked to the first node is inclusive of the normalized body text of each document that is directly linked to the second node, wherein each fingerprint comprises a representation of the normalized body text of a corresponding document, wherein the normalized body text is the body text in which at least body text formatting and added characters are removed, and wherein said plurality of nodes are arranged in terms of more than one tree, each tree comprising at least one node from said plurality of directionally interlinked nodes, each tree comprising at least a root node and at least a leaf node, a root node being a node that is not a descendant of any other node, and a leaf node being a node that has no descendent nodes, a node not being prohibited from being both a root node and a leaf node, all nodes that are descendant from said root node are contained by said tree, the processor and associated display configured to manipulate said data structure and display selected data on a display unit, wherein said processor is further configured to display a list of documents that are directly linked to leaf nodes, each of the documents having a respective fingerprint, wherein a leaf node comprises a node that has no descendant nodes.
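The leaf-node listing in claim 18 can be sketched in a few lines of Python. The dictionary layout used here (node id mapped to its child ids and its linked document headers) and the function name `documents_on_leaves` are hypothetical; the claim does not prescribe any particular storage format.

```python
def documents_on_leaves(nodes):
    """Collect the documents linked to nodes that have no descendant nodes."""
    return [
        doc
        for node in nodes.values()
        if not node["children"]  # a leaf node has no children
        for doc in node["documents"]
    ]


# Hypothetical example: n1 -> n2 form one tree, n3 is a single-node tree.
example_nodes = {
    "n1": {"children": ["n2"], "documents": ["original message"]},
    "n2": {"children": [], "documents": ["reply A", "reply A (forwarded copy)"]},
    "n3": {"children": [], "documents": ["standalone memo"]},
}
leaf_documents = documents_on_leaves(example_nodes)
```

Because a descendant's normalized body text includes that of its ancestors, the documents on leaf nodes are the ones that carry the full text of each branch, which is a plausible reason for listing them.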
Specification