Method for Organizing Large Numbers of Documents
Abstract
A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text, and all documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the descendant node is inclusive of the normalized body text of a document that is associated with the other node.
59 Claims
1. A computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of said data structure and capable of displaying selected data on a display unit;
said data structure comprising;
a plurality of directionally interlinked nodes, each node being associated with at least one document having at least a header and body text; and
wherein all documents associated with a given node having substantially identical normalized body text, and wherein all documents having substantially identical normalized body text being associated with the same node, and wherein at least one node being associated with more than one document;
for any first node of said nodes that is a descendent of a second node of said nodes, the normalized body text of each document associated with said first node is substantially inclusive of the normalized body text of each document that is associated with said second node.
Dependent claims: 2-23.
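The node structure recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Document`, `Node`, `normalize` and `build_nodes` are hypothetical, normalization is assumed to be lowercasing plus whitespace collapsing, and "substantially identical" and "substantially inclusive" are simplified to exact equality and substring containment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    header: dict
    body: str


@dataclass
class Node:
    normalized_body: str
    documents: list = field(default_factory=list)
    parent: Optional["Node"] = None  # directional link toward the included text


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def build_nodes(docs):
    # One node per distinct normalized body; duplicates share a node.
    nodes = {}
    for doc in docs:
        key = normalize(doc.body)
        if key not in nodes:
            nodes[key] = Node(key)
        nodes[key].documents.append(doc)
    # Link each node to the longest other node whose text it contains, so
    # every descendant's text includes the text of each of its ancestors.
    for node in nodes.values():
        included = [n for n in nodes.values()
                    if n is not node and n.normalized_body in node.normalized_body]
        if included:
            node.parent = max(included, key=lambda n: len(n.normalized_body))
    return nodes


docs = [
    Document({"from": "alice"}, "Let's meet Monday."),
    Document({"from": "carol"}, "let's  meet monday."),
    Document({"from": "bob"}, "Sounds good. Let's meet Monday."),
]
nodes = build_nodes(docs)
```

In this sketch the two messages with the same normalized text share one node, and the reply's node points to it as its parent because the reply's text contains the original text.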
24. A method for organizing documents into nodes, in which a node represents a group of substantially equivalent documents, said method comprising:
(i) providing a plurality of original documents, each comprising a header and a body, and wherein said header comprises at least one parameter and wherein said body comprises text,
(ii) selecting a document from among said documents and associating the document with a node, comparing at least a portion of the body text of said document to at least a portion of the body texts of other documents from amongst said plurality of documents, and in the case of a match, merging the node associated with said document with a node associated with the matching document,
(iii) searching the body of said document to locate a first instance of header-type text, wherein said header-type text contains at least one header parameter;
(iv) constructing a presumed document comprising a header and a body, wherein said header of said presumed document comprises one or more parameters from said header-type text located within said body of said original document, and wherein said body of said presumed document substantially comprises the text located after said header-type text in said body of said original document, and associating said presumed document with a node;
(v) comparing at least a portion of the body text of the presumed document to at least a portion of the body texts of at least one other document from among said plurality of documents and in the case of a match, merging a node associated with said presumed document with a node associated with the matching document,
(vi) if the comparison of (v) does not find a match, processing repeatedly the remainder of the body of said document for successive instances of header-type text, as stipulated in stages (iii)-(v), and for each instance, constructing a presumed document, comparing for any matching documents to the presumed document, and if found, merging the nodes associated with the matching documents, until no new instances of header-type text are found.
Dependent claims: 25-38.
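Stages (i) to (vi) of claim 24 can be sketched as follows, as an illustration under stated assumptions rather than the claimed method itself. Header-type text is assumed to be a `From:` line followed by a `Subject:` line (the pattern `HEADER_RE` is hypothetical), matching is simplified to equality of normalized body text, and "merging nodes" is modeled by letting matching documents share one dictionary key.

```python
import re

# Assumed shape of header-type text embedded in a body (hypothetical pattern).
HEADER_RE = re.compile(r"^From: (?P<sender>.+)\nSubject: (?P<subject>.+)\n",
                       re.MULTILINE)


def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def presumed_documents(body):
    # Stages (iii)-(iv): for each instance of header-type text, in text order,
    # yield the header parameters and the text that follows the header.
    for match in HEADER_RE.finditer(body):
        yield match.groupdict(), body[match.end():]


def organize(docs):
    # docs: list of (header_dict, body) pairs.
    # Returns a dict mapping normalized body text to the headers in that node.
    nodes = {}

    def associate(header, body):
        key = normalize(body)
        matched = key in nodes  # a match means the nodes are merged
        nodes.setdefault(key, []).append(header)
        return matched

    for header, body in docs:
        associate(header, body)                    # stage (ii)
        for p_header, p_body in presumed_documents(body):
            if associate(p_header, p_body):        # stage (v): match found
                break                              # stage (vi) applies only on no match
    return nodes


original = ({"sender": "alice", "subject": "Plan"}, "Let's meet Monday.\n")
reply = ({"sender": "bob", "subject": "Re: Plan"},
         "Sounds good.\nFrom: alice\nSubject: Plan\nLet's meet Monday.\n")
nodes = organize([original, reply])
```

Here the presumed document derived from the quoted header in the reply lands in the same node as the original message, so the search of the reply's body stops at that point.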
39. A method for reducing duplicate document display of a large number of documents, said method comprising:
a) comparing a fingerprint of a document with previously stored document fingerprints, wherein a fingerprint is formed for each of at least a portion of the normalized body text and a normalized subject parameter of a document, wherein said comparison for detecting and indicating duplicating documents;
b) searching the document for instances of header-type text, searching in text order through the normalized body text of the document, and if header-type text is found in said search,
i) deriving a presumed document comprising a header and a body text, by treating parameters from the instance of header-type text in the document as parameters of a header for the presumed document, and by treating all ensuing body text of the normalized body text of the document as the body text of the presumed document, and applying step a) to the presumed documents, and
ii) if the fingerprint of the presumed document is unique, continuing to search the normalized body text of the document from which the presumed document is derived for further instances of header-type text, searching in text order through the normalized body text of the document, and if a further instance of header-type text is found in said search, applying step i) to derive and process an additional presumed document, and
iii) repeating step ii) until no more instances of header-type text are found.
Dependent claims: 40-55.
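The fingerprint comparison of claim 39 might look like the sketch below. All choices here are assumptions made for illustration: SHA-256 as the fingerprint function, removal of `Re:`/`Fwd:` prefixes as subject normalization, a `From:`/`Subject:` pair as header-type text, and a search over the raw body rather than the normalized body so that line breaks remain available to the pattern.

```python
import hashlib
import re

# Assumed shape of header-type text embedded in a body (hypothetical pattern).
HEADER_RE = re.compile(r"^From: (?P<sender>.+)\nSubject: (?P<subject>.+)\n",
                       re.MULTILINE)


def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def normalize_subject(subject):
    # Assumed subject normalization: drop leading "Re:" / "Fw:" / "Fwd:" prefixes.
    stripped = re.sub(r"^((re|fwd?):\s*)+", "", subject.strip(), flags=re.IGNORECASE)
    return normalize(stripped)


def fingerprint(subject, body):
    data = normalize_subject(subject) + "\x00" + normalize(body)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def process(subject, body, seen):
    # Step a): compare against stored fingerprints; `seen` is a set updated in place.
    fp = fingerprint(subject, body)
    duplicate = fp in seen
    seen.add(fp)
    # Step b): derive presumed documents in text order and apply step a) to each.
    for match in HEADER_RE.finditer(body):
        presumed_fp = fingerprint(match["subject"], body[match.end():])
        if presumed_fp in seen:
            break  # not unique, so the search of this document stops
        seen.add(presumed_fp)
    return duplicate


seen = set()
reply_body = "Sounds good.\nFrom: alice\nSubject: Plan\nLet's meet Monday.\n"
first = process("Re: Plan", reply_body, seen)           # reply, seen first
second = process("Plan", "Let's meet Monday.\n", seen)  # original, seen second
```

Because the reply is processed first, the presumed document derived from its quoted header is already stored when the original arrives, so the original is reported as a duplicate.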
56. A computer product including a data structure for organizing of a plurality of documents and capable of being utilized by a processor for manipulating data of said data structure and capable of displaying selected data on a display unit;
said data structure comprising;
one or more trees, wherein a tree comprises at least a trunk and at least one node,
wherein said at least one node being associated with a document having at least a header and body text,
and wherein a trunk being associated with zero or more documents having at least a header and a body text
and wherein all documents whose body text includes the same included document are associated with the same tree,
and wherein a unique inclusive document, as well as documents that duplicate to said unique inclusive document, are associated with one of one or more unique nodes of said tree,
and wherein an included document, as well as documents that duplicate to said included document, are associated with said trunk of said tree.
Dependent claims: 57-59.
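The trunk-and-node arrangement of claim 56 can be pictured with a small sketch. The `Tree` class and `place` function are hypothetical names, bodies are assumed to be already normalized, and inclusion is simplified to substring containment.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    trunk_text: str                                 # text of the included document
    trunk_docs: list = field(default_factory=list)  # may stay empty (zero documents)
    nodes: dict = field(default_factory=dict)       # inclusive text -> document ids


def place(tree, doc_id, text):
    # Duplicates of the included document go on the trunk; each distinct
    # inclusive text gets its own node, shared by its duplicates.
    if text == tree.trunk_text:
        tree.trunk_docs.append(doc_id)
    elif tree.trunk_text in text:
        tree.nodes.setdefault(text, []).append(doc_id)
    else:
        raise ValueError("document does not include this tree's trunk text")


tree = Tree("let's meet monday.")
place(tree, "d1", "let's meet monday.")
place(tree, "d2", "sounds good. let's meet monday.")
place(tree, "d3", "sounds good. let's meet monday.")
```

The trunk list may remain empty, which corresponds to the "zero or more documents" wording: the included text may be known only from text quoted inside other documents.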
Specification