Methods and computer-program products for organizing electronic documents
First Claim
1. A computer-based method of reclassifying and clustering electronic documents of a document corpus to improve classification of the electronic documents so that correct documents are returned as a result of a computer-based search, the method comprising:
- comparing, by a computer, each individual electronic document in the document corpus with each other electronic document in the document corpus, thereby forming document pairs, wherein the electronic documents of the document pairs are compared by:
calculating a similarity value with respect to the electronic documents of a document pair from a plurality of attributes of the electronic documents in the document corpus, the plurality of attributes comprising a citation attribute, a text-based attribute, and one or more of the following attributes:
an author attribute expressed as
Abstract
Methods of organizing documents by reclassification and clustering are disclosed. In one embodiment, a method of clustering electronic documents of a document corpus includes comparing, by a computer, each individual electronic document in the document corpus with each other electronic document in the document corpus, thereby forming document pairs. The electronic documents of the document pairs are compared by calculating a similarity value with respect to the electronic documents of a document pair, associating the similarity value with both electronic documents of the document pair, and applying a clustering algorithm to the document corpus using the similarity values to create a plurality of hierarchical clusters. The similarity value is based on a plurality of attributes of the electronic documents in the document corpus. The plurality of attributes includes a citation attribute, a text-based attribute and one or more of an author-attribute, a publication-attribute, an institution-attribute, a downloads-attribute, and a clustering-results-attribute.
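The pipeline in the abstract has three parts: comparing every pair of documents, building one similarity value from several attributes, and running hierarchical clustering on those values. The Python below is a minimal sketch of that pipeline under stated assumptions. It is not the patented method, because the text here does not give the exact attribute formulas. In this sketch the citation attribute is assumed to be Jaccard overlap of cited-reference sets, the text-based attribute is assumed to be cosine similarity of term counts, and the author attribute is assumed to be Jaccard overlap of author sets. The attribute weights and the choice of average-linkage agglomerative clustering are also illustrative assumptions. The names `Document`, `WEIGHTS`, `similarity`, `pairwise_similarities`, `average_link`, and `cluster` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Document:
    """Hypothetical document record; the fields are illustrative assumptions."""
    doc_id: str
    text: str
    citations: frozenset[str]
    authors: frozenset[str]


# Assumed attribute weights (not taken from the patent).
WEIGHTS = {"citation": 0.4, "text": 0.4, "author": 0.2}


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: str, b: str) -> float:
    """Cosine similarity of lowercase bag-of-words term counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(n * cb[t] for t, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values())) * math.sqrt(sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def similarity(d1: Document, d2: Document) -> float:
    """Weighted sum of the citation, text, and (assumed) author attributes."""
    return (WEIGHTS["citation"] * jaccard(d1.citations, d2.citations)
            + WEIGHTS["text"] * cosine(d1.text, d2.text)
            + WEIGHTS["author"] * jaccard(d1.authors, d2.authors))


def pairwise_similarities(docs: list[Document]) -> dict[frozenset[str], float]:
    """Compare each document with every other one, forming document pairs.

    Keying by the unordered pair associates one value with both documents.
    """
    return {frozenset((a.doc_id, b.doc_id)): similarity(a, b)
            for a, b in combinations(docs, 2)}


def average_link(c1: tuple[str, ...], c2: tuple[str, ...],
                 sims: dict[frozenset[str], float]) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sims[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(docs: list[Document], sims: dict[frozenset[str], float]
            ) -> list[tuple[tuple[str, ...], tuple[str, ...], float]]:
    """Agglomerative (average-linkage) clustering.

    Repeatedly merges the two most similar clusters; the returned merge
    history encodes the hierarchy (a dendrogram) from leaves to root.
    """
    clusters = [(d.doc_id,) for d in docs]
    merges = []
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]], sims))
        merges.append((clusters[i], clusters[j], average_link(clusters[i], clusters[j], sims)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    docs = [
        Document("A", "neural network training", frozenset({"r1", "r2"}), frozenset({"smith"})),
        Document("B", "neural network inference", frozenset({"r1", "r2"}), frozenset({"smith"})),
        Document("C", "soil erosion survey", frozenset({"r9"}), frozenset({"lee"})),
    ]
    for left, right, score in cluster(docs, pairwise_similarities(docs)):
        print(left, "+", right, f"{score:.3f}")
```

Average linkage is only one of several possible merge criteria; single, complete, and Ward linkage are common alternatives. On a large corpus, the all-pairs comparison step grows quadratically. This naive loop also re-scores every cluster pair on each merge, so a real system would more likely pass a precomputed distance matrix to a library routine such as SciPy's `scipy.cluster.hierarchy.linkage`.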
18 Citations
16 Claims
9. A method of improving computer-based search results by organizing electronic documents of a document corpus, each one of the electronic documents comprising a predetermined classification code and having similarity values based on a similarity with respect to other electronic documents in the document corpus, the method comprising:
comparing each individual electronic document in the document corpus with each other electronic document in the document corpus, thereby forming document pairs, wherein the electronic documents of the document pairs are compared by:
calculating a similarity value with respect to the electronic documents of a document pair from a plurality of attributes of the electronic documents in the document corpus, the plurality of attributes comprising a citation attribute, a text-based attribute, and one or more of the following attributes:
an author attribute expressed as
16. A computer-program product for clustering electronic documents of a document corpus to improve classification of the electronic documents so that correct documents are returned as a result of a computer-based search, the computer-program product comprising a non-transitory computer-readable medium storing executable instructions that, when executed by a computing device, cause the computing device to:
compare each individual electronic document in the document corpus with each other electronic document in the document corpus, thereby forming document pairs, wherein the electronic documents of the document pairs are compared by:
calculating a similarity value with respect to the electronic documents of a document pair from a plurality of attributes of the electronic documents in the document corpus, the plurality of attributes comprising a citation attribute, a text-based attribute and one or more of the following attributes:
an author attribute expressed as
Specification