Computer-implemented system and method for generating a reference set via clustering
Abstract
A computer-implemented system and method for generating a reference set via clustering is provided. A collection of unclassified documents is obtained and grouped into clusters. N-documents are selected from each cluster and are combined as reference set candidates. One of the n-documents from each cluster is located closest to a center of that cluster. A classification code is assigned to each of the reference set candidates. Two or more of the reference set candidates are grouped as a reference set of classified documents.
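The selection step in the abstract can be illustrated with a minimal sketch. It assumes that each document has already been turned into a numeric vector and assigned to a cluster, and that "closest to a center" means the smallest Euclidean distance to the cluster's mean vector. The abstract does not fix a distance measure or say how the remaining n-documents are chosen, so taking the n nearest documents is only one option. The names `centroid`, `select_candidates`, `doc_vectors` and `cluster_of` are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def select_candidates(doc_vectors, cluster_of, n):
    """Return up to n document ids per cluster, nearest to the center first.

    Assumption: Euclidean distance to the cluster mean stands in for
    "closest to a center"; the source does not name a metric.
    """
    members = defaultdict(list)
    for doc_id, cluster_id in cluster_of.items():
        members[cluster_id].append(doc_id)

    candidates = {}
    for cluster_id, doc_ids in members.items():
        center = centroid([doc_vectors[d] for d in doc_ids])
        ranked = sorted(doc_ids, key=lambda d: dist(doc_vectors[d], center))
        candidates[cluster_id] = ranked[:n]
    return candidates


# Toy data: five documents already grouped into two clusters.
doc_vectors = {
    "a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.4, 0.1],
    "d": [10.0, 10.0], "e": [11.0, 10.0],
}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

candidates = select_candidates(doc_vectors, cluster_of, n=2)

# Combine the per-cluster selections into one pool of reference set candidates.
reference_set_candidates = [d for ids in candidates.values() for d in ids]
```

In this sketch the first entry of each per-cluster list is the document nearest the center, which matches the requirement that one of the n-documents be located closest to the center of its cluster. Assigning classification codes to the pooled candidates is left out, since the abstract does not say how the codes are produced.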
20 Claims
1. A computer-implemented method for generating a reference set via clustering, comprising the steps of:
- obtaining a collection of unclassified documents;
- grouping the unclassified documents into clusters;
- selecting n-documents from each cluster, comprising:
  - building a hierarchical tree of the clusters; and
  - traversing the hierarchical tree to identify the n-documents, wherein one of the n-documents from each cluster is located closest to a center of that cluster;
- combining the selected n-documents as reference set candidates;
- assigning a classification code to each of the reference set candidates; and
- grouping two or more of the reference set candidates as a reference set of classified documents,
wherein the steps are performed by a suitably programmed computer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
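The tree-based selection recited in claim 1 might be pictured as follows. This is only a sketch under stated assumptions: clusters are arranged in a tree whose leaves hold documents, each document carries a precomputed distance to its cluster's center, and the traversal is depth-first. The claim specifies none of these details, and `ClusterNode` and `pick_from_tree` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    name: str
    # Pairs of (document id, distance to this cluster's center); leaves only.
    documents: list = field(default_factory=list)
    # Sub-clusters; empty for a leaf cluster.
    children: list = field(default_factory=list)


def pick_from_tree(node, n):
    """Depth-first walk that keeps, per leaf cluster, the n documents nearest its center."""
    if not node.children:
        nearest = sorted(node.documents, key=lambda pair: pair[1])[:n]
        return {node.name: [doc_id for doc_id, _ in nearest]}
    picked = {}
    for child in node.children:
        picked.update(pick_from_tree(child, n))
    return picked


# Toy hierarchy: one top-level cluster is a leaf, the other has two sub-clusters.
tree = ClusterNode("root", children=[
    ClusterNode("contracts", documents=[("d1", 0.30), ("d2", 0.05), ("d3", 0.20)]),
    ClusterNode("email", children=[
        ClusterNode("internal", documents=[("d4", 0.10)]),
        ClusterNode("external", documents=[("d5", 0.40), ("d6", 0.15)]),
    ]),
])

picked = pick_from_tree(tree, n=2)
```

Each list in `picked` starts with the document nearest the center of its leaf cluster. A cluster with fewer than n documents simply contributes all of them.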
10. A computer-implemented system for generating a reference set via clustering, comprising:
- a collection module to obtain a collection of unclassified documents;
- a clustering module to group the unclassified documents into clusters;
- a candidate selection module to select n-documents from each cluster, comprising:
  - a tree module to build a hierarchical tree of the clusters; and
  - a traversal module to traverse the hierarchical tree to identify the n-documents, wherein one of the n-documents from each cluster is located closest to a center of that cluster;
- a grouping module to combine the selected n-documents as reference set candidates;
- a classification module to assign a classification code to each of the reference set candidates;
- a reference set module to group two or more of the reference set candidates as a reference set of classified documents; and
- a processor to execute the modules.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A computer-implemented method for generating a reference set via clustering, comprising the steps of:
- obtaining a collection of unclassified documents;
- grouping the unclassified documents into clusters;
- selecting n-documents from each cluster and combining the selected n-documents as reference set candidates, wherein one of the n-documents from each cluster is located closest to a center of that cluster;
- assigning a classification code to each of the reference set candidates; and
- grouping two or more of the reference set candidates as a reference set of classified documents, comprising:
  - applying a size threshold to the reference set candidates; and
  - clustering the reference set candidates until the size threshold is satisfied,
wherein the steps are performed by a suitably programmed computer.
Dependent claims: 20
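One possible reading of the size-threshold step in claim 19 is sketched below. It assumes the threshold is a maximum reference set size, that "clustering until the size threshold is satisfied" can be modeled by repeatedly merging the two closest clusters of candidates, and that one candidate nearest each final cluster center is kept. The claim does not prescribe any of this. The helper names and the `max_size` parameter are hypothetical.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def reduce_to_threshold(candidates, max_size):
    """Merge the two closest clusters until at most max_size remain,
    then keep the candidate nearest each cluster center.

    Assumption: agglomerative merging stands in for the unspecified
    clustering of reference set candidates.
    """
    clusters = [[doc_id] for doc_id in candidates]
    while len(clusters) > max_size:
        centers = [centroid([candidates[d] for d in c]) for c in clusters]
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: dist(centers[p[0]], centers[p[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid

    reference_set = []
    for cluster in clusters:
        center = centroid([candidates[d] for d in cluster])
        reference_set.append(min(cluster, key=lambda d: dist(candidates[d], center)))
    return reference_set


# Toy candidates as 2-D vectors; three lie close together, two are far apart.
candidates = {
    "a": [0.0, 0.0], "b": [0.0, 1.0], "c": [0.1, 0.4],
    "d": [8.0, 8.0], "e": [20.0, 0.0],
}

reference_set = reduce_to_threshold(candidates, max_size=3)
```

Here the three nearby candidates collapse into one cluster represented by its most central member, so five candidates reduce to a reference set of three. In practice each retained candidate would carry the classification code assigned to it in the preceding step.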
Specification