Computer-implemented system and method for visually suggesting classification for inclusion-based cluster spines
First Claim
1. A computer-implemented system for visually suggesting classification for inclusion-based document cluster spines, comprising:
a non-transitory computer readable storage medium comprising program code; and
a computer processor coupled to the storage medium, wherein the processor is configured to execute the program code to perform steps to:
designate a set of reference documents each associated with a classification code;
obtain a different set of uncoded documents;
combine one or more of the coded reference documents with a plurality of uncoded documents into a combined document set;
group the documents in the combined document set into clusters;
organize the clusters along one or more spines, each spine comprising a vector;
provide a visual suggestion for assigning one of the classification codes to one of the spines comprising visually representing each of the reference concepts in the clusters along that spine;
identify one of the documents as a center of one of the clusters;
generate a score vector for the cluster center;
compare the score vector for the cluster center to score vectors associated with one or more of the reference documents;
identify a neighborhood of similar reference documents for the cluster based on the comparison; and
assign one of the classification codes to the cluster based on the neighborhood, comprising:
determine a distance between the cluster center and the reference documents in the neighborhood; and
generate the classification code for assignment to the cluster, comprising at least one of:
identify the reference document with the closest distance to the cluster center and assign the classification code of the reference document with the closest distance as the generated classification code for the cluster;
calculate an average of the distances between the cluster center and the reference documents associated with each of the classification codes and assign the classification code with the closest average distance as the generated classification code of the cluster; and
count the reference documents in the neighborhood for each of the classification codes, weigh each count based on the distance between the reference documents with the classification code and the cluster center, and assign the classification code with the highest weighted count as the generated classification code of the cluster.
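The claim does not say how distance is measured, how large the neighborhood is, or how counts are weighted. The following is a minimal sketch that rests on several assumptions, none of which the claim states. Score vectors are plain lists of floats. Distance is 1 minus cosine similarity. The neighborhood is the k most similar reference documents. Each count is weighted by 1/distance. With those choices, the three alternative assignment rules might look like the code below. All function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighborhood(center, references, k):
    """Return the k reference documents most similar to the cluster center.

    references: list of (score_vector, classification_code) pairs.
    Result: list of (distance, code), closest first. Distance = 1 - cosine
    similarity (an assumed metric), clamped at 0 to absorb rounding error.
    """
    scored = [
        (max(0.0, 1.0 - cosine_similarity(center, vec)), code)
        for vec, code in references
    ]
    scored.sort(key=lambda item: item[0])
    return scored[:k]


def nearest_code(neigh):
    """Rule 1: code of the single closest reference document."""
    return min(neigh, key=lambda item: item[0])[1]


def min_average_code(neigh):
    """Rule 2: code whose reference documents have the smallest mean distance."""
    by_code = defaultdict(list)
    for dist, code in neigh:
        by_code[code].append(dist)
    return min(by_code, key=lambda c: sum(by_code[c]) / len(by_code[c]))


def weighted_count_code(neigh, eps=1e-9):
    """Rule 3: count references per code, weighting each by 1/distance (assumed)."""
    weights = defaultdict(float)
    for dist, code in neigh:
        weights[code] += 1.0 / (dist + eps)  # eps avoids division by zero
    return max(weights, key=weights.get)
```

Take the hand-built neighborhood [(0.01, "A"), (0.5, "A"), (0.1, "B"), (0.1, "B")]. For it, nearest_code and weighted_count_code return "A", while min_average_code returns "B". This shows the three rules are genuine alternatives that can disagree.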
5 Assignments
0 Petitions
Abstract
A computer-implemented system and method for visually suggesting classification for inclusion-based document cluster spines are provided. A set of reference documents each associated with a classification code is designated. A different set of un-coded documents is obtained. One or more of the coded reference documents are combined with a plurality of un-coded documents into a combined document set. The documents in the combined document set are grouped into clusters. The clusters are organized along one or more spines, each spine including a vector. A visual suggestion for assigning one of the classification codes to one of the spines is provided, including visually representing each of the reference concepts in the clusters along that spine.
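The pipeline in the abstract can be sketched end to end. This is a minimal illustration under the following assumptions:

- Documents are numeric vectors.
- Clustering is a single pass that assigns each document to its most similar seed vector. This is a stand-in, because the abstract names no clustering algorithm.
- Each spine is given a direction vector and receives the clusters whose centroids point most nearly along it.
- The "visual suggestion" is reduced to a tally of reference codes per spine, which a display layer could render, for example, as colored markers.

Doc, cluster, organize_spines and suggest_codes are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    vector: list
    code: Optional[str] = None  # set for reference documents, None for uncoded


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Component-wise mean of the documents' vectors."""
    n = len(docs)
    return [sum(d.vector[i] for d in docs) / n for i in range(len(docs[0].vector))]


def cluster(docs, seeds):
    """Stand-in clustering: assign each document to its most similar seed vector."""
    clusters = [[] for _ in seeds]
    for doc in docs:
        best = max(range(len(seeds)), key=lambda i: cosine(doc.vector, seeds[i]))
        clusters[best].append(doc)
    return [c for c in clusters if c]  # drop empty clusters


def organize_spines(clusters, spine_vectors):
    """Attach each cluster to the spine whose vector best matches its centroid."""
    spines = {i: [] for i in range(len(spine_vectors))}
    for c in clusters:
        cen = centroid(c)
        best = max(spines, key=lambda i: cosine(cen, spine_vectors[i]))
        spines[best].append(c)
    return spines


def suggest_codes(spines):
    """Tally reference codes along each spine; suggest the most common (or None)."""
    suggestions = {}
    for i, clusters in spines.items():
        counts = Counter(d.code for c in clusters for d in c if d.code is not None)
        suggestions[i] = counts.most_common(1)[0][0] if counts else None
    return suggestions
```

Consider this example:

- Reference documents r1 ([1, 0], "responsive") and r2 ([0, 1], "privileged").
- These are combined with uncoded documents [0.9, 0.1], [0.1, 0.9] and [0.8, 0.3].
- Seeds are [[1, 0], [0, 1]] and spine vectors are [[1, 0.2], [0.2, 1]].

With these inputs, suggest_codes returns {0: "responsive", 1: "privileged"}.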
309 Citations
16 Claims
1. A computer-implemented system for visually suggesting classification for inclusion-based document cluster spines (full text as given under First Claim above). Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A computer-implemented method for visually suggesting classification for inclusion-based document cluster spines, comprising the steps of:
designating a set of reference documents each associated with a classification code;
obtaining a different set of un-coded documents;
combining one or more of the coded reference documents with a plurality of un-coded documents into a combined document set;
grouping the documents in the combined document set into clusters;
organizing the clusters along one or more spines, each spine comprising a vector; and
providing a visual suggestion for assigning one of the classification codes to one of the spines comprising visually representing each of the reference concepts in the clusters along that spine;
identifying one of the documents as a center of one of the clusters;
generating a score vector for the cluster center;
comparing the score vector for the cluster center to score vectors associated with one or more of the reference documents;
identifying a neighborhood of similar reference documents for the cluster based on the comparison; and
assigning one of the classification codes to the cluster based on the neighborhood, further comprising:
determining a distance between the cluster center and the reference documents in the neighborhood; and
generating the classification code for assignment to the cluster, comprising at least one of:
identifying the reference document with the closest distance to the cluster center and assigning the classification code of the reference document with the closest distance as the generated classification code for the cluster;
calculating an average of the distances between the cluster center and the reference documents associated with each of the classification codes and assigning the classification code with the closest average distance as the generated classification code of the cluster; and
counting the reference documents in the neighborhood for each of the classification codes, weighing each count based on the distance between the reference documents with the classification code and the cluster center, and assigning the classification code with the highest weighted count as the generated classification code of the cluster, wherein the steps are performed by a suitably programmed computer.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
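Neither claim specifies how the cluster center or its score vector is produced. One plausible reading is sketched here purely as an assumption. It treats the score vector as tf-idf term weights over a shared vocabulary. It then picks as the center the member document whose vector is most similar (by cosine) to the cluster's mean vector, a medoid-style choice that always returns an actual document, as the claim requires. score_vector and cluster_center are hypothetical names, not terms from the patent.

```python
import math
from collections import Counter


def score_vector(tokens, vocabulary, doc_freq, n_docs):
    """Hypothetical tf-idf score vector: term count x log(n_docs / doc_freq)."""
    tf = Counter(tokens)
    return [
        tf[t] * math.log(n_docs / doc_freq[t]) if doc_freq.get(t) else 0.0
        for t in vocabulary
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_center(vectors):
    """Index of the member whose score vector is most similar to the cluster mean."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return max(range(len(vectors)), key=lambda j: cosine(vectors[j], mean))
```

Consider the three toy documents ["contract", "breach", "damages"], ["contract", "breach"] and ["contract", "email"]. The term "contract" occurs in every document and so receives zero weight. For these documents, cluster_center selects the first document.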
Specification