MERGING SEMANTICALLY SIMILAR CLUSTERS BASED ON CLUSTER LABELS
Abstract
A server device may receive first label information regarding a first cluster that includes information identifying a first set of documents, where the first label information regarding the first cluster includes a first set of labels that are associated with the first cluster, and second label information regarding a second cluster that includes information identifying a second set of documents, where the second label information regarding the second cluster includes a second set of labels that are associated with the second cluster, where the second set of documents is different from the first set of documents. The server device may determine that the first and second clusters are semantically similar, which may include determining whether a similarity of the first and second clusters is above a similarity threshold. The server device may also form a merged cluster by merging the first and second clusters. The server device may further determine one or more labels for the merged cluster. Furthermore, the server device may assign the one or more labels to the merged cluster.
44 Claims
1-24. (canceled)
25. A method comprising:
forming, by one or more devices, a merged cluster of documents based on a first cluster of documents and a second cluster of documents;
identifying, by the one or more devices, one or more labels that are associated with at least one of the first cluster of documents or the second cluster of documents;
identifying, by the one or more devices, a first confidence score, associated with a particular label of the one or more labels, with respect to the first cluster of documents;
identifying, by the one or more devices, a second confidence score, associated with the particular label, with respect to the second cluster of documents;
determining, by the one or more devices, an overall confidence score based on the first confidence score and the second confidence score; and
selectively assigning, by the one or more devices, the particular label to the merged cluster based on the overall confidence score.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33.
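The steps recited in claim 25 can be illustrated with a minimal Python sketch. The claim does not say how the overall confidence score is derived from the two per-cluster scores, so the size-weighted mean used here, the 0.5 threshold, and the names `Cluster` and `merge_with_labels` are all illustrative assumptions rather than the patented method.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical container: document IDs plus per-label confidence scores in [0, 1]."""
    doc_ids: set
    label_scores: dict = field(default_factory=dict)


def merge_with_labels(first, second, threshold=0.5):
    """Merge two clusters and keep only labels whose overall confidence clears `threshold`.

    Assumption (not stated in the claim): the overall score is a mean of the two
    per-cluster scores weighted by cluster size, and a label missing from a
    cluster counts as 0.0 for that cluster.
    """
    merged = Cluster(doc_ids=first.doc_ids | second.doc_ids)
    n1, n2 = len(first.doc_ids), len(second.doc_ids)
    if n1 + n2 == 0:
        return merged  # nothing to weight by, so no labels are assigned

    # Consider every label associated with at least one of the two clusters.
    for label in sorted(set(first.label_scores) | set(second.label_scores)):
        s1 = first.label_scores.get(label, 0.0)   # first confidence score
        s2 = second.label_scores.get(label, 0.0)  # second confidence score
        overall = (n1 * s1 + n2 * s2) / (n1 + n2)
        if overall >= threshold:                  # selective assignment
            merged.label_scores[label] = overall
    return merged


if __name__ == "__main__":
    a = Cluster({"d1", "d2", "d3"}, {"jaguar car": 0.9, "luxury": 0.4})
    b = Cluster({"d4"}, {"jaguar car": 0.7, "big cat": 0.8})
    print(merge_with_labels(a, b).label_scores)
```

With these made-up inputs only "jaguar car" is kept: the other two labels score well in one cluster but fall below the threshold once both clusters are taken into account.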
34. A system comprising:
one or more processors to:
identify a label associated with at least one of a first cluster or a second cluster;
identify a first confidence score, associated with the label, with respect to the first cluster;
identify a second confidence score, associated with the label, with respect to the second cluster;
determine an overall confidence score based on the first confidence score and the second confidence score; and
selectively assign the label to a merged cluster based on the overall confidence score, the merged cluster including at least a group of the first cluster or the second cluster.
Dependent claims: 35, 36, 37, 38, 39, 40, 41.
42. A non-transitory computer-readable medium storing instructions, the instructions comprising:
one or more instructions that, when executed by at least one processor, cause the at least one processor to:
determine a first vector for a first set of labels that are associated with a first cluster of documents;
determine a second vector for a second set of labels that are associated with a second cluster of documents;
determine a measure of similarity of the first vector and the second vector;
determine that the measure of similarity satisfies a particular threshold;
determine that the first cluster of documents and the second cluster of documents are semantically similar based on determining that the measure of similarity satisfies the particular threshold; and
form a merged cluster based on determining that the first cluster of documents and the second cluster of documents are semantically similar.
Dependent claims: 43, 44.
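The vector comparison recited in claim 42 can be sketched as follows. The claim names neither the vector representation nor the similarity measure, so the term-count vectors, cosine similarity, the reading of "satisfies" as "greater than or equal to", the 0.5 threshold, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def label_vector(labels):
    """Build a sparse term-count vector from a set of label strings (assumed representation)."""
    vec = Counter()
    for label in labels:
        vec.update(label.lower().split())
    return vec


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def maybe_merge(docs_a, labels_a, docs_b, labels_b, threshold=0.5):
    """Return the merged document set if the label vectors are similar enough, else None."""
    sim = cosine_similarity(label_vector(labels_a), label_vector(labels_b))
    if sim >= threshold:  # clusters treated as semantically similar
        return set(docs_a) | set(docs_b)
    return None


if __name__ == "__main__":
    print(maybe_merge({"d1", "d2"}, ["jaguar car", "sports car"],
                      {"d3"}, ["jaguar car", "luxury car"]))
    print(maybe_merge({"d1"}, ["big cat"], {"d2"}, ["sports car"]))
```

In this sketch two clusters whose labels share most of their terms are merged, while clusters with no terms in common are left separate.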
Specification