CLASSIFYING DOCUMENTS BY CLUSTER
First Claim
1. A computer-implemented method, comprising:
- grouping, by a computing system, a corpus of documents into a plurality of disjoint clusters of documents based on one or more shared content attributes;
determining, by the computing system, a classification distribution associated with a first cluster of the plurality of clusters, the classification distribution associated with the first cluster being based on classifications assigned to individual documents of the first cluster; and
calculating, by the computing system, a classification distribution associated with a second cluster of the plurality of clusters based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods, apparatus, systems, and computer-readable media are provided for classifying, or “labeling,” documents such as emails en masse based on association with a cluster/template. In various implementations, a corpus of documents may be grouped into a plurality of disjoint clusters of documents based on one or more shared content attributes. A classification distribution associated with a first cluster of the plurality of clusters may be determined based on classifications assigned to individual documents of the first cluster. A classification distribution associated with a second cluster of the plurality of clusters may then be determined based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.
55 Citations
20 Claims
-
1. A computer-implemented method, comprising:
-
grouping, by a computing system, a corpus of documents into a plurality of disjoint clusters of documents based on one or more shared content attributes; determining, by the computing system, a classification distribution associated with a first cluster of the plurality of clusters, the classification distribution associated with the first cluster being based on classifications assigned to individual documents of the first cluster; and calculating, by the computing system, a classification distribution associated with a second cluster of the plurality of clusters based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A system including memory and one or more processors operable to execute instructions stored in the memory, comprising instructions to:
-
group a corpus of documents into a plurality of disjoint clusters of documents based on one or more shared content attributes; determine a classification distribution associated with a first cluster of the plurality of disjoint clusters, the classification distribution associated with the first cluster being based on classifications assigned to individual documents of the first cluster; calculate a classification distribution associated with a second cluster of the plurality of disjoint clusters based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters; and classify documents of the second cluster based on the classification distribution associated with the second cluster. - View Dependent Claims (18, 19)
-
-
20. At least one non-transitory computer-readable medium comprising instructions that, in response to execution of the instructions by a computing system, cause the computing system to perform the operations of:
-
grouping a corpus of documents into a plurality of disjoint clusters of documents based on one or more shared content attributes; determining a classification distribution associated with a first cluster of the plurality of disjoint clusters, the classification distribution associated with the first cluster being based on classifications assigned to individual documents of the first cluster; and calculating a classification distribution associated with a second cluster of the plurality of disjoint clusters based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.
-
Specification