CLUSTERING COMMUNICATIONS BASED ON CLASSIFICATION
First Claim
1. A computer implemented method, comprising:
identifying a plurality of classification terms indicative of a classification;
identifying a corpus of communications from one or more databases, the corpus of communications including a plurality of communications that are not labeled with an association to the classification;
determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster;
determining a feature set based on the communications of the cluster, the feature set including one or more features that are based on content of the communications that is in addition to the classification terms; and
assigning the feature set to an indication of the classification.
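The claimed steps can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state: the classification terms are a hypothetical hand-picked set, a communication joins the cluster when it contains at least `min_hits` distinct classification terms, and the "feature set" is taken to be the tokens (other than the classification terms) that appear in the most communications of the cluster. All function names and parameters are hypothetical and illustrative only.

```python
import re
from collections import Counter

# Hypothetical classification terms for an assumed "purchase receipt" classification.
CLASSIFICATION_TERMS = {"receipt", "order", "total"}


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_by_terms(corpus, terms, min_hits=1):
    """Cluster = unlabeled communications containing >= min_hits distinct classification terms."""
    return [comm for comm in corpus if len(terms & set(tokenize(comm))) >= min_hits]


def feature_set(cluster, terms, top_k=5):
    """Tokens found in the most communications of the cluster, excluding the classification terms."""
    counts = Counter()
    for comm in cluster:
        # Count each token at most once per communication (document frequency).
        counts.update(set(tokenize(comm)) - terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_k]]


def build_classification_features(corpus, terms, label, min_hits=1, top_k=5):
    """Cluster the corpus, derive a feature set, and assign it to the classification label."""
    cluster = cluster_by_terms(corpus, terms, min_hits)
    return {label: feature_set(cluster, terms, top_k)}


if __name__ == "__main__":
    corpus = [
        "Order receipt: total charged to card, shipping address on file",
        "Your order total was charged to card ending 1234",
        "Lunch on Friday?",
    ]
    print(build_classification_features(
        corpus, CLASSIFICATION_TERMS, "purchase_receipt", min_hits=2, top_k=2))
```

With `min_hits=2`, the first two messages form the cluster and the lunch message is excluded; the sketch then returns `{"purchase_receipt": ["card", "charged"]}`, i.e. words that co-occur with the classification terms without being classification terms themselves. Document frequency was chosen over raw counts so that one long message cannot dominate the feature set.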
2 Assignments
0 Petitions
Abstract
Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.
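The abstract notes that clustering may optionally use the similarity of the documents' structural paths. The sketch below shows one way such a similarity could be computed for HTML emails. It assumes, without support from the source, that a "structural path" is the chain of tags from the root to each non-blank text node and that similarity is the Jaccard index of the two path sets; both the interpretation and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects the tag path (e.g. 'html/body/table/tr/td') leading to each non-blank text node."""

    # Void elements never get a closing tag, so they are not pushed onto the stack.
    VOID = {"br", "img", "hr", "meta", "input", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.paths.add("/".join(self.stack))


def structural_paths(html):
    """Set of root-to-text tag paths in an HTML document."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def path_similarity(html_a, html_b):
    """Jaccard similarity of the two documents' structural path sets."""
    a, b = structural_paths(html_a), structural_paths(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, two order emails that both place their text in `html/body/table/tr/td` cells score higher than an order email compared with a plain `<p>` note, even when their wording differs. A score like this could be combined with the classification-term test when deciding cluster membership.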
31 Citations
26 Claims
1. A computer implemented method, comprising:
identifying a plurality of classification terms indicative of a classification;
identifying a corpus of communications from one or more databases, the corpus of communications including a plurality of communications that are not labeled with an association to the classification;
determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster;
determining a feature set based on the communications of the cluster, the feature set including one or more features that are based on content of the communications that is in addition to the classification terms; and
assigning the feature set to an indication of the classification.
Dependent claims: 2-15.
16. A system including memory and one or more processors operable to execute instructions stored in the memory, comprising instructions to:
identify a plurality of classification terms indicative of a classification;
identify a corpus of communications from one or more databases, the corpus of communications including a plurality of communications that are not labeled with an association to the classification;
determine a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster;
determine a feature set based on the communications of the cluster, the feature set including one or more features that are based on content of the communications that is in addition to the classification terms; and
assign the feature set to an indication of the classification.
Dependent claims: 17-25.
26. A non-transitory computer-readable storage medium comprising instructions that, in response to execution of the instructions by a computing system, cause the computing system to perform operations comprising:
identifying a plurality of classification terms indicative of a classification;
identifying a corpus of communications from one or more databases, the corpus of communications including a plurality of communications that are not labeled with an association to the classification;
determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster;
determining a feature set based on the communications of the cluster, the feature set including one or more features that are based on content of the communications that is in addition to the classification terms; and
assigning the feature set to an indication of the classification.
Specification