Clustering data objects
Abstract
A system for clustering data objects includes a module for calculating an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members and a clustering module for dynamically forming a plurality of clusters containing one or more data objects. The clustering module is configured to associate the first data object with at least one of the plurality of clusters in dependence upon the at least one member's similarity value in comparison to members in other data objects. The clustering module may be configured to cluster the first data object into a plurality of clusters if it has at least two members and each member belongs to a different cluster.
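The multi-cluster behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `assign`, the list-of-lists cluster layout, and the caller-supplied similarity function `sim` are all assumptions made for illustration. Each data object is taken to be a list of its already-filtered members (words).

```python
from typing import Callable, List

# Hypothetical types: a data object is a list of words, a cluster is a list of data objects.
DataObject = List[str]
Cluster = List[DataObject]
Similarity = Callable[[str, str], float]


def assign(obj: DataObject, clusters: List[Cluster],
           sim: Similarity, threshold: float) -> List[int]:
    """Add obj to every cluster holding a sufficiently similar member.

    Returns the indices of the clusters obj ended up in. If no cluster
    matches, a new single-object cluster is formed.
    """
    joined = []
    for idx, cluster in enumerate(clusters):
        # obj is associated with a cluster if any of its members is similar
        # enough to any member of any data object already in that cluster.
        if any(sim(m, o) >= threshold
               for m in obj for other in cluster for o in other):
            joined.append(idx)
    # Append only after scanning, so obj is never compared with itself.
    for idx in joined:
        clusters[idx].append(obj)
    if not joined:
        clusters.append([obj])
        joined.append(len(clusters) - 1)
    return joined


# Stand-in similarity (assumption): exact word match only.
exact = lambda a, b: 1.0 if a == b else 0.0

clusters: List[Cluster] = []
assign(["dog"], clusters, exact, 1.0)         # forms cluster 0
assign(["car"], clusters, exact, 1.0)         # forms cluster 1
assign(["dog", "car"], clusters, exact, 1.0)  # joins clusters 0 and 1
```

In this sketch an object with two members that match different clusters is placed in both, which is one possible reading of "cluster the first data object into a plurality of clusters".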
7 Claims
1. A method for unsupervised clustering data objects, comprising:
calculating, with a processor, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences and said members comprise words, therein;
calculating, with said processor, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with a clustering module, a first cluster comprising said first data object; and
when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
Dependent claims: 2, 7
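The steps recited in claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed implementation: the toy `PARENT` hierarchy stands in for a dictionary's semantic tree, importance is assumed to be depth divided by the maximum depth, similarity is assumed to be 1 / (1 + path distance), and both threshold values are arbitrary.

```python
# Hypothetical toy hierarchy (child -> parent); a real system would use a
# dictionary's semantic tree instead.
PARENT = {
    "entity": None,
    "animal": "entity", "artifact": "entity",
    "dog": "animal", "cat": "animal", "car": "artifact",
    "puppy": "dog",
}


def ancestors(word):
    """Path from word up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


MAX_DEPTH = max(len(ancestors(w)) - 1 for w in PARENT)


def importance(word):
    """Assumed formula: relative depth in the tree (root 0.0, deepest 1.0)."""
    if word not in PARENT:
        return 0.0  # words missing from the tree are treated as unimportant
    return (len(ancestors(word)) - 1) / MAX_DEPTH


def similarity(a, b):
    """Assumed formula: 1 / (1 + path distance through the common ancestor)."""
    if a not in PARENT or b not in PARENT:
        return 0.0
    steps_from_b = {w: i for i, w in enumerate(ancestors(b))}
    for steps_from_a, w in enumerate(ancestors(a)):
        if w in steps_from_b:
            return 1.0 / (1 + steps_from_a + steps_from_b[w])
    return 0.0


def important_members(sentence, threshold):
    """Subset of words whose importance exceeds the threshold."""
    return [w for w in sentence.lower().split() if importance(w) > threshold]


def find_cluster(members, clusters, imp_t, sim_t):
    """Return the first cluster with a sufficiently similar member, else None."""
    for c in clusters:
        for other in c:
            others = important_members(other, imp_t)
            if any(similarity(m, o) >= sim_t for m in members for o in others):
                return c
    return None


def cluster(sentences, imp_t=0.5, sim_t=0.5):
    clusters = []
    for s in sentences:
        home = find_cluster(important_members(s, imp_t), clusters, imp_t, sim_t)
        if home is None:
            clusters.append([s])   # no association: form a new cluster
        else:
            home.append(s)         # association found: cluster together
    return clusters


result = cluster(["the dog barked", "a puppy slept", "the car stalled"])
# result == [["the dog barked", "a puppy slept"], ["the car stalled"]]
```

With these assumed formulas, "dog" and "puppy" are one edge apart (similarity 0.5) and share a cluster, while "car" only meets them at the root and forms its own.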
3. A computer program product for unsupervised clustering of data objects, the computer program product comprising:
a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising;
computer usable program code configured to calculate, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences and said members comprise words, therein;
computer usable program code configured to calculate, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, computer usable program code configured to dynamically form, with a clustering module, a first cluster comprising said first data object; and
when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, computer usable program code configured to dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
Dependent claims: 4
5. A method for unsupervised clustering of data objects, comprising:
calculating, with a processor, based on a relative depth in a semantic hierarchical tree of a dictionary, an importance value of at least one member in a first data object represented as a variable length vector of 0 to N members, said vector further comprising a subset of said members having an importance value above a designated importance threshold, wherein the data objects comprise sentences of an electronic messaging system and said members comprise words, therein;
calculating, with said processor, based on a path distance in said semantic hierarchical tree of a dictionary, a member similarity value for each member of said subset of said members to at least a second data object;
when none of said subset of said members of said first data object are associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with a clustering module, a first cluster comprising said first data object; and
when at least one of said subset of said members of said first data object is associated with at least one of a subset of members of said at least a second data object, in dependence upon a comparison of similarity values, dynamically form, with said clustering module, at least a second cluster comprising said first data object and said at least a second data object.
Dependent claims: 6
Specification