System and method for probabilistic relational clustering
First Claim
1. A method of clustering a set of objects having respective object types, respective object attributes, homogeneous relationships between respective objects of the same object type, and heterogeneous relationships between objects having a different object types, the method comprising:
- iteratively optimizing a clustering of the set of objects within a plurality of latent classes, dependent on object type, object attributes, homogeneous relationships, and heterogeneous relationships, by performing;
in an expectation step, updating a set of posteriors to maximize a probability that an object is associated with a respective latent class comprising, for each object, individually fixing an assigned latent class for all other objects, and maximizing an objective function for the respective object, comprising minimizing a computed distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of a corresponding expectation that the object is associated with the respective latent class, and repeating until no object changes in assigned latent class between successive repetition, andin a minimization step, updating the plurality of latent classes based on the updated set of posteriors; and
storing the optimized clustering.
1 Assignment
0 Petitions
Accused Products
Abstract
Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. A probabilistic model is presented for relational clustering, which also provides a principal framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The model seeks to identify cluster structures for each type of data objects and interaction patterns between different types of objects. Under this model, parametric hard and soft relational clustering algorithms are provided under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, the k-partite graph clustering, and semi-supervised clustering based on hidden Markov random fields.
339 Citations
20 Claims
-
1. A method of clustering a set of objects having respective object types, respective object attributes, homogeneous relationships between respective objects of the same object type, and heterogeneous relationships between objects having a different object types, the method comprising:
-
iteratively optimizing a clustering of the set of objects within a plurality of latent classes, dependent on object type, object attributes, homogeneous relationships, and heterogeneous relationships, by performing; in an expectation step, updating a set of posteriors to maximize a probability that an object is associated with a respective latent class comprising, for each object, individually fixing an assigned latent class for all other objects, and maximizing an objective function for the respective object, comprising minimizing a computed distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of a corresponding expectation that the object is associated with the respective latent class, and repeating until no object changes in assigned latent class between successive repetition, and in a minimization step, updating the plurality of latent classes based on the updated set of posteriors; and storing the optimized clustering. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. An system for clustering a set of objects having object types, object attributes, homogeneous relationships between objects of the same object type, and heterogeneous relationships between objects having different object types, the system comprising:
-
a programmable processor configured to; iteratively optimize a clustering of the set of objects within a plurality of latent classes, dependent on object types, object attributes, homogeneous relationships, and heterogeneous relationships, by performing, in an expectation step, wherein the programmable processor is configured to; update a set of posteriors to maximize a probability that an object is associated with a respective latent class, comprising, for each object, a substep to individually fix an assigned cluster for all other objects, and maximize an objective function for the respective object, comprising a substep to minimize a computational distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of a corresponding expectation that the object is associated with the respective latent class, and repeat until no object changes in assigned cluster between successive repetition, and in a minimization step, updating the plurality of latent classes based on the updated set of posteriors; and store the optimized clustering in a memory; and a communications port configured to communicate at least one of an object and clustering-related information. - View Dependent Claims (8, 9, 10, 11, 12)
-
-
13. A method of clustering a plurality of objects, having object types, object attributes, homogeneous relationships between objects of the same object type, and heterogeneous relationships between objects having different object types which do not comply with an independent and identically distributed (IID) statistical presumption, the method comprising:
-
optimizing an object clustering of the plurality of objects in a plurality of latent object classes based on at least the object types, object attributes, homogeneous relationships, and heterogeneous relationships, by iteratively; updating a set of posteriors to maximize an expectation probability that an object is associated with a respective latent object class comprising, iteratively maximizing an objective function for each respective object while an assigned latent object class for all other objects is fixed, until no object changes in assigned latent object class occurs, and maximizing an objective function for the respective object, comprising minimizing a computed distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of the corresponding expectation probability, and repeating until no object changes in assigned latent object class occurs, and updating the plurality of latent object classes based on the updated set of posteriors; and at least one of;
communicating a latent object class associated with an object, communicating a set of objects within a latent object class, and responding to a query based on the optimized object clustering. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
-
Specification