Probabilistic data clustering
First Claim
1. A data mining system adapted to generate a cluster model from a data set comprising a plurality of objects, each object including a plurality of attributes, said attributes including a set of discrete ordinal attributes, said system including an iterative cluster definition means, the or each cluster having a distribution attribute associated with each of said set of discrete ordinal attributes, said cluster definition means including:
means for determining, for each cluster, a conditional probability density (p_j(x,z,q)) of an object lying in a cluster;
means for determining, for each cluster and for each object, a posterior probability (h_ij) of an object lying in a cluster, said posterior probability being a function of said conditional probability density of the cluster (p_j(x,z,q)), a mixing fraction for said cluster (α_j) and an unconditional probability density (p(x,z,q)); and
means for determining, for each object attribute and for each cluster, a next cluster distribution attribute (μ_jk, V_jk; ν_jk, W_jk; π_jk, c_jk), said distribution attribute being a function of said posterior probability, said object attribute value and a sum of said posterior probabilities;
wherein said means for determining the conditional probability density of an object lying in a cluster is characterised by means for determining the conditional probability density of an object having a discrete ordinal attribute value within a finite range of attribute values lying in a cluster, said conditional probability density for said discrete ordinal attribute being a function of an integral of a conditional probability function across a sub-range of said discrete ordinal attribute range of values, said sub-range comprising an upper bound and a lower bound bounding said discrete ordinal attribute value.
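The iterative steps recited in the claim (conditional density, posterior probability, next distribution attribute) resemble one pass of an expectation-maximisation loop. The following is a minimal sketch only, assuming a single continuous attribute and Gaussian cluster densities; the function names `gaussian_pdf`, `posteriors` and `next_parameters` are hypothetical and are not taken from the patent.

```python
import math

def gaussian_pdf(x, mu, var):
    # Assumed form of the conditional density p_j(x): a 1-D Gaussian.
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(xs, alphas, mus, variances):
    """Return h[i][j], the posterior probability of object i lying in cluster j."""
    h = []
    for x in xs:
        # Mixing fraction alpha_j times conditional density p_j(x), per cluster.
        weighted = [a * gaussian_pdf(x, m, v)
                    for a, m, v in zip(alphas, mus, variances)]
        total = sum(weighted)  # unconditional density p(x)
        h.append([w / total for w in weighted])
    return h

def next_parameters(xs, h, n_clusters):
    """Next mean and variance per cluster from posteriors and attribute values."""
    mus, variances = [], []
    for j in range(n_clusters):
        weight = sum(row[j] for row in h)  # sum of posterior probabilities
        mu = sum(row[j] * x for row, x in zip(h, xs)) / weight
        var = sum(row[j] * (x - mu) ** 2 for row, x in zip(h, xs)) / weight
        mus.append(mu)
        variances.append(var)
    return mus, variances

# Illustrative data (made up): two well-separated groups of attribute values.
xs = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
h = posteriors(xs, alphas=[0.5, 0.5], mus=[1.0, 9.0], variances=[1.0, 1.0])
new_mus, new_vars = next_parameters(xs, h, n_clusters=2)
```

Each next parameter is a posterior-weighted average of attribute values divided by the sum of the posteriors, which matches the dependency the claim describes; the Gaussian choice and the variance update are assumptions made for the sketch.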
Abstract
A component of a data clusterer is used to determine a conditional probability density of an object (data point) lying in a cluster. The object has a discrete ordinal attribute value within a finite range of attribute values. The conditional probability density for the discrete ordinal attribute is a function of an integral of a conditional probability function across a sub-range of the discrete ordinal attribute range of values, the sub-range comprising an upper bound and a lower bound bounding the discrete ordinal attribute value.
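The integral over a sub-range can be illustrated with a short sketch. It assumes, purely for illustration, that the conditional probability function is a Gaussian, that the lower and upper bounds sit half a step either side of the ordinal value, and that the result is normalised over the finite attribute range; none of these choices is stated in the text above, and the names `normal_cdf` and `ordinal_probability` are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    # Cumulative distribution function of a Gaussian, via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ordinal_probability(value, mu, sigma, lo, hi, half_width=0.5):
    """Probability of a discrete ordinal value within the finite range [lo, hi].

    The density is integrated between a lower and an upper bound that bracket
    the value (assumed here to be value -/+ half_width).
    """
    if not lo <= value <= hi:
        raise ValueError("value lies outside the finite attribute range")
    lower, upper = value - half_width, value + half_width
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    # Assumption: renormalise so the masses over the whole range sum to one.
    total = (normal_cdf(hi + half_width, mu, sigma)
             - normal_cdf(lo - half_width, mu, sigma))
    return mass / total

# Illustrative use: an ordinal attribute taking values 1..5, cluster centred at 3.
probs = [ordinal_probability(v, mu=3.0, sigma=1.0, lo=1, hi=5) for v in range(1, 6)]
```

Because adjacent sub-ranges share their bounds, the per-value masses tile the finite range without gaps or overlap, so under these assumptions they sum to one.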
6 Claims
Specification