Method for determining a quality for a data clustering and data processing system
Abstract
This invention relates to a method for determining a quality for a data clustering, said data clustering resulting in a plurality of clusters each cluster having a cluster identifier, the method comprising the steps of:
determining a set of observed values for at least one of the clusters by mapping the cluster identifier of said one of the clusters to a first predefined value and by mapping the cluster identifiers of other clusters to a second predefined value, and
calculating a normalized statistical coefficient based on the set of observed values to determine the quality for said one of the clusters.
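The two steps above can be illustrated with a minimal sketch. The text does not say here which normalized statistical coefficient is used or what the indicator values are compared against, so this example makes two assumptions: the predefined values are 1 and 0, and the coefficient is Cramér's V between the resulting indicator and one categorical attribute of the records. The function names `indicator`, `cramers_v`, and `cluster_quality` are hypothetical.

```python
from collections import Counter
from math import sqrt


def indicator(cluster_ids, target, first=1, second=0):
    """Map the target cluster's identifier to `first`, all others to `second`."""
    return [first if cid == target else second for cid in cluster_ids]


def cramers_v(xs, ys):
    """Cramér's V between two equally long sequences of categorical values.

    Assumed choice of coefficient: it is normalized to the range [0, 1].
    """
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0  # no variation in one variable: association is undefined
    chi2 = 0.0
    for r, nr in row_totals.items():
        for c, nc in col_totals.items():
            expected = nr * nc / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def cluster_quality(cluster_ids, attribute):
    """Per-cluster quality: one indicator and one coefficient per cluster."""
    return {
        cid: cramers_v(indicator(cluster_ids, cid), attribute)
        for cid in sorted(set(cluster_ids))
    }


# Toy data (invented for illustration): six records in three clusters.
cluster_ids = ["A", "A", "B", "B", "C", "C"]
attribute = ["x", "x", "y", "y", "y", "y"]
quality = cluster_quality(cluster_ids, attribute)
print(quality)
```

In this toy data, cluster A coincides exactly with attribute value `x`, so its coefficient is the highest of the three; clusters B and C share value `y` and score lower.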
14 Claims
1. A method for determining a quality for a data clustering, said data clustering resulting in a plurality of clusters each cluster having a cluster identifier, the method comprising the steps of:
determining a set of observed values for at least one of the clusters by mapping the cluster identifier of said one of the clusters to a first predefined value and by mapping the cluster identifiers of other clusters to a second predefined value, and
calculating a normalized statistical coefficient based on the set of observed values to determine the quality for said one of the clusters.
performing a first data clustering by means of a first data clustering method, determining the quality for the first data clustering by means of a method in accordance with any one of the preceding claims 1 to 5, selecting at least one cluster with a relatively low normalized statistical coefficient, performing a second data clustering by means of the first data clustering method or by means of a second data clustering method with respect to the selected cluster, and determining the quality of the second data clustering with respect to the selected cluster.
7. The method of claim 6, whereby the steps of selecting of at least one of the clusters, applying the first or the second data clustering method and determining the quality with respect to the selected cluster are performed iteratively.
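The iterative procedure described above (cluster, score each cluster, select a low-scoring cluster, re-cluster it, score again) can be sketched as a loop. Everything named here is an assumption for illustration: `refine_clusters`, the `threshold` used to decide what counts as "relatively low", the `max_rounds` cap, and the toy `split_at_mean` and `tightness` functions. In particular, `tightness` is only a stand-in score so that the example runs on its own; it is not the normalized statistical coefficient of the claims.

```python
def refine_clusters(records, cluster_fn, quality_fn,
                    recluster_fn=None, threshold=0.5, max_rounds=3):
    """Repeatedly re-cluster the lowest-quality cluster.

    cluster_fn(records) -> list of labels (first data clustering method)
    recluster_fn        -> optional second method; defaults to cluster_fn
    quality_fn(labels, records) -> dict mapping label to a score in [0, 1]
    """
    recluster_fn = recluster_fn or cluster_fn
    labels = [str(label) for label in cluster_fn(records)]
    for _ in range(max_rounds):
        quality = quality_fn(labels, records)
        worst = min(quality, key=quality.get)
        if quality[worst] >= threshold:
            break  # no cluster is below the (assumed) threshold
        members = [i for i, label in enumerate(labels) if label == worst]
        sub_labels = recluster_fn([records[i] for i in members])
        if len(set(sub_labels)) < 2:
            break  # re-clustering did not split the selected cluster
        for i, sub in zip(members, sub_labels):
            labels[i] = f"{worst}.{sub}"
    return labels, quality_fn(labels, records)


def split_at_mean(values):
    """Toy clustering method: two clusters, below or not below the mean."""
    mean = sum(values) / len(values)
    return [0 if v < mean else 1 for v in values]


def tightness(labels, values):
    """Toy stand-in score: 1 - (cluster spread / overall spread)."""
    total = (max(values) - min(values)) or 1.0
    scores = {}
    for label in set(labels):
        members = [v for v, l in zip(values, labels) if l == label]
        scores[label] = 1.0 - (max(members) - min(members)) / total
    return scores


records = [1.0, 2.0, 9.0, 10.0, 50.0, 51.0]
labels, quality = refine_clusters(records, split_at_mean, tightness,
                                  threshold=0.9)
print(labels)
print(quality)
```

Passing the clustering and scoring functions as parameters keeps the loop independent of any particular method, which matches the option of using either the first or a second data clustering method for the selected cluster.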
13. A computer-readable storage medium tangibly embodying a program of computer instructions for performing a method in accordance with any one of the preceding claims 1 to 5.
14. A computer-readable storage medium tangibly embodying a program of computer instructions for performing a method in accordance with claim 6.
8. A data processing system comprising:
means (8) for storing a number of records,
means (9, 10) for performing a data clustering of the records into a plurality of clusters each having a cluster identifier,
means (11) for determining a set of observed values for each of the clusters by mapping the cluster identifier of a given cluster to a first predefined value and by mapping the cluster identifiers of other clusters to a second predefined value, and
means (11) for calculating a normalized statistical coefficient based on the set of observed values.
Specification