System and method for determining internal parameters of a data clustering program
First Claim
1. A method for determining an internal parameter of a data clustering program for clustering data records, comprising:
inputting user data indicative of a similarity of pairs of data records;
calculating similarity values for the pairs of data records based on a default value of the internal parameter; and
determining a similarity threshold for the similarity values corresponding to the user data.
1 Assignment
0 Petitions
Abstract
A system and associated method tune a data clustering program to a clustering task by determining at least one internal parameter of the data clustering program. One or more of the internal parameters are determined before the clustering begins. Consequently, clustering does not need to be performed iteratively, which reduces the processing time and processing resources the clustering program requires. The system presents pairs of data records, and the user indicates whether or not the records of each pair should belong to the same cluster. The similarity values of the records of the selected pairs are calculated based on the default parameters of the clustering program. From the resulting similarity values, an optimal similarity threshold is determined. When the optimization criterion does not yield a single optimal similarity threshold range, equivalent candidate ranges are selected. To select one of the candidate ranges, pairs of data records whose calculated similarity value lies within the critical region are offered to the user.
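The threshold determination summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes similarity values in [0, 1], uses the number of misclassified user-labeled pairs as the optimization criterion, and treats the critical region as the span from the lowest to the highest candidate range. The function names `candidate_threshold_ranges`, `critical_region`, and `pairs_to_query` are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

def candidate_threshold_ranges(
    similarities: List[float], same_cluster: List[bool]
) -> Tuple[int, List[Range]]:
    """Return the minimum error count and every threshold range achieving it.

    Assumption: a pair is predicted "same cluster" when its similarity
    is strictly greater than the threshold, and similarities lie in [0, 1].
    """
    pairs = sorted(zip(similarities, same_cluster))
    # Any threshold strictly between two neighbouring values gives the
    # same predictions, so one midpoint per interval is enough.
    values = [0.0] + [s for s, _ in pairs] + [1.0]
    best_errors = len(pairs) + 1
    best_ranges: List[Range] = []
    for low, high in zip(values, values[1:]):
        if low == high:
            continue  # empty interval (duplicate or boundary value)
        t = (low + high) / 2
        errors = sum((s > t) != label for s, label in pairs)
        if errors < best_errors:
            best_errors, best_ranges = errors, [(low, high)]
        elif errors == best_errors:
            best_ranges.append((low, high))
    return best_errors, best_ranges

def critical_region(ranges: List[Range]) -> Range:
    """Assumed definition: span covering all equivalent candidate ranges."""
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)

def pairs_to_query(unlabeled: Dict[str, float], region: Range) -> List[str]:
    """Pick unlabeled pairs whose similarity falls inside the critical region."""
    lo, hi = region
    return [pair_id for pair_id, s in unlabeled.items() if lo <= s <= hi]

# Made-up user feedback: similarity of each pair and the user's answer.
sims = [0.9, 0.7, 0.5, 0.6, 0.3]
labels = [True, True, True, False, False]
min_errors, ranges = candidate_threshold_ranges(sims, labels)
region = critical_region(ranges)
to_ask = pairs_to_query({"p1": 0.45, "p2": 0.95, "p3": 0.65}, region)
```

With this made-up feedback, two ranges tie at one misclassified pair each, (0.3, 0.5) and (0.6, 0.7). The pairs `p1` and `p3` fall inside the resulting critical region, so they would be offered to the user next to break the tie.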
49 Citations
30 Claims
1. A method for determining an internal parameter of a data clustering program for clustering data records, comprising:
inputting user data indicative of a similarity of pairs of data records;
calculating similarity values for the pairs of data records based on a default value of the internal parameter; and
determining a similarity threshold for the similarity values corresponding to the user data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
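The calculating step of claim 1 can be illustrated with a short Python sketch. The claim does not say what the internal parameter is or how similarity is computed, so everything here is an assumption made for illustration: the internal parameter is taken to be a set of per-attribute weights, `DEFAULT_WEIGHTS` and the attribute names are hypothetical, and similarity is the weighted fraction of attributes on which two records agree.

```python
from typing import Mapping

# Hypothetical default value of the internal parameter: equal attribute weights.
DEFAULT_WEIGHTS: Mapping[str, float] = {"name": 1.0, "city": 1.0, "zip": 1.0}

def similarity(
    a: Mapping[str, str],
    b: Mapping[str, str],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted share of attributes with identical values, in [0, 1]."""
    total = sum(weights.values())
    matched = sum(
        w
        for attr, w in weights.items()
        # An attribute missing from either record never counts as a match.
        if attr in a and attr in b and a[attr] == b[attr]
    )
    return matched / total

# A user-labeled pair: the records agree on two of three attributes.
rec_a = {"name": "Ann", "city": "Oslo", "zip": "0150"}
rec_b = {"name": "Ann", "city": "Oslo", "zip": "0151"}
pair_similarity = similarity(rec_a, rec_b)  # two thirds under equal weights
```

Values computed this way for each user-labeled pair would then feed the threshold-determining step.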
11. A computer program product having instruction codes for determining an internal parameter of a data clustering program for clustering data records, comprising:
a first set of instruction codes for inputting user data indicative of a similarity of pairs of data records;
a second set of instruction codes for calculating similarity values for the pairs of data records based on a default value of the internal parameter; and
a third set of instruction codes for determining a similarity threshold for the similarity values corresponding to the user data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system for determining an internal parameter of a data clustering program for clustering data records, comprising:
means for inputting user data indicative of a similarity of pairs of data records;
means for calculating similarity values for the pairs of data records based on a default value of the internal parameter; and
means for determining a similarity threshold for the similarity values corresponding to the user data.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification