Method for partitioning a data set into frequency vectors for clustering
Abstract
A method of partitioning a data set in which certain elements of the data set are first identified as robust discriminator data elements. For the other non-discriminator data elements, an embodiment of the invention counts occurrences of a predetermined relationship between each non-discriminator data element and the identified robust discriminator data elements, and maps the counted occurrences onto vectors in a multi-dimensional frequency space. Finally, an embodiment forms the frequency vectors into clusters according to a distance or adjacency metric, where each cluster represents a different contextual class of meaningful attributes. The data set is thereby partitioned into an arbitrary number of clusters according to the discovered relationships between the non-discriminator data elements and the robust discriminator data elements so that all of the non-discriminator data elements located in the same cluster possess similar attributes.
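The counting and mapping steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The abstract does not say how robust discriminators are chosen or what the predetermined relationship is, so the sketch assumes that the data set is a token sequence, that the most frequent tokens serve as discriminators, and that the relationship is "the discriminator appears immediately before the element". The function name `frequency_vectors` and the parameter `num_discriminators` are hypothetical.

```python
from collections import Counter


def frequency_vectors(tokens, num_discriminators=2):
    """Return (discriminators, vectors) for a token sequence.

    vectors maps each non-discriminator token to a list of counts,
    one count per discriminator.
    """
    counts = Counter(tokens)
    # Assumption: the most frequent tokens act as the robust discriminators.
    discriminators = [tok for tok, _ in counts.most_common(num_discriminators)]
    index = {d: i for i, d in enumerate(discriminators)}

    # Every non-discriminator element starts with an all-zero vector.
    vectors = {tok: [0] * len(discriminators)
               for tok in counts if tok not in index}

    # Assumption: the "predetermined relationship" is immediate precedence,
    # i.e. a discriminator directly followed by a non-discriminator.
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in index and cur in vectors:
            vectors[cur][index[prev]] += 1
    return discriminators, vectors


if __name__ == "__main__":
    sample = "the cat a cat the dog a dog the mat the rug a mat".split()
    discs, vecs = frequency_vectors(sample)
    print(discs)
    print(vecs)
```

Each resulting vector has one dimension per discriminator, so elements that occur in similar contexts end up close together in the frequency space and can then be grouped by a distance or adjacency metric.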
45 Citations
26 Claims
1. A method of partitioning a data set, comprising:
identifying a plurality of robust discriminators within the data set;
counting occurrences of a predetermined relationship between each non-discriminator data element in the data set and each of the identified robust discriminators;
creating a frequency vector for each non-discriminator element based on the counted occurrences;
clustering the frequency vectors into clusters; and
forming a knowledge-based model of the data set based on the clusters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method of extracting user profile information from a recorded log of user interactions with an automated response system, comprising:
selecting a discriminator within the recorded log;
counting occurrences of a predetermined relationship between each non-discriminator data element in the recorded log and the selected discriminator;
generating a frequency vector for each non-discriminator data element based on the counted occurrences;
clustering the frequency vectors into clusters, based on a distance measure between each of the frequency vectors; and
forming a knowledge-based model of the recorded log based on the clusters.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
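The distance-based clustering step recited in claim 15 can be sketched as follows. The claim does not name a clustering algorithm or a specific distance measure, so this example substitutes a simple greedy "leader" clustering with Euclidean distance purely for illustration. The function name `cluster_by_distance`, the `threshold` parameter, and the sample vectors are all hypothetical.

```python
import math


def cluster_by_distance(vectors, threshold=1.5):
    """Greedy leader clustering over named frequency vectors.

    Each element joins the first cluster whose leader vector lies within
    `threshold` (Euclidean distance); otherwise it starts a new cluster.
    Returns a list of clusters, each a list of element names.
    """
    clusters = []  # list of (leader_vector, member_names)
    for name, vec in vectors.items():
        for leader, members in clusters:
            if math.dist(leader, vec) <= threshold:
                members.append(name)
                break
        else:
            # No existing leader is close enough: this element leads a new cluster.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical frequency vectors, one count per discriminator.
    sample = {"cat": [3, 1], "dog": [3, 2], "rug": [0, 4], "mat": [1, 4]}
    print(cluster_by_distance(sample))
```

Leader clustering depends on input order and on the chosen threshold. A hierarchical or k-means procedure could be substituted without changing the overall flow from frequency vectors to clusters.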
23. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to:
select a discriminator from a data set, based on a predetermined discriminator element selection criteria;
count occurrences of a predetermined relationship between each non-discriminator element in the data set and the selected discriminator;
generate a frequency vector for each non-discriminator element based on the counted occurrences;
cluster the frequency vectors into clusters; and
form a knowledge-based model of the data set based on the clusters.
Dependent claims: 24, 25, 26
Specification