Interactive data mining system
First Claim
1. A non-transitory computer-readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
- counting a plurality of counts of times each particular attribute of said plurality of additional attributes, takes on each of a set of possible values for the particular attribute; and
- presenting a plurality of histograms on a computer display wherein each of said plurality of histograms includes counts for one of said plurality of additional attributes versus attribute value and wherein said plurality of histograms are presented in a sorted order;
- wherein said sorted order is based on a sorting of the histograms according to a metric of non-randomness of distributions shown in said histograms;
- wherein the metric of non-randomness is a metric of discriminative power with respect to said class attribute.
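The steps recited in claim 1 (reading data vectors, counting attribute values, and ordering the histograms by discriminative power) can be illustrated with a minimal sketch. The claim does not name a specific metric, so information gain with respect to the class attribute is used here purely as an assumed stand-in. The function names, the dictionary layout of a data vector, and the `"class"` key are all hypothetical, and the sketch returns the sorted histograms as data rather than drawing them on a display.

```python
from collections import Counter, defaultdict
from math import log2


def count_values(vectors, class_key):
    """Return counts[attr][value][cls]: how often each attribute takes each value, per class."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for vec in vectors:
        cls = vec[class_key]
        for attr, value in vec.items():
            if attr != class_key:
                counts[attr][value][cls] += 1
    return counts


def entropy(counter):
    """Shannon entropy (in bits) of the distribution described by a Counter."""
    total = sum(counter.values())
    return -sum((n / total) * log2(n / total) for n in counter.values() if n)


def information_gain(value_counts):
    """Assumed metric: reduction in class entropy obtained by splitting on the attribute."""
    class_totals = Counter()
    for per_class in value_counts.values():
        class_totals.update(per_class)
    total = sum(class_totals.values())
    remainder = sum(
        sum(per_class.values()) / total * entropy(per_class)
        for per_class in value_counts.values()
    )
    return entropy(class_totals) - remainder


def sorted_histograms(vectors, class_key="class"):
    """Return (attribute, {value: count}) pairs, most discriminative attribute first."""
    counts = count_values(vectors, class_key)
    ranked = sorted(counts, key=lambda a: information_gain(counts[a]), reverse=True)
    return [
        (attr, {value: sum(pc.values()) for value, pc in counts[attr].items()})
        for attr in ranked
    ]


# Hypothetical data: "signal" separates the classes perfectly, "color" not at all.
example = [
    {"class": "fail", "signal": "weak", "color": "red"},
    {"class": "fail", "signal": "weak", "color": "blue"},
    {"class": "ok", "signal": "strong", "color": "red"},
    {"class": "ok", "signal": "strong", "color": "blue"},
]
for attr, hist in sorted_histograms(example):
    print(attr, hist)
```

In this toy data set the histogram for `signal` is listed before the histogram for `color`, because `signal` alone determines the class while `color` carries no information about it. Any other measure of discriminative power could be substituted for `information_gain` without changing the rest of the sketch.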
Abstract
An interactive data mining system (100, 3000) suitable for mining large, high-dimensional (e.g., 200-dimension) data sets is provided. The system graphically presents rules in a context that allows users to readily gain an intuitive appreciation of the significance of important attributes (data fields) in the data. The system (100, 3000) uses metrics to quantify the importance of the various data attributes, data values, and attribute/value pairs, ranks them according to the metrics, and displays histograms and lists of attributes and values in order according to the metrics, thereby allowing the user to rapidly find the most interesting aspects of the data. The system explores the impact of user-defined constraints and presents histograms and rule cubes, including superposed and interleaved rule cubes, showing the effect of the constraints.
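The effect of a user-defined constraint on a histogram, as mentioned in the abstract, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how constraints are expressed or how rule cubes are built, so a constraint is modeled here as a hypothetical Python predicate over a data vector, and the "superposed" view is approximated by pairing each value's count before and after the constraint is applied. All function and field names are invented for the example.

```python
from collections import Counter


def histogram(vectors, attr):
    """Count how often the attribute takes each value across the data vectors."""
    return Counter(vec[attr] for vec in vectors if attr in vec)


def compare_with_constraint(vectors, attr, constraint):
    """Return {value: (count_without_constraint, count_with_constraint)}."""
    before = histogram(vectors, attr)
    after = histogram([vec for vec in vectors if constraint(vec)], attr)
    # A Counter returns 0 for values that no longer occur under the constraint.
    return {value: (before[value], after[value]) for value in before}


# Hypothetical data vectors and a hypothetical constraint (region must be "north").
example = [
    {"class": "fail", "signal": "weak", "region": "north"},
    {"class": "fail", "signal": "weak", "region": "south"},
    {"class": "ok", "signal": "strong", "region": "north"},
    {"class": "ok", "signal": "strong", "region": "south"},
    {"class": "fail", "signal": "strong", "region": "north"},
]
print(compare_with_constraint(example, "class", lambda v: v["region"] == "north"))
```

Each pair could be drawn as two overlaid bars for the same attribute value, which gives a rough picture of how a constraint shifts the distribution.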
10 Claims
Independent claim: 1 (set out in full above). Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
Specification