INTERACTIVE DATA MINING SYSTEM
Abstract
An interactive data mining system (100, 3000) that is suitable for data mining large, high-dimensional (e.g., 200-dimension) data sets is provided. The system graphically presents rules in a context allowing users to readily gain an intuitive appreciation of the significance of important attributes (data fields) in the data. The system (100, 3000) uses metrics to quantify the importance of the various data attributes, data values, and attribute/value pairs, ranks them according to the metrics, and displays histograms and lists of attributes and values in order according to the metrics, thereby allowing the user to rapidly find the most interesting aspects of the data. The system explores the impact of user-defined constraints and presents histograms and rule cubes, including superposed and interleaved rule cubes, showing the effect of the constraints.
36 Claims
1. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
- counting a plurality of counts of times each particular attribute of said plurality of additional attributes, takes on each of a set of possible values for the particular attribute;
- presenting a plurality of histograms on a computer display wherein each of said plurality of histograms includes counts for one of said plurality of additional attributes versus attribute value and wherein said plurality of histograms are presented in a sorted order.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
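The counting and sorted presentation recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say what the sort key is, so ordering by the largest single-value count is an assumption made here; the function names, the text rendering, and the sample data are likewise hypothetical.

```python
from collections import Counter

def attribute_histograms(vectors, class_attr):
    """Count how often each non-class attribute takes each of its values."""
    counts = {}
    for vec in vectors:
        for attr, value in vec.items():
            if attr == class_attr:
                continue
            counts.setdefault(attr, Counter())[value] += 1
    return counts

def sorted_histograms(counts):
    """Order attributes by an ASSUMED key: largest single-value count, descending."""
    return sorted(counts.items(), key=lambda item: (-max(item[1].values()), item[0]))

def render(counts):
    """Render each histogram as text bars, one line per attribute value."""
    lines = []
    for attr, hist in sorted_histograms(counts):
        lines.append(attr)
        for value in sorted(hist):
            lines.append(f"  {value}: {'#' * hist[value]}")
    return "\n".join(lines)

# Hypothetical sample data: "class" is the class attribute.
data = [
    {"class": "pass", "shift": "day", "line": "A"},
    {"class": "fail", "shift": "day", "line": "B"},
    {"class": "pass", "shift": "day", "line": "C"},
    {"class": "pass", "shift": "night", "line": "A"},
]
hists = attribute_histograms(data, "class")
print(render(hists))
```

In this sketch a graphical display is replaced by text bars; any other ranking metric could be substituted for the sort key without changing the counting step.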
15. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
- processing the set of data vectors in order to count occurrences of each value of a first attribute whereby a first set of counts is obtained;
- processing the set of data vectors in order to count occurrences of each value of the first attribute subject to at least one constraint as to at least one other attribute value, whereby a second set of counts is obtained;
- displaying the first set of counts and the second set of counts in the form of at least two superposed histograms including a first histogram based on the first set of counts and a second histogram based on the second set of counts.
Dependent claims: 16
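As a rough illustration of the two sets of counts in claim 15, the sketch below counts a first attribute with and without a constraint on another attribute and pairs the results per value, which is the data a superposed histogram would draw. Representing the constraint as a dictionary of required values is an assumption, as are all names and the sample data.

```python
from collections import Counter

def count_values(vectors, attr, constraint=None):
    """Count values of attr; constraint is an optional {attribute: required_value} dict."""
    constraint = constraint or {}
    return Counter(
        vec[attr]
        for vec in vectors
        if all(vec.get(k) == v for k, v in constraint.items())
    )

def superpose(base, constrained):
    """Pair unconstrained and constrained counts per value for drawing on one axis."""
    return {value: (base[value], constrained[value]) for value in sorted(base)}

# Hypothetical sample data.
data = [
    {"shift": "day", "line": "A"},
    {"shift": "day", "line": "B"},
    {"shift": "day", "line": "C"},
    {"shift": "night", "line": "A"},
]
first_counts = count_values(data, "shift")
second_counts = count_values(data, "shift", {"line": "A"})
for value, (all_n, constrained_n) in superpose(first_counts, second_counts).items():
    print(f"{value}: all={all_n} constrained={constrained_n}")
```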
17. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
- processing the set of data vectors in order to obtain a set of counts of occurrences of each combination of values of a first attribute and a second attribute;
- displaying on a computer display a graphical representation of the set of counts, wherein the graphical representation includes a grid of areas, the grid comprising a plurality of rows of areas and a plurality of columns of areas, wherein each row corresponds to an iTH value of the first attribute and each column corresponds to a jTH value of the second attribute, and wherein each (i,j)TH area in the grid of areas includes: a first graphical element that reflects a count of data vectors that have the iTH value of the first attribute and a count of data vectors that have the jTH value of the second attribute.
Dependent claims: 18, 19, 20, 21, 22, 23
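The grid of counts in claim 17 amounts to a two-attribute contingency table. A minimal sketch follows, assuming that the (i,j)TH area is populated with the joint count of the iTH and jTH values; the function name and sample data are hypothetical, and the graphical element itself is not modeled.

```python
from collections import Counter

def pair_counts(vectors, first, second):
    """Return row values, column values, and a grid of joint counts."""
    joint = Counter((vec[first], vec[second]) for vec in vectors)
    rows = sorted({vec[first] for vec in vectors})
    cols = sorted({vec[second] for vec in vectors})
    grid = [[joint[(r, c)] for c in cols] for r in rows]
    return rows, cols, grid

# Hypothetical sample data.
data = [
    {"class": "pass", "shift": "day"},
    {"class": "fail", "shift": "day"},
    {"class": "pass", "shift": "day"},
    {"class": "pass", "shift": "night"},
]
rows, cols, grid = pair_counts(data, "shift", "class")
print("      " + " ".join(f"{c:>5}" for c in cols))
for r, row in zip(rows, grid):
    print(f"{r:>5} " + " ".join(f"{n:>5}" for n in row))
```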
24. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
- for each value of the class attribute processing the data vectors in order to obtain a frequency count for each value of each of the plurality of additional attributes;
- for a plurality of groups of the plurality of additional attributes evaluating a metric of similarity of trends in the frequency count as a function of attribute value;
- outputting an identification of a most similar group of additional attributes as identified by the metric of similarity.
Dependent claims: 25, 26
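Claim 24 does not name the similarity metric. The sketch below assumes, purely for illustration, that groups are pairs of attributes, that each attribute has the same number of ordered values, and that similarity is the Pearson correlation between the frequency profiles for one class value. All function names, profiles, and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def profile(vectors, attr, class_attr, class_value, values):
    """Frequency count of each ordered value of attr within one class value."""
    counts = Counter(v[attr] for v in vectors if v[class_attr] == class_value)
    return [counts[x] for x in values]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def most_similar_pair(profiles):
    """Return the pair of attribute names whose profiles correlate most strongly."""
    return max(
        combinations(sorted(profiles), 2),
        key=lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]),
    )

# Hypothetical data: build one profile from vectors, then compare given profiles.
data = [
    {"class": "fail", "temp": 0},
    {"class": "fail", "temp": 1},
    {"class": "fail", "temp": 1},
    {"class": "pass", "temp": 2},
]
temp_profile = profile(data, "temp", "class", "fail", [0, 1, 2])
profiles = {"temp": [1, 2, 4], "load": [2, 4, 8], "age": [5, 3, 1]}
print(temp_profile, most_similar_pair(profiles))
```

Correlation rewards profiles that rise and fall together regardless of scale; a different metric would be needed if absolute counts matter.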
27. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
- processing the set of data vectors in order to obtain a set of counts of occurrences of each combination of values of a first attribute, a second attribute and a third attribute;
- displaying on a computer display a graphical representation of the set of counts, wherein the graphical representation includes a grid of areas, the grid comprising a plurality of rows of areas and a plurality of columns of areas, wherein each row corresponds to an iTH value of the first attribute and each column corresponds to a jTH value of the second attribute, and wherein each (i,j)TH area in the grid of squares includes a histogram that includes a plurality of bars, wherein each successive bar in the plurality of bars has a height proportional to a kTH value of the third attribute.
Dependent claims: 28
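One way to picture the three-attribute structure of claim 27 is a mapping from each (i, j) cell to a list of bar heights, one per value of the third attribute. The sketch assumes each bar height is the count of vectors having the kTH value of the third attribute within that cell; names and data are hypothetical.

```python
from collections import Counter

def cell_histograms(vectors, first, second, third):
    """Map each (first, second) value pair to counts over the third attribute's values."""
    triple = Counter((v[first], v[second], v[third]) for v in vectors)
    third_values = sorted({v[third] for v in vectors})
    cells = {}
    for a, b, _ in list(triple):
        # One bar per value of the third attribute, in a fixed order.
        cells[(a, b)] = [triple[(a, b, k)] for k in third_values]
    return third_values, cells

# Hypothetical sample data.
data = [
    {"class": "pass", "shift": "day", "line": "A"},
    {"class": "fail", "shift": "day", "line": "B"},
    {"class": "pass", "shift": "day", "line": "C"},
    {"class": "pass", "shift": "night", "line": "A"},
]
third_values, cells = cell_histograms(data, "shift", "line", "class")
for cell in sorted(cells):
    print(cell, dict(zip(third_values, cells[cell])))
```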
29. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
- for each value of the class attribute processing the data vectors in order to obtain a frequency count for each value of each of the plurality of additional attributes;
- for each additional attribute evaluating a metric of non-randomness of said frequency count versus attribute value;
- ranking said additional attributes based on said metric of non-randomness;
- outputting information based on said ranking to a user.
Dependent claims: 30, 31, 32, 33
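The metric of non-randomness in claim 29 is not specified in the claim. As a stand-in, the sketch below uses the Pearson chi-square statistic of the class-versus-value contingency table, which is zero when the per-class frequency counts are exactly what independence would predict. That choice, the function names, and the sample data are all assumptions.

```python
from collections import Counter

def chi_square(vectors, attr, class_attr):
    """Chi-square statistic of class vs. attribute value (assumed non-randomness metric)."""
    n = len(vectors)
    joint = Counter((v[class_attr], v[attr]) for v in vectors)
    class_totals = Counter(v[class_attr] for v in vectors)
    value_totals = Counter(v[attr] for v in vectors)
    stat = 0.0
    for c, c_total in class_totals.items():
        for x, x_total in value_totals.items():
            expected = c_total * x_total / n  # count expected under independence
            stat += (joint[(c, x)] - expected) ** 2 / expected
    return stat

def rank_attributes(vectors, class_attr):
    """Rank additional attributes from most to least non-random."""
    attrs = sorted(set().union(*(v.keys() for v in vectors)) - {class_attr})
    return sorted(attrs, key=lambda a: -chi_square(vectors, a, class_attr))

# Hypothetical data: "shift" fully determines the class, "line" is unrelated.
data = [
    {"class": "pass", "shift": "day", "line": "A"},
    {"class": "pass", "shift": "day", "line": "B"},
    {"class": "fail", "shift": "night", "line": "A"},
    {"class": "fail", "shift": "night", "line": "B"},
]
ranking = rank_attributes(data, "class")
print(ranking)
```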
34. A computer readable medium storing a program for interactive data mining including programming instructions for:
(a) reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
(b) processing the data vectors in order to obtain a set of frequency counts for a set of values of each of the plurality of attributes;
(c) processing the set of frequency counts for each of the plurality of attributes in order to quantify a strength of a type of trend in each of the plurality of attributes;
(d) applying each of a plurality of attribute-value constraints to the set of data vectors and repeating steps (b) and (c);
(e) for each of the plurality of attributes comparing the strength of the type of trend with and without each of the constraints;
(f) outputting information identifying at least one of the plurality of attributes for which there was a change in trend strength as a result of imposing a particular one of the plurality of attribute-value constraints, and outputting information identifying the particular attribute-value constraint.
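Steps (b) through (f) of claim 34 can be sketched as follows. The claim leaves the type of trend and its strength measure open, so this example assumes an increasing trend scored as (rises minus falls) divided by the number of steps between successive ordered values, and a hypothetical reporting threshold. The names `domains`, `constraints`, and `threshold` are illustrative only.

```python
from collections import Counter

def frequency_counts(vectors, attr, values):
    """Step (b): counts of attr over an ordered list of values."""
    counts = Counter(v[attr] for v in vectors)
    return [counts[x] for x in values]

def increasing_strength(counts):
    """Step (c), ASSUMED measure: (rises - falls) / steps, in the range [-1, 1]."""
    steps = list(zip(counts, counts[1:]))
    if not steps:
        return 0.0
    ups = sum(b > a for a, b in steps)
    downs = sum(b < a for a, b in steps)
    return (ups - downs) / len(steps)

def trend_changes(vectors, domains, constraints, threshold=0.5):
    """Steps (d)-(f): report attributes whose trend strength shifts under a constraint."""
    baseline = {a: increasing_strength(frequency_counts(vectors, a, vals))
                for a, vals in domains.items()}
    changes = []
    for c_attr, c_value in constraints:
        subset = [v for v in vectors if v[c_attr] == c_value]
        for a, vals in domains.items():
            if a == c_attr:
                continue
            strength = increasing_strength(frequency_counts(subset, a, vals))
            if abs(strength - baseline[a]) >= threshold:
                changes.append((a, (c_attr, c_value), baseline[a], strength))
    return changes

# Hypothetical data: "temp" counts rise overall but not on the night shift.
data = (
    [{"shift": "night", "temp": 0}]
    + [{"shift": "day", "temp": 1}] * 2
    + [{"shift": "day", "temp": 2}] * 3
)
report = trend_changes(data, {"temp": [0, 1, 2]},
                       [("shift", "day"), ("shift", "night")])
for attr, constraint, before, after in report:
    print(f"{attr}: {before} -> {after} under {constraint}")
```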
35. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
- reading in user input of a definition of a new attribute that is defined in terms of a subset of the plurality of attributes;
- processing the data vectors in order to obtain a set of frequency counts for a set of values of the new attribute;
- outputting information to the user based on the set of frequency counts.
Dependent claims: 36
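A user-defined attribute as in claim 35 might be modeled as a function of existing attributes. The claim does not say how the definition is entered, so passing a Python callable is an assumption; the attribute name `hot_and_loaded` and the sample data are hypothetical.

```python
from collections import Counter

def add_attribute(vectors, name, definition):
    """Return copies of the vectors extended with a derived attribute."""
    return [{**vec, name: definition(vec)} for vec in vectors]

def frequency_counts(vectors, attr):
    """Count occurrences of each value of attr."""
    return Counter(vec[attr] for vec in vectors)

def hot_and_loaded(vec):
    """Hypothetical user definition built from the 'temp' and 'load' attributes."""
    return vec["temp"] >= 2 and vec["load"] >= 1

# Hypothetical sample data.
data = [
    {"temp": 2, "load": 1},
    {"temp": 2, "load": 0},
    {"temp": 0, "load": 1},
    {"temp": 3, "load": 2},
]
extended = add_attribute(data, "hot_and_loaded", hot_and_loaded)
counts = frequency_counts(extended, "hot_and_loaded")
for value, count in sorted(counts.items()):
    print(value, count)
```

Copying each vector rather than mutating it keeps the original data set available for comparison with and without the new attribute.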
Specification