Method, system, and computer program product for visualizing a data structure
Abstract
A data structure visualization tool visualizes a data structure such as a decision table classifier. A data file based on a data set of relational data is stored as a relational table, where each row represents an aggregate of all the records for each combination of values of the attributes used. Once the data file is loaded into memory, an inducer constructs a hierarchy of levels, called a decision table classifier, where each successive level in the hierarchy has two fewer attributes. Besides a column for each attribute, each level has a column for the record count (or, more generally, the sum of record weights) and a column containing a vector of probabilities (each probability gives the proportion of records in each class). At the top-most level, a single row represents all the data. The decision table classifier is then passed to the visualization tool for display. Because a representative scene graph is built adaptively, the visualization application never loads the whole data set into memory. Interactive techniques, such as drill-down and drill-through, are used to view further levels of detail or to retrieve some subset of the original data. The decision table visualizer helps a user understand the importance of specific attribute values for classification.
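The level hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the patented inducer: the attribute ordering is supplied by the caller rather than selected automatically, and the names `aggregate`, `build_levels`, and the `label` and `weight` record keys are assumptions made for this example. Each level groups the records by its attributes and stores the summed weight and a per-class probability vector; each successive level drops two attributes until a single row remains.

```python
from collections import defaultdict


def aggregate(rows, attrs, classes):
    """Group weighted records by `attrs` (hypothetical helper).

    Each output row holds the attribute values, the summed record weight,
    and the proportion of that weight falling in each class.
    """
    weight_by_key = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # A record without an explicit weight counts as one record.
        weight_by_key[key][row["label"]] += row.get("weight", 1.0)

    table = []
    for key in sorted(weight_by_key):
        per_class = weight_by_key[key]
        total = sum(per_class.values())
        table.append({
            "values": dict(zip(attrs, key)),
            "weight": total,
            "probs": [per_class.get(c, 0.0) / total for c in classes],
        })
    return table


def build_levels(rows, ordered_attrs, classes):
    """Return tables from the most detailed level to the single-row top level.

    Assumption: `ordered_attrs` is already ordered so that the last two
    attributes are the ones to drop first; a real inducer would choose this.
    """
    levels = []
    attrs = list(ordered_attrs)
    while True:
        levels.append(aggregate(rows, attrs, classes))
        if not attrs:
            break
        attrs = attrs[:-2]  # each successive level has two fewer attributes
    return levels


# Made-up records for illustration only.
rows = [
    {"a": "x", "b": "p", "label": "yes"},
    {"a": "x", "b": "p", "label": "no"},
    {"a": "x", "b": "q", "label": "yes"},
    {"a": "y", "b": "q", "label": "yes"},
]
levels = build_levels(rows, ["a", "b"], ["no", "yes"])
top_row = levels[-1][0]
```

In this sketch the `probs` vector of `top_row` plays the role of the prior probabilities of each label value, since the top level aggregates every record.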
20 Claims
-
1. A computer-implemented method of visualizing a multi-dimensional data set, wherein a data file is generated from the data set, wherein the data file includes a plurality of records, each record having at least one value associated with at least one attribute, the data file arranged in a tabular form having a plurality of rows and columns, wherein each of the rows represents an aggregate of all of the records for each combination of the values of the attributes, and wherein a label corresponds to one of the attributes of the data file, wherein the label has one or more associated label values called classes, comprising the steps of:
-
(a) creating a multi-dimensional data structure from the data file, wherein said data structure comprises more than one level arranged in a hierarchal manner, wherein a first level corresponds to a first data table representing the data file, wherein each successive level of said data structure comprises an aggregated data table corresponding to at least one fewer attribute than a previous level, and wherein a top hierarchal level comprises a single row representing all of the aggregated data from the data file;
(b) creating a visualization of said data structure, wherein said visualization includes a first view showing a display of computed probability information for at least said top hierarchal level of said data structure, and a second view showing prior probabilities of each label value, wherein said first view does not display all levels of said data structure; and
(c) allowing user interaction with said visualization to trigger the computation and display of probability information for levels of said data structure not computed and displayed in said first view.
-
2. The method of claim 1, wherein step (a) comprises:
inducing a decision table classifier from the data file, wherein said decision table classifier comprises one or more levels arranged in a hierarchal manner, wherein a first level corresponds to the data file, wherein each successive level of said decision table classifier corresponds to a table of data having two fewer attributes than a previous level, and wherein each level of said decision table classifier includes a first column corresponding to a proportion of said records associated in each class and a second column corresponding to a weight of said records, and wherein a top level comprises a single row representing all of the data from the data file.
-
-
3. The method of claim 2, wherein step (b) comprises:
(b) creating a visualization of said decision table, wherein said visualization includes a decision table pane view showing a display of probability information for a pair of said attributes in one of said levels of said decision table classifier, and a label probability pane view showing prior probabilities of each label value.
-
4. The method of claim 3, wherein said step (a) is performed on a computer system that has implemented on it an inducer to automatically select attributes to put at each level of said decision table classifier.
-
5. The method of claim 3, wherein step (b) further comprises:
-
constructing a scene graph representing said decision table classifier;
displaying a visualization of said scene graph on a computer graphics display.
-
-
6. The method of claim 5, further comprising:
constructing said scene graph adaptively.
-
7. The method of claim 6, wherein said visualization includes one or more graphical representations corresponding to at least one of said levels of said decision table classifier, each of said graphical representations corresponding to two of the attributes of said decision table, wherein each of said graphical representations is referred to as a cake chart.
-
8. The method of claim 7, wherein said decision table pane view includes a matrix of one or more of said cake charts, each of said cake charts representing a probability distribution of each of the attribute values at a respective level of said decision table.
-
9. The method of claim 7, wherein each of said cake charts represents a normalized conditional probability of each of the attribute values at a respective level of said decision table.
-
10. The method of claim 7, wherein said constructing step comprises:
-
creating a partial scene graph representative of a portion of said decision table, wherein said partial scene graph comprises a first node, wherein said first node corresponds to said top level of said decision table classifier;
storing said partial scene graph in a memory unit of a computer graphics display; and
rendering a first visualization corresponding to said partial scene graph on said computer graphics display, said visualization comprising a cake chart representing said top level of said decision table classifier.
-
-
11. The method of claim 10, wherein a viewer desires to observe more detail in said visualization, further comprising:
-
selecting said cake chart from said first visualization with a graphical user interface device;
generating a next portion of said partial scene graph, said next portion comprising one or more nodes, each node corresponding to a set of records within a next lower level of said decision table classifier; and
rendering a visualization corresponding to said next portion of said partial scene graph on said computer graphics display, said computer graphics display permitting said viewer to select one or more cake charts and retrieve two additional attributes corresponding to said selected cake charts for display, said two additional attributes corresponding to a next lower level of said decision table classifier.
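The adaptive construction recited in claims 10 and 11 can be sketched as lazy expansion of a tree of nodes. The following Python example is only an illustration under stated assumptions: `SceneNode`, `AdaptiveSceneGraph`, and the in-memory `tables` list are hypothetical names, the tables are ordered from the top level downward, and rendering is omitted entirely. The graph starts with a single node for the top level, and the nodes for the next lower level are created only when a node is selected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SceneNode:
    depth: int                      # 0 is the top (most aggregated) level
    values: dict                    # attribute values identifying this node
    children: Optional[list] = None  # None until the node is drilled into


class AdaptiveSceneGraph:
    """Builds nodes on demand instead of materializing every level."""

    def __init__(self, tables):
        # Assumption: tables[0] is the single-row top level and each later
        # table adds two attributes to the rows of the one before it.
        self.tables = tables
        self.root = SceneNode(depth=0, values={})
        self.nodes_built = 1

    def drill_down(self, node):
        """Create (once) and return the child nodes one level below `node`."""
        if node.children is None:
            next_depth = node.depth + 1
            if next_depth >= len(self.tables):
                node.children = []  # already at the most detailed level
            else:
                node.children = [
                    SceneNode(next_depth, row)
                    for row in self.tables[next_depth]
                    # Keep only rows that agree with the selected node.
                    if all(row.get(k) == v for k, v in node.values.items())
                ]
            self.nodes_built += len(node.children)
        return node.children


# Made-up level tables for illustration only.
tables = [
    [{}],
    [{"a": "x", "b": "p"}, {"a": "x", "b": "q"}, {"a": "y", "b": "q"}],
    [
        {"a": "x", "b": "p", "c": "1", "d": "u"},
        {"a": "x", "b": "p", "c": "2", "d": "u"},
        {"a": "y", "b": "q", "c": "1", "d": "v"},
    ],
]
graph = AdaptiveSceneGraph(tables)
first = graph.drill_down(graph.root)   # user selects the top-level chart
detail = graph.drill_down(first[0])    # user selects one chart below it
```

Nodes that the viewer never selects, such as `first[1]` here, keep `children` set to `None`, so only the visited portion of the hierarchy is ever built.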
-
-
12. The method of claim 11, further comprising:
providing a graphical user interface device to permit said viewer to select one or more cake charts and view the corresponding records of the data set corresponding to said selected cake charts for display.
-
13. The method of claim 11, further comprising:
providing a graphical user interface device to permit said viewer to select one or more cake charts and view a visualization corresponding to a more aggregated level of said decision table corresponding to said selected cake charts for display.
-
14. The method of claim 5, further comprising the step of:
displaying a controller that permits a user to control the filtering of attributes based on the importance of the attributes to a prediction.
-
15. The method of claim 4, further comprising the step of:
displaying a filter panel that permits a viewer to filter attribute values based on the number of counts.
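The count-based filtering of claim 15 can be sketched in a few lines of Python. The claim does not say how the filter is applied, so this example assumes a simple minimum-count threshold; the function name `filter_by_count` and the `count` key are hypothetical.

```python
def filter_by_count(rows, min_count):
    """Keep only rows whose record count meets the threshold (assumed rule)."""
    return [row for row in rows if row["count"] >= min_count]


# Made-up aggregated rows for illustration only.
level_rows = [
    {"values": {"a": "x", "b": "p"}, "count": 120},
    {"values": {"a": "x", "b": "q"}, "count": 3},
    {"values": {"a": "y", "b": "q"}, "count": 45},
]
visible = filter_by_count(level_rows, min_count=10)
```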
-
16. A system for visualizing a multi-dimensional data set, wherein a data file is generated from the data set, wherein a data structure is created from the data file that includes a plurality of labeled records, each record having at least one attribute value and a corresponding class label, and the data structure being capable of assigning class labels to unlabeled records based on attribute values found in the unlabeled records, wherein the data structure comprises more than one level arranged in a hierarchal manner, comprising:
-
means for constructing a scene graph representing the data structure;
means for displaying a visualization of said scene graph, wherein said visualization includes a first view showing a display of computed probability information for at least a top hierarchal level of the data structure, and a second view showing prior probabilities of each label value, wherein said first view does not display all levels of the data structure; and
means for allowing user interaction with said visualization to trigger the computation and display of probability information for levels of the data structure not displayed in said first view.
-
17. The system of claim 16, further comprising:
means for adaptively building a scene graph representative of the data structure.
-
-
18. The system of claim 17, further comprising:
means for inducing a decision table classifier from the data file.
-
19. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a processor in a computer system to visually represent the structure of a decision table classifier, the decision table classifier being generated from a training set of labeled records, each record having at least one attribute value and a corresponding class label, and the decision table classifier being capable of assigning class labels to unlabeled records based on attribute values found in the unlabeled records, wherein the decision table classifier comprises more than one level arranged in a hierarchical manner, said computer program logic comprising:
-
means for enabling the processor to construct a scene graph representing the decision table classifier;
means for enabling the processor to display a visualization of said scene graph, wherein said visualization includes a decision table pane view showing a display of computed conditional probability information for a pair of said attributes in at least a top hierarchal level of said decision table classifier, and a label probability pane view showing prior probabilities of each label value, wherein said decision table pane view does not display all levels of said decision table classifier; and
means for enabling the processor to allow user interaction with said visualization to trigger the computation and display of probability information for levels of said decision table classifier not displayed in said decision table pane view.
-
-
20. An integrated data mining system comprising:
-
an inducer;
means for configuring said inducer to generate a first data file representing structure of an evidence classifier, a second data file representing structure of a decision-tree classifier, and a third data file representing structure of a decision table classifier, wherein said decision table classifier structure comprises more than one level arranged in a hierarchal manner;
means for visualizing said evidence classifier structure based on said first data file;
means for visualizing said decision-tree classifier structure based on said second data file;
means for visualizing said decision table classifier structure based on said third data file, wherein said means for visualizing said decision table classifier structure includes a view showing a display of computed probability information for at least a top hierarchal level of said decision table classifier structure; and
means for allowing user interaction with said visualization to trigger the computation and display of probability information for levels of said decision table classifier structure not displayed in said view.
-
Specification