INTERACTIVE DATA MINING SYSTEM

US 20090043714A1
Filed: 08/10/2007
Published: 02/12/2009
Est. Priority Date: 08/10/2007
Status: Active Grant

Abstract

An interactive data mining system (100, 3000) that is suitable for data mining large, high-dimensional (e.g., 200-dimension) data sets is provided. The system graphically presents rules in a context that allows users to readily gain an intuitive appreciation of the significance of important attributes (data fields) in the data. The system (100, 3000) uses metrics to quantify the importance of the various data attributes, data values, and attribute/value pairs, ranks them according to the metrics, and displays histograms and lists of attributes and values in order according to the metric, thereby allowing the user to rapidly find the most interesting aspects of the data. The system explores the impact of user-defined constraints and presents histograms and rule cubes, including superposed and interleaved rule cubes, showing the effect of the constraints.

36 Claims

1. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
  
  counting a plurality of counts of times each particular attribute of said plurality of additional attributes, takes on each of a set of possible values for the particular attribute;
  
  presenting a plurality of histograms on a computer display wherein each of said plurality of histograms includes counts for one of said plurality of additional attributes versus attribute value and wherein said plurality of histograms are presented in a sorted order.
- Dependent claims:
  - 2. The computer readable medium according to claim 1 wherein said counting is subject to at least one constraint on at least one of said plurality of additional attributes.
  - 3. The computer readable medium according to claim 1 wherein said sorted order is based on a sorting of the histograms according to a metric of non-randomness of distributions shown in said histograms.
  - 4. The computer readable medium according to claim 3 wherein the metric of non-randomness is a metric of trend strength.
  - 5. The computer readable medium according to claim 4 further comprising programming instructions for adding arrows proximate trends in one or more of said plurality of histograms.
  - 6. The computer readable medium according to claim 3 wherein the metric of non-randomness is a metric of discriminative power with respect to said class attribute.
  - 7. The computer readable medium according to claim 1 including additional programming instructions for:
    - discretizing attribute values of at least a subset of attributes in said set of data vectors prior to counting said plurality of counts.
  - 8. The computer readable medium according to claim 1 wherein the programming instructions for presenting the plurality of histograms on the computer display present different histograms corresponding to different values of the class attribute.
  - 9. The computer readable medium according to claim 8 wherein for each additional attribute the histograms corresponding to a set of values of the class attribute are arranged in a column on the computer display.
  - 10. The computer readable medium according to claim 9 wherein said metric of non-randomness is summed over said set of values of said class attribute.
  - 11. The computer readable medium according to claim 1 wherein, in response to a user designating one of said plurality of histograms, an enlarged version of the designated histogram is shown on the computer display.
  - 12. The computer readable medium according to claim 1 wherein said histograms are augmented by up and down arrows located proximate said histograms to show trend type.
  - 13. The computer readable medium according to claim 1 wherein certain of said histograms are truncated to show only a portion of attribute values and are color coded to indicate truncation.
  - 14. The computer readable medium according to claim 1 including additional programming instructions for automatically scaling the plurality of histograms.
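
The counting and sorted presentation recited in claims 1 and 3 can be illustrated with a minimal sketch. The claims do not fix a particular metric of non-randomness, so the chi-square-style deviation from a uniform distribution used here is an assumption, as are the toy records and every function name.

```python
from collections import Counter

# Hypothetical toy records; "result" plays the role of the class attribute.
RECORDS = [
    {"result": "drop", "phone": "A", "time": "am"},
    {"result": "drop", "phone": "A", "time": "pm"},
    {"result": "ok", "phone": "A", "time": "am"},
    {"result": "ok", "phone": "B", "time": "pm"},
]

def count_attribute_values(vectors, class_attr):
    """Count how often each non-class attribute takes each of its values."""
    counts = {}
    for vec in vectors:
        for attr, value in vec.items():
            if attr != class_attr:
                counts.setdefault(attr, Counter())[value] += 1
    return counts

def non_uniformity(counter):
    """Assumed metric: chi-square-style deviation from a uniform distribution."""
    total = sum(counter.values())
    expected = total / len(counter)
    return sum((c - expected) ** 2 / expected for c in counter.values())

def sorted_histograms(vectors, class_attr):
    """Return (attribute, counts) pairs, least uniform histogram first."""
    counts = count_attribute_values(vectors, class_attr)
    return sorted(counts.items(), key=lambda kv: non_uniformity(kv[1]), reverse=True)

def render(attr, counter):
    """Draw one histogram as text, one bar per attribute value."""
    lines = [attr]
    for value in sorted(counter):
        lines.append(f"  {value:<4}{'#' * counter[value]} {counter[value]}")
    return "\n".join(lines)

if __name__ == "__main__":
    for attr, counter in sorted_histograms(RECORDS, "result"):
        print(render(attr, counter))
```

Any other metric (for example trend strength, as in claim 4) could be substituted for `non_uniformity` without changing the rest of the sketch.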

15. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
  
  processing the set of data vectors in order to count occurrences of each value of a first attribute whereby a first set of counts is obtained;
  
  processing the set of data vectors in order to count occurrences of each value of the first attribute subject to at least one constraint as to at least one other attribute value, whereby a second set of counts is obtained;
  
  displaying the first set of counts and the second set of counts in the form of at least two superposed histograms including a first histogram based on the first set of counts and a second histogram based on the second set of counts.
- Dependent claims:
  - 16. The computer readable medium according to claim 15 wherein the at least two superposed histograms are distinguished by color.
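
As a rough illustration of claim 15, the sketch below counts one attribute twice, once over all data vectors and once subject to a constraint on another attribute, and overlays the two sets of counts in a text rendering. The records, the attribute names, and the `#`/`.` overlay convention (standing in for the color coding of claim 16) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical records used only for illustration.
CALLS = [
    {"phone": "A", "time": "am", "result": "drop"},
    {"phone": "A", "time": "pm", "result": "drop"},
    {"phone": "B", "time": "am", "result": "ok"},
    {"phone": "B", "time": "pm", "result": "drop"},
    {"phone": "A", "time": "am", "result": "ok"},
]

def value_counts(vectors, attr, constraint=None):
    """Count values of `attr`, keeping only vectors that satisfy `constraint`."""
    constraint = constraint or {}
    return Counter(
        v[attr]
        for v in vectors
        if all(v.get(key) == wanted for key, wanted in constraint.items())
    )

def superposed_rows(vectors, attr, constraint):
    """Return (value, unconstrained count, constrained count) per value."""
    base = value_counts(vectors, attr)
    constrained = value_counts(vectors, attr, constraint)
    return [(value, base[value], constrained[value]) for value in sorted(base)]

def render(rows):
    """Overlay the constrained bar ('#') on the unconstrained bar ('.')."""
    return "\n".join(
        f"{value:<6}{'#' * part}{'.' * (whole - part)}" for value, whole, part in rows
    )

if __name__ == "__main__":
    print(render(superposed_rows(CALLS, "result", {"phone": "A"})))
```

Because the constrained count can never exceed the unconstrained count for the same value, the shorter bar always fits inside the longer one.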

17. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
  
  processing the set of data vectors in order to obtain a set of counts of occurrences of each combination of values of a first attribute and a second attribute;
  
  displaying on a computer display a graphical representation of the set of counts, wherein the graphical representation includes a grid of areas, the grid comprising a plurality of rows of areas and a plurality of columns of areas, wherein each row corresponds to an i-th value of the first attribute and each column corresponds to a j-th value of the second attribute, and wherein each (i,j)-th area in the grid of areas includes:
  
  a first graphical element that reflects a count of data vectors that have the i-th value of the first attribute and a count of data vectors that have the j-th value of the second attribute.
- Dependent claims:
  - 18. The computer readable medium according to claim 17 wherein said first graphical element reflects a proportion of data vectors having the j-th value of the second attribute that have the i-th value of the first attribute.
  - 19. The computer readable medium according to claim 18 wherein each (i,j)-th area in the grid of areas further comprises:
    - a second graphical element that reflects a second proportion of data vectors having the i-th value of the first attribute that also have the j-th value of the second attribute.
  - 20. The computer readable medium according to claim 19 wherein each (i,j)-th area in the grid of areas further comprises a third graphical element that reflects a third proportion of the data vectors that have the i-th value of the first attribute and the j-th value of the second attribute.
  - 21. The computer readable medium according to claim 19 wherein:
    - the first graphical element comprises a first block having a height that is proportional to the first proportion; and
      
      the second graphical element comprises a second block having a width that is proportional to the second proportion.
  - 22. The computer readable medium according to claim 21 wherein each (i,j)-th area further comprises a disk having an area that is proportional to a third proportion of the data vectors that have the i-th value of the first attribute and the j-th value of the second attribute.
  - 23. The computer readable medium according to claim 21 wherein each (i,j)-th area further comprises a disk having a color parameter that is proportionate to a third proportion of the data vectors that have the i-th value of the first attribute and the j-th value of the second attribute.
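
The three proportions that claims 18 to 20 attach to each (i,j)-th area of the grid can be computed from a joint count, as in the following sketch. It only produces the numbers that would size the graphical elements (block height, block width, disk area); the records and the dictionary keys are hypothetical, and cells with a zero count are simply omitted.

```python
from collections import Counter

# Hypothetical records used only for illustration.
CALLS = [
    {"phone": "A", "time": "am", "result": "drop"},
    {"phone": "A", "time": "pm", "result": "drop"},
    {"phone": "B", "time": "am", "result": "ok"},
    {"phone": "B", "time": "pm", "result": "drop"},
    {"phone": "A", "time": "am", "result": "ok"},
]

def rule_cube(vectors, first, second):
    """Return per-cell counts and proportions for a two-attribute grid."""
    joint = Counter((v[first], v[second]) for v in vectors)
    row_totals = Counter(v[first] for v in vectors)    # totals per i-th value
    col_totals = Counter(v[second] for v in vectors)   # totals per j-th value
    n = len(vectors)
    cells = {}
    for (i, j), count in joint.items():
        cells[(i, j)] = {
            "count": count,
            # Claim 18: of the vectors with the j-th value, the share with the i-th value.
            "share_of_column": count / col_totals[j],
            # Claim 19: of the vectors with the i-th value, the share with the j-th value.
            "share_of_row": count / row_totals[i],
            # Claim 20: share of all vectors having both values.
            "share_of_all": count / n,
        }
    return cells

if __name__ == "__main__":
    for cell, stats in sorted(rule_cube(CALLS, "phone", "result").items()):
        print(cell, stats)
```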

24. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
  
  for each value of the class attribute processing the data vectors in order to obtain a frequency count for each value of each of the plurality of additional attributes;
  
  for a plurality of groups of the plurality of additional attributes evaluating a metric of similarity of trends in the frequency count as a function of attribute value;
  
  outputting an identification of a most similar group of additional attributes as identified by the metric of similarity.
- Dependent claims:
  - 25. The computer readable medium according to claim 24 wherein said plurality of groups comprise a plurality of pairs and said most similar group comprises a most similar pair.
  - 26. The computer readable medium according to claim 25 further comprising programming instructions for:
    - displaying on a computer display a first rule cube for a first of said most similar pair of additional attributes and a second rule cube for a second of said most similar pair of additional attributes.
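
One way to picture claims 24 and 25 is sketched below: for every class value, each attribute gets a trend score, and the pair of attributes whose scores differ least is reported as the most similar. The claims leave the similarity metric open, so the Pearson correlation between attribute value and frequency count used here is an assumption, and the data, names, and numeric ordinal attributes are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: class attribute "c", ordinal attributes x, y, z in 1..3.
DATA = (
    [{"c": "p", "x": x, "y": y, "z": z}
     for x, y, z in [(1, 1, 3), (2, 2, 2), (2, 3, 2), (3, 3, 1), (3, 3, 1), (3, 3, 1)]]
    + [{"c": "q", "x": 1, "y": 1, "z": 3}] * 2
)

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

def trend_profile(vectors, class_attr, attr):
    """One trend score per class value: correlation of count with attribute value."""
    domain = sorted({v[attr] for v in vectors})
    profile = []
    for cls in sorted({v[class_attr] for v in vectors}):
        counts = Counter(v[attr] for v in vectors if v[class_attr] == cls)
        profile.append(pearson(domain, [counts[x] for x in domain]))
    return profile

def most_similar_pair(vectors, class_attr):
    """Return the attribute pair whose per-class trend scores differ least."""
    attrs = sorted(a for a in vectors[0] if a != class_attr)
    profiles = {a: trend_profile(vectors, class_attr, a) for a in attrs}

    def distance(pair):
        a, b = pair
        return sum(abs(p - q) for p, q in zip(profiles[a], profiles[b]))

    return min(combinations(attrs, 2), key=distance)

if __name__ == "__main__":
    print(most_similar_pair(DATA, "c"))
```

In the toy data, counts for `x` and `y` both rise with attribute value in class `p` while counts for `z` fall, so `x` and `y` form the most similar pair.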

27. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
  
  processing the set of data vectors in order to obtain a set of counts of occurrences of each combination of values of a first attribute, a second attribute and a third attribute;
  
  displaying on a computer display a graphical representation of the set of counts, wherein the graphical representation includes a grid of areas, the grid comprising a plurality of rows of areas and a plurality of columns of areas, wherein each row corresponds to an i-th value of the first attribute and each column corresponds to a j-th value of the second attribute, and wherein each (i,j)-th area in the grid of squares includes a histogram that includes a plurality of bars, wherein each successive bar in the plurality of bars has a height proportional to a k-th value of the third attribute.
- Dependent claims:
  - 28. The computer readable medium according to claim 27 further comprising programming instructions for:
    - detecting user selection of a k-th bar in one of the areas and in response thereto highlighting the k-th bar in each of the areas.
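
The grid of small histograms in claim 27 and the linked highlighting in claim 28 can be sketched as follows. The sketch reads each bar's height as the count for the k-th value of the third attribute within the cell, which is an interpretation of the claim wording; the records, names, and the `*` highlight marker are likewise assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical records used only for illustration.
CALLS = [
    {"phone": "A", "time": "am", "result": "drop"},
    {"phone": "A", "time": "pm", "result": "drop"},
    {"phone": "B", "time": "am", "result": "ok"},
    {"phone": "B", "time": "pm", "result": "drop"},
    {"phone": "A", "time": "am", "result": "ok"},
]

def three_way_counts(vectors, first, second, third):
    """Map each (i, j) cell to a Counter over values of the third attribute."""
    cells = defaultdict(Counter)
    for v in vectors:
        cells[(v[first], v[second])][v[third]] += 1
    return cells

def cell_bars(cells, third_values, highlight=None):
    """Return {(i, j): [bar, ...]}, one text bar per value of the third attribute.

    The bar for the `highlight` value is drawn with '*' in every cell, which
    mimics highlighting the selected k-th bar across the whole grid.
    """
    bars = {}
    for key, counter in cells.items():
        bars[key] = [
            ("*" if k == highlight else "#") * counter[k] for k in third_values
        ]
    return bars

if __name__ == "__main__":
    grid = three_way_counts(CALLS, "phone", "time", "result")
    for cell, bars in sorted(cell_bars(grid, ["drop", "ok"], highlight="drop").items()):
        print(cell, bars)
```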

29. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a class attribute, and a plurality of additional attributes;
  
  for each value of the class attribute processing the data vectors in order to obtain a frequency count for each value of each of the plurality of additional attributes;
  
  for each additional attribute evaluating a metric of non-randomness of said frequency count versus attribute value;
  
  ranking said additional attributes based on said metric of non-randomness;
  
  outputting information based on said ranking to a user.
- Dependent claims:
  - 30. The computer readable medium according to claim 29 wherein outputting information based on said ranking comprises:
    - outputting a sorted list of at least a subset of said additional attributes based on said ranking.
  - 31. The computer readable medium according to claim 29 wherein outputting information based on said ranking comprises:
    - outputting at least a rule cube for a highest ranked additional attribute wherein said rule cube includes a grid of graphical elements sized based on counts of rules involving said highest ranked additional attribute and said class attribute.
  - 32. The computer readable medium according to claim 29 wherein ranking said additional attributes comprises ranking said additional attributes according to a metric of trend strength.
  - 33. The computer readable medium according to claim 29 wherein ranking said additional attributes comprises ranking said additional attributes according to a metric of discriminative power with respect to said class attribute.
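
A minimal sketch of the ranking in claims 29, 30, and 33 follows. The claims name "discriminative power with respect to said class attribute" without defining it, so the chi-square statistic of independence between each attribute and the class attribute is used here as an assumed stand-in; the records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records; "result" plays the role of the class attribute.
CALLS = [
    {"phone": "A", "time": "am", "result": "drop"},
    {"phone": "A", "time": "pm", "result": "drop"},
    {"phone": "B", "time": "am", "result": "ok"},
    {"phone": "B", "time": "pm", "result": "drop"},
    {"phone": "A", "time": "am", "result": "ok"},
]

def chi_square(vectors, attr, class_attr):
    """Chi-square statistic of independence between `attr` and the class."""
    joint = Counter((v[attr], v[class_attr]) for v in vectors)
    attr_totals = Counter(v[attr] for v in vectors)
    class_totals = Counter(v[class_attr] for v in vectors)
    n = len(vectors)
    stat = 0.0
    for a, a_total in attr_totals.items():
        for c, c_total in class_totals.items():
            expected = a_total * c_total / n
            stat += (joint[(a, c)] - expected) ** 2 / expected
    return stat

def rank_attributes(vectors, class_attr):
    """Return non-class attributes, most discriminative first."""
    attrs = [a for a in vectors[0] if a != class_attr]
    return sorted(attrs, key=lambda a: chi_square(vectors, a, class_attr), reverse=True)

if __name__ == "__main__":
    for attr in rank_attributes(CALLS, "result"):
        print(attr, round(chi_square(CALLS, attr, "result"), 3))
```

In the toy data every afternoon call is dropped, so `time` separates the classes better than `phone` and ranks first.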

34. A computer readable medium storing a program for interactive data mining including programming instructions for:
- (a) reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
  
  (b) processing the data vectors in order to obtain a set of frequency counts for a set of values of each of the plurality of attributes;
  
  (c) processing the set of frequency counts for each of the plurality of attributes in order to quantify a strength of a type of trend in each of the plurality of attributes;
  
  (d) applying each of a plurality of attribute-value constraints to the set of data vectors and repeating steps (b) and (c);
  
  (e) for each of the plurality of attributes comparing the strength of the type of trend with and without each of the constraints;
  
  (f) outputting information identifying at least one of the plurality of attributes for which there was a change in trend strength as a result of imposing a particular one of the plurality of attribute-value constraints, and outputting information identifying the particular attribute-value constraint.
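
Steps (a) through (f) of claim 34 can be sketched as a loop over candidate constraints. The trend-strength measure below (the share of rising steps minus the share of falling steps between successive attribute values) is an assumption, since the claim does not define one; the data, the threshold, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical records: "signal" is ordinal (1..3), "phone" is a category.
DATA = (
    [{"phone": "A", "signal": s} for s in (1, 1, 1, 2, 2, 3)]
    + [{"phone": "B", "signal": s} for s in (1, 2, 2, 3, 3, 3)]
)

def trend_strength(counts):
    """Assumed metric in [-1, 1]: +1 strictly rising, -1 strictly falling."""
    steps = list(zip(counts, counts[1:]))
    if not steps:
        return 0.0
    ups = sum(b > a for a, b in steps)
    downs = sum(b < a for a, b in steps)
    return (ups - downs) / len(steps)

def trend_changes(vectors, ordinal_attrs, constraints, threshold=0.5):
    """Report attributes whose trend strength shifts under a constraint."""
    domains = {a: sorted({v[a] for v in vectors}) for a in ordinal_attrs}

    def strengths(subset):
        # Steps (b) and (c): frequency counts, then trend strength per attribute.
        out = {}
        for a in ordinal_attrs:
            counts = Counter(v[a] for v in subset)
            out[a] = trend_strength([counts[x] for x in domains[a]])
        return out

    baseline = strengths(vectors)
    findings = []
    for attr, value in constraints:
        # Step (d): apply one attribute-value constraint and recompute.
        constrained = strengths([v for v in vectors if v[attr] == value])
        for a in ordinal_attrs:
            # Steps (e) and (f): compare and report sufficiently large changes.
            if a != attr and abs(constrained[a] - baseline[a]) >= threshold:
                findings.append((a, (attr, value), baseline[a], constrained[a]))
    return findings

if __name__ == "__main__":
    for finding in trend_changes(DATA, ["signal"], [("phone", "A"), ("phone", "B")]):
        print(finding)
```

Over all records the `signal` counts are flat, but constraining to either phone reveals opposite trends, which is the kind of change step (f) would surface.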

35. A computer readable medium storing a program for interactive data mining including programming instructions for:
- reading in a set of data vectors wherein each data vector comprises a plurality of attributes;
  
  reading in user input of a definition of a new attribute that is defined in terms of a subset of the plurality of attributes;
  
  processing the data vectors in order to obtain a set of frequency counts for a set of values of the new attribute;
  
  outputting information to the user based on the set of frequency counts.
- Dependent claims:
  - 36. The computer readable medium according to claim 35 including programming instructions for:
    - prior to processing the data vectors in order to obtain the set of frequency counts, discretizing a set of values of the new attribute.
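
Claims 35 and 36 describe a user-defined attribute that is discretized before counting. The sketch below assumes the definition arrives as a Python function and uses equal-width binning; neither choice comes from the claims, and the records and the `drop_rate` attribute are hypothetical.

```python
from collections import Counter

# Hypothetical records used only for illustration.
TRIALS = [
    {"dropped": 0, "attempts": 10},
    {"dropped": 1, "attempts": 10},
    {"dropped": 5, "attempts": 10},
    {"dropped": 9, "attempts": 10},
    {"dropped": 10, "attempts": 10},
]

def drop_rate(vector):
    """Example user definition: a new attribute built from two existing ones."""
    return vector["dropped"] / vector["attempts"]

def discretize(values, n_bins):
    """Assign each value an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in values]

def derived_counts(vectors, definition, n_bins):
    """Compute the new attribute, discretize it, and count each bin."""
    values = [definition(v) for v in vectors]
    return Counter(discretize(values, n_bins))

if __name__ == "__main__":
    for bin_index, count in sorted(derived_counts(TRIALS, drop_rate, 2).items()):
        print(bin_index, "#" * count)
```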

Current Assignee: Motorola Solutions, Inc.
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Liu, Bing; Xiao, Weimin; Benkler, Jeffrey G.; Zhao, Kaidi
Granted Patent: US 7,979,362 B2
US Class Current: 706/11
CPC Class Codes:
G06F 16/2462   Approximate or statistical ...
G06F 16/2465   Query processing support fo...
G06F 16/904   Browsing; Visualisation the...
G06F 2216/03   Data mining
