Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
First Claim
1. Apparatus for providing a visualized hierarchical display of categorized event data, said data being a collection of records, wherein each record is associated with an occurrence of a corresponding event and comprises a plurality of attribute/value pairs characterizing the event or an individual user associated with the event, the apparatus comprising:
- a processor;
a memory connected to the processor and storing computer executable instructions therein;
circuitry, connected to the processor, for accessing a plurality of data records, residing on a data storage medium, that collectively forms a dataset representing a collection of events and for applying the data records from the medium to the processor; and
a display operative in conjunction with the processor;
wherein the processor, in response to execution of the stored instructions;
classifies the data records, based on the attribute/value pairs associated with each such record, into a plurality of mutually exclusive first clusters;
determines a measure of similarity between each pair of said first clusters so as to yield a plurality of similarity measures for the first clusters representing the dataset; and
forms, based on the similarity measures, a multi-level hierarchical cluster organization such that said first clusters are situated, as leaf nodes, at a lowest level of a hierarchy with second clusters being situated, as cluster group nodes, at successively higher levels of the hierarchy and formed as a result of selectively and iteratively combining clusters that are sufficiently similar to each other so as to form combined clusters in order to define a nodal set wherein each of the combined clusters replaces the clusters so combined to form said each combined cluster; and
visually renders the hierarchical organization on the display.
3 Assignments
0 Petitions
Accused Products
Abstract
A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
255 Citations
68 Claims
-
1. Apparatus for providing a visualized hierarchical display of categorized event data, said data being a collection of records, wherein each record is associated with an occurrence of a corresponding event and comprises a plurality of attribute/value pairs characterizing the event or an individual user associated with the event, the apparatus comprising:
-
a processor;
a memory connected to the processor and storing computer executable instructions therein;
circuitry, connected to the processor, for accessing a plurality of data records, residing on a data storage medium, that collectively forms a dataset representing a collection of events and for applying the data records from the medium to the processor; and
a display operative in conjunction with the processor;
wherein the processor, in response to execution of the stored instructions;
classifies the data records, based on the attribute/value pairs associated with each such record, into a plurality of mutually exclusive first clusters;
determines a measure of similarity between each pair of said first clusters so as to yield a plurality of similarity measures for the first clusters representing the dataset; and
forms, based on the similarity measures, a multi-level hierarchical cluster organization such that said first clusters are situated, as leaf nodes, at a lowest level of a hierarchy with second clusters being situated, as cluster group nodes, at successively higher levels of the hierarchy and formed as a result of selectively and iteratively combining clusters that are sufficiently similar to each other so as to form combined clusters in order to define a nodal set wherein each of the combined clusters replaces the clusters so combined to form said each combined cluster; and
visually renders the hierarchical organization on the display. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
summarizes each of the first clusters into a corresponding first segment so as to define a plurality of first segments such that each of said first segments contains records, from within its associated one of the first clusters, that exhibit similar behavior and similar properties;
determines the similarity measures between each pair of said first segments so as to yield a plurality of similarity measures; and
forms the multi-level hierarchical organization, through agglomerative clustering, of the first segments.
-
-
3. The apparatus in claim 2 wherein the processor, in response to execution of the stored instructions, forms a root node that represents the entire collection and is situated at a highest level of the hierarchy.
-
4. The apparatus in claim 3 wherein the processor, in response to execution of the stored instructions, performs agglomerative clustering by:
-
(a) determining a measure of distance between each pair of members in the nodal set, the nodal set initially being defined as having all of said members, as child nodes in the hierarchy, and, (b) for each pair of said members having nearest distances therebetween, combining said pair of members to yield a parent node at a higher level of the hierarchy, wherein the parent node replaces the pair of said members in the nodal set; and
(c) iteratively repeating operations (a) and (b) until the root node is formed that represents all the members in the collection.
-
-
5. The apparatus in claim 4 wherein the processor, in response to execution of the stored instructions, reduces a level of the hierarchy by successively merging, based on nearest similarity measures, two linked nodes at adjacent levels in the hierarchy so as to form a single substitute node having a group of segments associated with the two nodes that have been merged.
-
6. The apparatus in claim 4 wherein the processor, in response to execution of the stored instructions:
-
accepts a user-selection of a segment in the hierarchy so as to define a first selected segment;
scores each of the attribute/value pair associated with the first selected segment as to how well each of said attribute/value pairs associated with the first selected segment characterizes the first selected segment;
rank orders the attribute/value pairs within the first selected segment so as to define a first rank order; and
visually displays each one of a plurality of the attribute/value pairs within the first selected segment in said first rank order along with an indication representative of a magnitude of the score of said one of the plurality of said attribute/value pairs within the first selected segment.
-
-
7. The apparatus in claim 6 wherein the indication is graphical.
-
8. The apparatus in claim 7 wherein each of the records reflects a user who visits a predefined web site with the attributes in the record reflecting information regarding a transaction in which the user has engaged with the web site or characteristic information, regarding the user, which the user has furnished to the web site.
-
9. The apparatus in claim 7 wherein the processor, in response to execution of the stored instructions, determines the score of each of the attribute/value pairs on a discriminative basis.
-
10. The apparatus in claim 7 wherein the processor, in response to execution of the stored instructions:
-
generates a graphical user interface on the display; and
selectively expands or contracts the displayed hierarchy based on input commands based on user input from an individual interacting with the apparatus through the graphical user interface.
-
-
11. The apparatus in claim 4 wherein the processor, in response to execution of the stored instructions:
-
accepts user-selection of a pair of segments in the hierarchy so as to define first an second selected segments in the hierarchy;
scores each of the events associated with the second selected segment as to how well each of the attribute/value pairs associated with the second selected segment characterizes events associated with a first selected segment;
rank orders the attribute/value pairs associated with the second selected segment so as to define a second rank order; and
visually displays each one of a plurality of the attribute/value pairs associated with the second selected segment in said second rank order along with an indication representative of a magnitude of the score of each one of the plurality of said attribute/value pairs so as to facilitate a visual comparison of the attribute/value pairs of the first and second selected segments and to visually assess whether each of the plurality of said attribute/value pairs associated with the second segment is more likely to be exhibited by the first or second selected segments.
-
-
12. The apparatus in claim 11 wherein the processor, in response to the stored instructions, determines the score of each of the events associated with the second segment based on corresponding probabilities of said each event occurring or not occurring in all of the segments.
-
13. The apparatus in claim 12 wherein the processor, in response to the stored instructions, ascertains the corresponding probabilities in response to the attribute/value pairs associated with said each event.
-
14. The apparatus in claim 11 wherein the processor, in response to the stored instructions, determines the score of said each of the events associated with the second segment through use of discriminant values.
-
15. The apparatus in claim 11 wherein the indication is graphical.
-
16. The apparatus in claim 15 wherein each of the records reflects a user who visits a predefined web site with the attributes in the record reflecting information regarding a transaction in which the user has engaged with the web site or characteristic information, regarding the user, which the user has furnished to the web site.
-
17. The apparatus in claim 15 wherein the processor, in response to execution of the stored instructions:
-
generates a graphical user interface on the display; and
selectively expands or contracts the displayed hierarchy based on input commands based on user input from an individual interacting with the apparatus through the graphical user interface.
-
-
18. The apparatus in claim 15 wherein the processor, in response to the stored instructions, limits a depth of the hierarchy to a predefined level.
-
19. A method, for use in conjunction with apparatus, for providing a visualized hierarchical display of categorized event data, said data being a collection of records, wherein each record is associated with an occurrence of a corresponding event and comprises a plurality of attribute/value pairs characterizing the event or an individual user associated with the event, the apparatus having:
- a processor;
a memory connected to the processor and storing computer executable instructions therein;
circuitry, connected to the processor, for accessing a plurality of data records, residing on a data storage medium, that collectively forms a dataset representing a collection of events and for applying the data records from the medium to the processor; and
a display operative in conjunction with the processor;
wherein the method comprises the steps performed by the processor, in response to execution of the stored instructions, of;classifying the data records, based on the attribute/value pairs associated with each such record, into a plurality of mutually exclusive first clusters;
determining a measure of similarity between each pair of said first clusters so as to yield a plurality of similarity measures for the first clusters representing the dataset; and
forming, based on the similarity measures, a multi-level hierarchical cluster organization such that said first clusters are situated, as leaf nodes, at a lowest level of a hierarchy with second clusters being situated, as cluster group nodes, at successively higher levels of the hierarchy and formed as a result of selectively and iteratively combining clusters that are sufficiently similar to each other so as to form combined clusters in order to define a nodal set wherein each combined cluster replaces the clusters so combined to form said each combined clusters; and
visually renders the hierarchical organization on the display. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
summarizing each of the first clusters into a corresponding first segment so as to define a plurality of first segments such that each of said first segments contains records, from within its associated one of the first clusters, that exhibit similar behavior and similar properties;
determining the similarity measures between each pair of said first segments so as to yield a plurality of similarity measures; and
forming the multi-level hierarchical organization, through agglomerative clustering, of the first segments.
- a processor;
-
21. The method in claim 20 further comprising the step of forming a root node that represents the entire collection and is situated at a highest level of the hierarchy.
-
22. The method in claim 21 wherein agglomerative clustering is performed by:
-
(a) determining a measure of distance between each pair of members in a nodal set, the nodal set initially being defined as having all of said members, as child nodes in the hierarchy, and, (b) for each pair of said members having nearest distances therebetween, combining said pair of members to yield a parent node at a higher level of the hierarchy, wherein the parent node replaces the pair of said members in the nodal set; and
(c) iteratively repeating operations (a) and (b) until the root node is formed that represents all the members in the collection.
-
-
23. The method in claim 22 further comprising the step of reducing a level of the hierarchy by successively merging, based on nearest similarity measures, two linked nodes at adjacent levels in the hierarchy so as to form a single substitute node having a group of segments associated with the two nodes that have been merged.
-
24. The method in claim 22 further comprising the steps of:
-
accepting a user-selection of a segment in the hierarchy so as to define a first selected segment;
scoring each of the attribute/value pairs within the first selected segment as to how well each of said attribute/value pairs associated with the first selected segment characterizes the first selected segment;
rank ordering the attribute/value pairs within the first selected segment so as to define a first rank order; and
visually displaying each one of a plurality of the attribute/value pairs within the first selected segment in said first rank order along with an indication representative of a magnitude of the score of said one of the plurality of said attribute/value pairs within the first selected segment.
-
-
25. The method in claim 24 wherein the indication is graphical.
-
26. The method in claim 25 wherein each of the records reflects a user who visits a predefined web site with the attributes in the record reflecting information regarding a transaction in which the user has engaged with the web site or characteristic information, regarding the user, which the user has furnished to the web site.
-
27. The method in claim 25 further comprising the steps of:
-
generating a graphical user interface on the display; and
selectively expanding or contracting the displayed hierarchy based on input commands based on user input from an individual interacting with the apparatus through the graphical user interface.
-
-
28. The method in claim 24 further comprising the step of determining the score of each of the attribute/value pairs on a discriminative basis.
-
29. The method in claim 22 further comprising the steps of:
-
accepting a user-selection of a pair of segments in the hierarchy so as to define first and second selected segments in the hierarchy;
scoring each of the attribute/value pairs associated with the second selected segment as to how well each of said attribute/value pairs associated with the second selected segment characterizes events associated with a first selected segment;
rank ordering the attribute/value pairs associated with the second selected segment so as to define a second rank order; and
visually displaying each one of a plurality of the attribute/value pairs associated with the second selected segment in said second rank order along with an indication representative of a magnitude of the score of each one of the plurality of said attribute/value pairs, so as to facilitate a visual comparison of the attribute/value pairs of the first and second selected segments and to visually assess whether each of the plurality of said attribute/value pairs associated with the second segment is more likely to be exhibited by the first or second selected segments.
-
-
30. The method in claim 29 wherein the scoring step comprises the step of determining the score of each of the events associated with the second segment based in corresponding probabilities of said each event occurring or not occurring in all of the segments.
-
31. The method in claim 30 wherein the score determining step comprises the step of ascertaining the corresponding probabilities in response to the attribute/value pairs associated with said each event.
-
32. The method in claim 29 wherein the scoring step comprises the step of determining the score of said each of the events in the second segment through use of discriminant values.
-
33. The method in claim 29 wherein the indication is graphical.
-
34. The method in claim 33 wherein each of the records reflects a user who visits a predefined web site with the attributes in the record reflecting information regarding a transaction in which the user has engaged with the web site or characteristic information, regarding the user, which the user has furnished to the web site.
-
35. The method in claim 33 further comprising the steps of:
-
generating a graphical user interface on the display; and
selectively expanding or contracting the displayed hierarchy based on input commands based on user input from an individual interacting with the apparatus through the graphical user interface.
-
-
36. The method in claim 35 further comprising the step of limiting a depth of the hierarchy to a predefined level.
-
37. A computer readable medium having computer executable instructions stored therein, said instructions being executed by a computer, for performing the steps in claim 19.
-
38. Apparatus for providing a visualized hierarchical display of categorized event data, said data being a collection of records, wherein each record is associated with an occurrence of a corresponding event and comprises a plurality of attribute/value pairs characterizing the event or an individual user associated with the event, the apparatus comprising:
-
a processor;
a memory connected to the processor and storing computer executable instructions therein;
circuitry, connected to the processor, for accessing a plurality of data records, residing on a data storage medium, that collectively forms a dataset representing a collection of events and for applying the data records to the processor; and
a display operative in conjunction with the processor;
wherein the processor, in response to execution of the stored instructions;
automatically classifies the data records, based on the attribute/value pairs associated with each such record, into a plurality of mutually exclusive clusters;
determines a measure of similarity between each pair of said clusters so as to yield a plurality of similarity measures for the first clusters representing the dataset; and
visually renders each one of said pairs of clusters on the display along with a visual indication of a corresponding one of the similarity measures which is associated with said each pair of said clusters. - View Dependent Claims (39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52)
establishes a similarity threshold; and
displays the indication of the determined similarity measure for said each pair of clusters if the determined similarity measure exceeds the similarity threshold.
-
-
40. The apparatus in claim 38 wherein the visual indication comprises thickness of a displayed arc that connects the first and second clusters, a color of the arc or other visual characteristic of the arc.
-
41. The apparatus in claim 40 wherein the processor, in response to execution of the stored instructions:
-
establishes a similarity threshold; and
displays the indication of the determined similarity measure for said each pair of clusters if the determined similarity measure exceeds the similarity threshold.
-
-
42. The apparatus in claim 38 wherein the processor, in response to execution of the stored instructions:
-
receives an instruction to de-emphasize a particular cluster; and
in response to the instruction to de-emphasize a cluster, de-emphasizes the visual indication for the particular cluster.
-
-
43. The apparatus in claim 38 wherein the processor, in response to execution of the stored instructions, receives a user-specified level for the similarity threshold.
-
44. The apparatus in claim 43 wherein the processor, in response to execution of the stored instructions, displays a slider through which the user can set the similarity threshold.
-
45. The apparatus in claim 44 wherein the processor, in response to execution of the stored instructions, displays the slider either horizontally or vertically.
-
46. The apparatus in claim 43 wherein the visual indication is a displayed arc that connects the first and second clusters and the processor, in response to execution of the stored instructions, displays, with the slider set to one end position, either no or a minimum number of arcs between corresponding ones of the clusters and, with the slider set to another end position, all pair-wise connections.
-
47. The apparatus in claim 43 wherein the processor, in response to execution of the stored instructions, adjusts the displayed indication of the similarity measure for said each cluster to reflect a change in the user-specified similarity threshold.
-
48. The apparatus in claim 38 wherein the hierarchical display is visually arranged as a spring model wherein apparent attraction force between said each pair of the clusters is responsive to the similarity measure for said each pair of clusters.
-
49. The apparatus in claim 38 wherein the processor, in response to execution of the stored instructions:
-
receives a user-supplied instruction to split a particular displayed cluster; and
in response to the user-supplied instruction, displays a pair of clusters for the particular displayed combined cluster.
-
-
50. The apparatus in claim 49 wherein the processor, in response to execution of the stored instructions, displays a slider wherein user movement of the slider specifies a corresponding similarity measure, for the pair of clusters, sufficient to split the particular displayed combined cluster into said pair of clusters.
-
51. The apparatus in claim 50 wherein the processor, in response to execution of the stored instructions, displays an animation of splitting the particular displayed cluster into said pair of clusters.
-
52. The apparatus in claim 49 the particular displayed cluster is a displayed cluster that resulted from a most recent combination of a pair of clusters.
-
53. A method, for use in conjunction with apparatus, for providing a visualized hierarchical display of categorized event data, said data being a collection of records, wherein each record is associated with an occurrence of a corresponding event and comprises a plurality of attribute/value pairs characterizing the event or an individual user associated with the event, the apparatus having:
- a processor;
a memory connected to the processor and storing computer executable instructions therein;
circuitry, connected to the processor, for accessing a plurality of data records, residing on a data storage medium, that collectively forms a dataset representing a collection of events and for applying the data records to the processor; and
a display operative in conjunction with the processor;
the method comprising the steps, performed by the processor, in response to execution of the stored instructions, of;automatically classifying the data records, based on the attribute/value pairs associated with each such record, into a plurality of mutually exclusive clusters;
determining a measure of similarity between each pair of said clusters so as to yield a plurality of similarity measures for the first clusters representing the dataset; and
visually rendering each one of said pairs of clusters on the display along with a visual indication of a corresponding one of the similarity measures which is associated with said each pair of said clusters. - View Dependent Claims (54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
establishing a similarity threshold; and
displaying the indication of the determined similarity measure for said each pair of clusters if the determined similarity measure exceeds the similarity threshold.
- a processor;
-
55. The method in claim 53 wherein the visual indication comprises thickness of a displayed arc that connects the first and second clusters, a color of the arc or other visual characteristic of the arc.
-
56. The method in claim 55 further comprising the steps of:
-
establishing a similarity threshold; and
displaying the indication of the determined similarity measure for said each pair of clusters if the determined similarity measure exceeds the similarity threshold.
-
-
57. The method in claim 53 further comprising the steps of:
-
receiving an instruction to de-emphasize a particular cluster; and
in response to the instruction to de-emphasize a cluster, de-emphasizing the visual indication for the particular cluster.
-
-
58. The method in claim 53 further comprising the step of receiving a user-specified level for the similarity threshold.
-
59. The method in claim 58 further comprising the step of displaying a slider through which the user can set the similarity threshold.
-
60. The method in claim 59, wherein the visual indication is a displayed arc that connects the first and second clusters, comprising the step of displaying the stored instructions, with the slider set to one end position, either no or a minimum number of arcs between corresponding ones of the clusters and, with the slider set to another end position, all pair-wise connections.
-
61. The method in claim 59 further comprising the step of displaying the slider either horizontally or vertically.
-
62. The method in claim 58 further comprising the step of adjusting the displayed indication of the similarity measure for said each cluster to reflect a change in the user-specified similarity threshold.
-
63. The method in claim 53 further comprising the step of visually arranging the hierarchical display arcs as a spring model wherein apparent attractive force between said each pair of the clusters is responsive to the similarity measure for said each pair of clusters.
-
64. The method in claim 53 further comprising the steps of:
-
receiving a user-supplied instruction to split a particular displayed cluster; and
in response to the user-supplied instruction, displaying a pair of clusters for the particular displayed combined cluster.
-
-
65. The method in claim 64 further comprising the step of displaying a slider wherein user movement of the slider specifies a corresponding similarity measure, for the pair of clusters, sufficient to split the particular displayed combined cluster into said pair of clusters.
-
66. The method in claim 65 further comprising the step of displaying an animation of splitting the particular displayed cluster into said pair of clusters.
-
67. The method in claim 64 wherein the particular displayed cluster is a displayed cluster that resulted from a most recent combination of a pair of clusters.
-
68. A computer readable medium having computer executable instructions stored therein, said instructions being executed by a computer, for performing the steps in claim 53.
Specification