UNSUPERVISED PRIORITIZATION AND VISUALIZATION OF CLUSTERS

US 20140143249A1
Filed: 03/14/2013
Published: 05/22/2014
Est. Priority Date: 11/19/2012
Status: Active Grant

Abstract

Techniques are disclosed that automatically identify and order the most differentiated clusters from a given collection of clusters within a dataset. A measure of dissimilarity is computed between each cluster and a defined reference cluster, and the clusters are ordered according to the chosen dissimilarity. At least N clusters are selected as the most differentiated clusters relative to the defined reference. Within each cluster, the top-M most distinguishing cluster attributes can be automatically identified by an analogous process that computes the dissimilarity of each cluster attribute to its corresponding attribute in the reference cluster, and orders the attributes by dissimilarity. This automatically surfaces what differentiates a cluster's members from the population as a whole, and provides insight into what action or treatment might address that specific segment of the underlying population.
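
The ordering step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the row layout, the function name `rank_clusters`, the use of the whole population as the reference cluster, the mean as the aggregate, the squared difference as the attribute dissimilarity, and the plain sum as the combination rule are all assumptions chosen for brevity.

```python
from collections import defaultdict
from statistics import mean


def rank_clusters(rows, labels, top_n=None):
    """Order clusters by dissimilarity to a reference cluster.

    rows   -- list of dicts mapping attribute name -> numeric value (assumed layout)
    labels -- one cluster label per row
    The reference cluster is assumed to be the whole population.
    """
    attributes = sorted(rows[0])
    # Aggregate attribute value of the reference cluster: the population mean.
    reference = {a: mean(r[a] for r in rows) for a in attributes}

    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)

    scores = {}
    for label, cluster_rows in members.items():
        # Aggregate attribute value of this cluster: the per-attribute mean.
        aggregate = {a: mean(r[a] for r in cluster_rows) for a in attributes}
        # Attribute dissimilarity (assumed): squared difference to the reference.
        per_attribute = [(aggregate[a] - reference[a]) ** 2 for a in attributes]
        # Single cluster dissimilarity (assumed): sum over attributes.
        scores[label] = sum(per_attribute)

    # Most differentiated clusters first; top_n=None keeps every cluster.
    ordering = sorted(scores, key=scores.get, reverse=True)
    return ordering[:top_n], scores


# Hypothetical data: three clusters of two entities each.
rows = [
    {"spend": 10.0, "calls": 1.0},
    {"spend": 12.0, "calls": 1.0},
    {"spend": 50.0, "calls": 9.0},
    {"spend": 52.0, "calls": 11.0},
    {"spend": 30.0, "calls": 5.0},
    {"spend": 32.0, "calls": 5.0},
]
labels = ["a", "a", "b", "b", "c", "c"]
ordering, scores = rank_clusters(rows, labels)
```

Because the attributes are not rescaled here, attributes with large numeric ranges dominate the sum; a real system would likely normalize them or use a distribution-based measure.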

20 Claims

1. A network device, comprising:
- a transceiver to send and receive data over a network; and
  
  a processor that is operative to perform actions, comprising:
  
  receiving a dataset of a plurality of attributes for a plurality of entities, each entity being described by a set of attribute values from within the plurality of attributes;
  
  receiving a clustering of the plurality of entities, the clustering describing a plurality of clusters;
  
  for each of a first cluster and a reference cluster, for each attribute of the first cluster, computing an aggregate attribute value;
  
  for each of the first cluster, for each aggregate attribute value, computing an attribute dissimilarity between the aggregate attribute value of the first cluster from the aggregate attribute value of the reference cluster;
  
  combining the attribute dissimilarities for each cluster, to obtain a single cluster dissimilarity for each cluster; and
  
  displaying on a display device, an ordering of each cluster based on their respective cluster dissimilarities to the reference cluster.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The network device of claim 1, wherein at least one attribute used in clustering the plurality of entities is different from the attributes used in determining an aggregate attribute value for a cluster, and wherein at least one cluster's attributes used to compute its aggregate attribute value has at least one attribute that is different from at least one other cluster's attributes used to compute the other cluster's aggregate attribute value.
  - 3. The network device of claim 1, wherein the reference cluster includes at least one entity and related attributes that are not included in the data of attributes for the plurality of entities.
  - 4. The network device of claim 1, wherein at least one attribute is a vector-valued attribute.
  - 5. The network device of claim 1, wherein the attribute dissimilarities are determined using a Kullback-Leibler divergence.
  - 6. The network device of claim 1, wherein the processor that is operative to perform actions, further comprising:
    - selecting a cluster from the plurality of clusters; and
      
      displaying a subset of attributes as most differentiating attributes based on each attribute's contribution to a measure of dissimilarity for the selected cluster, the subset of attributes indicating how entities in the selected cluster are differentiated from the plurality of entities.
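
Claims 5 and 6 together suggest ranking a selected cluster's attributes by their Kullback-Leibler divergence from the reference. The sketch below is one possible reading, not the claimed implementation: it assumes categorical attributes, treats each attribute's aggregate value as its empirical category distribution, and assumes the reference contains every category seen in the cluster. The names `distribution`, `kl_divergence`, and `top_attributes` are hypothetical.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Empirical probability of each category, in the order given."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing. Assumes q_i > 0 wherever
    p_i > 0, which holds when the reference includes the cluster.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def top_attributes(cluster_rows, reference_rows, top_m):
    """Return the top_m attributes that most differentiate the cluster."""
    scores = {}
    for attribute in cluster_rows[0]:
        categories = sorted({r[attribute] for r in reference_rows})
        p = distribution([r[attribute] for r in cluster_rows], categories)
        q = distribution([r[attribute] for r in reference_rows], categories)
        scores[attribute] = kl_divergence(p, q)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_m], scores


# Hypothetical population; the selected cluster is its first two entities.
population = [
    {"plan": "prepaid", "region": "north"},
    {"plan": "prepaid", "region": "south"},
    {"plan": "postpaid", "region": "north"},
    {"plan": "postpaid", "region": "south"},
]
cluster = population[:2]
top, scores = top_attributes(cluster, population, top_m=1)
```

In this toy data the cluster is entirely prepaid while its regions match the population, so `plan` is the only attribute with a non-zero divergence.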

7. A system, comprising:
- one or more non-transitory storage devices usable to store customer data; and
  
  one or more processors operative to perform actions, comprising:
  
  receiving a dataset of a plurality of attributes for a plurality of entities, each entity being described by a set of attribute values from within the plurality of attributes;
  
  receiving a clustering of the plurality of entities, the clustering describing a plurality of clusters;
  
  for each of a first cluster and a reference cluster, for each attribute of the first cluster, computing an aggregate attribute value;
  
  for each of the first cluster, for each aggregate attribute value, computing an attribute dissimilarity between the aggregate attribute value of the first cluster from the aggregate attribute value of the reference cluster;
  
  combining the attribute dissimilarities for each cluster, to obtain a single cluster dissimilarity for each cluster; and
  
  displaying on a display device, an ordering of each cluster based on their respective cluster dissimilarities to the reference cluster.
- Dependent claims (8, 9, 10, 11, 12, 13):
  - 8. The system of claim 7, wherein the one or more processors are operative to perform actions, further comprising:
    - selecting an entity from the plurality of entities; and
      
      displaying an ordering of at least a subset of attributes describing the selected entity, wherein the ordering is based on each attribute's dissimilarities to an aggregate attribute value for a selected cluster.
  - 9. The system of claim 7, wherein at least one attribute used in clustering the plurality of entities is different from the attributes used in determining an aggregate attribute value for a cluster, and wherein at least one cluster's attributes used to compute its aggregate attribute value has at least one attribute that is different from at least one other cluster's attributes used to compute the other cluster's aggregate attribute value.
  - 10. The system of claim 7, wherein the reference cluster includes at least one entity and related attributes that are not included in the data of attributes for the plurality of entities.
  - 11. The system of claim 7, wherein at least one attribute is a vector-valued attribute.
  - 12. The system of claim 7, wherein the attribute dissimilarities are determined using a Kullback-Leibler divergence, a Bhattacharyya distance, a mean-squared error, an Lp-norm, or a Euclidean distance.
  - 13. The system of claim 7, wherein the one or more processors are operative to perform actions, further comprising:
    - selecting a cluster from the plurality of clusters; and
      
      displaying a subset of attributes as most differentiating attributes based on each attribute's contribution to a measure of dissimilarity for the selected cluster, the subset of attributes indicating how entities in the selected cluster are differentiated from the plurality of entities.
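
Four of the dissimilarity measures named in claim 12 can be written in a few lines each. The following Python sketch uses the standard textbook definitions; the function names and the choice to return infinity for non-overlapping distributions are assumptions, and the source does not say how the patent's system computes them.

```python
import math


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient of two discrete distributions."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Distributions with no overlap have coefficient 0 (assumed: return inf).
    return math.inf if coefficient == 0 else -math.log(coefficient)


def mean_squared_error(x, y):
    """Mean of the squared element-wise differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def lp_norm(x, y, p):
    """Lp-norm of the difference vector, for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """Euclidean distance: the Lp-norm with p = 2."""
    return lp_norm(x, y, 2)
```

The Bhattacharyya distance expects probability vectors, while the other three apply to any pair of equal-length numeric vectors, which makes them a natural fit for vector-valued attributes such as those in claim 11.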

14. An apparatus comprising a non-transitory computer readable medium, having computer-executable instructions stored thereon, that in response to execution by a computing device, cause the computing device to perform operations, comprising:
- receiving a dataset of a plurality of attributes for a plurality of entities, each entity being described by a set of attribute values from within the plurality of attributes;
  
  receiving a clustering of the plurality of entities, the clustering describing a plurality of clusters;
  
  for each of a first cluster and a reference cluster, for each attribute of the first cluster, computing an aggregate attribute value;
  
  for each of the first cluster, for each aggregate attribute value, computing an attribute dissimilarity between the aggregate attribute value of the first cluster from the aggregate attribute value of the reference cluster;
  
  combining the attribute dissimilarities for each cluster, to obtain a single cluster dissimilarity for each cluster; and
  
  displaying an ordering of each cluster based on their respective cluster dissimilarities to the reference cluster.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The apparatus of claim 14, wherein the computing device to perform operations, further comprising:
    - selecting an entity from the plurality of entities; and
      
      displaying an ordering of at least a subset of attributes describing the selected entity, wherein the ordering is based on each attribute's dissimilarities to an aggregate attribute value for a selected cluster.
  - 16. The apparatus of claim 14, wherein at least one attribute used in clustering the plurality of entities is different from the attributes used in determining an aggregate attribute value for a cluster, and wherein at least one cluster's attributes used to compute its aggregate attribute value has at least one attribute that is different from at least one other cluster's attributes used to compute the other cluster's aggregate attribute value.
  - 17. The apparatus of claim 14, wherein the reference cluster includes at least one entity and related attributes that are not included in the data of attributes for the plurality of entities.
  - 18. The apparatus of claim 14, wherein the attribute dissimilarities are determined using a Kullback-Leibler divergence, a Bhattacharyya distance, a mean-squared error, an Lp-norm, or a Euclidean distance.
  - 19. The apparatus of claim 14, wherein at least one attribute is a vector-valued attribute.
  - 20. The apparatus of claim 14, wherein at least one attribute represents a distribution of attributes.
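
The entity-level ordering in claim 15 can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: attributes are assumed numeric, the cluster aggregate is taken to be the mean, and each attribute's dissimilarity is the entity's absolute deviation from that mean scaled by the cluster's standard deviation. The name `order_entity_attributes` is hypothetical.

```python
from statistics import mean, pstdev


def order_entity_attributes(entity, cluster_rows):
    """Order an entity's attributes by dissimilarity to a cluster aggregate.

    entity       -- dict mapping attribute name -> numeric value
    cluster_rows -- list of such dicts for the selected cluster
    """
    scores = {}
    for attribute, value in entity.items():
        values = [row[attribute] for row in cluster_rows]
        center = mean(values)  # aggregate attribute value for the cluster
        # Scale by the cluster's spread; fall back to 1.0 if it is constant.
        spread = pstdev(values) or 1.0
        scores[attribute] = abs(value - center) / spread
    ordering = sorted(scores, key=scores.get, reverse=True)
    return ordering, scores


# Hypothetical cluster of two entities and one entity to explain.
cluster_rows = [
    {"spend": 10.0, "calls": 4.0},
    {"spend": 30.0, "calls": 6.0},
]
entity = {"spend": 25.0, "calls": 9.0}
ordering, scores = order_entity_attributes(entity, cluster_rows)
```

Scaling by the spread keeps attributes with different units comparable, so `calls` ranks first here even though the raw gap in `spend` is larger.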

Current Assignee: Curinos Incorporated
Original Assignee: Globys Inc (Constellation Software Incorporated)
Inventors: Cazzanti, Luca; Mehanian, Courosh; Penzotti, Julie; Downs, Oliver; Scott, Doug

Granted Patent: US 9,659,087 B2
Field of Search
US Class Current: 707/737
CPC Class Codes:
G06F 16/26   Visual data mining; Browsin...
G06F 16/35   Clustering; Classification
G06F 16/358   Browsing; Visualisation the...
