Grouping of data points in data analysis for graph generation

US 10,599,669 B2
Filed: 03/14/2016
Issued: 03/24/2020
Est. Priority Date: 01/14/2014
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Autogrouping is described. An example method includes receiving a data set, building a first partition of subsets of the data set, computing a first subset score for each subset using a scoring function, generating a next partition including at least one subset that includes the elements of two or more subsets of the first partition, computing a second subset score for each subset of the next partition using the scoring function, defining a max score for each particular subset using a max score function, each max score being based on maximal subset scores of that particular subset and at least the subsets of the first partition related to that particular subset, selecting output subsets, selection of each of the output subsets being made using a maximum score of previously computed subset scores, and generating a report indicating an output partition, the output subsets being associated with the received data set.
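The partition-and-score procedure in the abstract can be illustrated with a minimal Python sketch. It assumes the partitions come from a hierarchical clustering stored as a tree, where each subset's `children` are the related subsets of the previous (finer) partition. It also assumes that subset scores are additive, so a subset can be compared against the sum of its children's best scores (dependent claims 6 and 17 describe a comparison of this kind), and that ties favor the coarser subset. The names `Subset` and `autogroup` and the toy scores are hypothetical. This is not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass
class Subset:
    """One subset in a hierarchy of partitions.

    `children` are the subsets of the previous (finer) partition that this
    subset is related to, i.e. that share members with it.
    """
    members: FrozenSet[str]
    children: List["Subset"] = field(default_factory=list)


def autogroup(roots: List[Subset],
              score: Callable[[FrozenSet[str]], float]) -> List[FrozenSet[str]]:
    """Pick non-overlapping subsets that maximize the summed score over a forest."""
    keep: Dict[int, bool] = {}

    def max_score(node: Subset) -> float:
        # Max score: the larger of the subset's own score and the summed
        # max scores of its related subsets in the previous partition.
        own = score(node.members)
        if not node.children:
            keep[id(node)] = True
            return own
        from_children = sum(max_score(c) for c in node.children)
        keep[id(node)] = own >= from_children  # ties favor the coarser subset
        return max(own, from_children)

    def collect(node: Subset) -> List[FrozenSet[str]]:
        # Top-down: output a kept subset, otherwise descend into its children.
        if keep[id(node)]:
            return [node.members]
        return [m for c in node.children for m in collect(c)]

    for r in roots:
        max_score(r)
    return [m for r in roots for m in collect(r)]


# Toy hierarchy: singletons -> pairs -> one root, with made-up scores.
leaves = {x: Subset(frozenset(x)) for x in "abcd"}
ab = Subset(frozenset("ab"), [leaves["a"], leaves["b"]])
cd = Subset(frozenset("cd"), [leaves["c"], leaves["d"]])
root = Subset(frozenset("abcd"), [ab, cd])
scores = {frozenset(x): 1.0 for x in "abcd"}
scores.update({frozenset("ab"): 3.0, frozenset("cd"): 1.5, frozenset("abcd"): 4.0})

result = autogroup([root], lambda m: scores[m])
print(result)
```

With these scores, {a, b} is kept (3.0 beats 1.0 + 1.0), {c, d} is split into singletons (1.5 loses to 2.0), and the root is rejected (4.0 loses to 3.0 + 2.0). The output subsets still cover every element exactly once.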

22 Claims

1. A non-transitory computer readable medium including executable instructions, the instructions being executable by a processor to perform a method, the method comprising:
  receiving input data;
  
  performing a similarity function on the input data to map the input data into a reference space to create reference data in the reference space, wherein the similarity function includes a distance function;
  
  identifying groupings of the reference data in the reference space using a resolution function;
  
  identifying nodes using a metric of the input data associated with groupings of the reference data, each node including at least some of the input data;
  
  building a first partition of subsets of the input data by hierarchical clustering creating a set of data trees, each subset of the first partition containing one or more nodes that are exclusive of other subsets of the first partition;
  
  computing a first subset score for each subset of the first partition using a scoring function;
  
  identifying a next partition from the hierarchical clustering including all of the nodes of the first partition, the next partition including at least one subset that includes all of the nodes of two or more subsets of the first partition, each particular subset of the next partition being related to one or more subsets of a previously generated partition if that particular subset shares membership of at least one node with the one or more subsets of the previously generated partition;
  
  computing a second subset score for each subset of the next partition using the scoring function;
  
  defining a max score for each particular subset of the next partition using a max score function, each max score being based on maximal subset scores of that particular subset of the next partition and at least the subsets of the first partition related to that particular subset;
  
  selecting output subsets from all subsets of the next partition and the previously generated partitions including the first partition, the output subsets together including all elements of the first partition, selection of each of the output subsets being made, at least in part, using a maximum score of previously computed subset scores, the maximum score being a largest score of all subset scores of the next partition and previously generated partitions including the first partition; and
  
  generating a visualization report including graphical objects indicating an output partition containing the output subsets, the output subsets of the output partition being associated with the nodes, each subset of the output partition containing nodes being exclusive of other subsets of the output partition.
2. The non-transitory computer readable medium of claim 1, wherein the scoring function is a modularity function.

3. The non-transitory computer readable medium of claim 1, wherein the data input includes data points.

4. The non-transitory computer readable medium of claim 1, wherein the similarity function is a density function.

5. The non-transitory computer readable medium of claim 1, wherein the output subsets of the report identify particular data of the input data associated with elements of a particular output subset.

6. The non-transitory computer readable medium of claim 1, wherein the max score function compares a second subset score of a particular subset of the next partition to a sum of subset scores of related subsets from the first partition to determine a maximum.

7. The non-transitory computer readable medium of claim 1, wherein each subset of the first partition includes a single element.

8. The non-transitory computer readable medium of claim 1, the method further comprising generating a second partition after the first partition is generated, the second partition including all of the elements of the first partition, the second partition including at least one subset that includes the elements of two or more subsets of the first partition, each particular subset of the second partition being related to one or more subsets of the first partition if that particular subset shares membership of at least one element within the one or more subsets of the first partition.

9. The non-transitory computer readable medium of claim 8, wherein the next partition includes at least one subset that includes the elements of two or more subsets of the second partition, each particular subset of the next partition being related to one or more subsets of the second partition if that particular subset shares membership of at least one element within the one or more subsets of the second partition.

10. The non-transitory computer readable medium of claim 1, wherein a set of less than all possible partitions of the data set are generated.

11. The non-transitory computer readable medium of claim 1 wherein at least one partition from the hierarchical clustering of the data set includes an atomic forest where every leaf in the atomic forest includes only one node.
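Claims 2 and 6 name a modularity function as the scoring function and describe comparing a subset's score against the summed scores of its related subsets. The patent text here gives no formula, so the following sketch assumes the standard Newman-Girvan modularity contribution of a subset in an unweighted, undirected graph. The graph, the function name `subset_modularity`, and the variable names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]


def subset_modularity(subset: FrozenSet[str], edges: List[Edge]) -> float:
    """Newman-Girvan modularity contribution of one subset of an undirected graph.

    Q(S) = e_S / m - (d_S / 2m)^2, where m is the edge count, e_S the number of
    edges with both endpoints in S, and d_S the total degree of S's vertices.
    Summing Q(S) over a partition gives the usual modularity of that partition.
    """
    m = len(edges)
    internal = sum(1 for u, v in edges if u in subset and v in subset)
    degree = sum((u in subset) + (v in subset) for u, v in edges)
    return internal / m - (degree / (2 * m)) ** 2


# Two triangles joined by a single bridge edge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

left, right = frozenset("abc"), frozenset("def")
parent = left | right

# Claim-6-style comparison: the parent's score against the sum of the scores
# of its related subsets from the finer partition.
children_total = subset_modularity(left, EDGES) + subset_modularity(right, EDGES)
keep_parent = subset_modularity(parent, EDGES) >= children_total
print(round(children_total, 4), keep_parent)  # prints 0.3571 False
```

Modularity is additive over the subsets of a partition, which is what makes the parent-versus-summed-children comparison meaningful: here the merged set scores 0, while the two triangles together score about 0.357, so the finer grouping wins.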

12. A method comprising:
  receiving input data;
  
  performing a similarity function on the input data to map the input data into a reference space to create reference data in the reference space, wherein the similarity function includes a distance function;
  
  identifying groupings of the reference data in the reference space using a resolution function;
  
  identifying nodes using a metric of the input data associated with groupings of the reference data, each node including at least some of the input data;
  
  building a first partition of subsets of the input data by hierarchical clustering creating a set of data trees, each subset of the first partition containing one or more nodes that are exclusive of other subsets of the first partition;
  
  computing a first subset score for each subset of the first partition using a scoring function;
  
  identifying a next partition from the hierarchical clustering including all of the nodes of the first partition, the next partition including at least one subset that includes all of the nodes of two or more subsets of the first partition, each particular subset of the next partition being related to one or more subsets of a previously generated partition if that particular subset shares membership of at least one node with the one or more subsets of the previously generated partition;
  
  computing a second subset score for each subset of the next partition using the scoring function;
  
  defining a max score for each particular subset of the next partition using a max score function, each max score being based on maximal subset scores of that particular subset of the next partition and at least the subsets of the first partition related to that particular subset;
  
  selecting output subsets from all subsets of the next partition and the previously generated partitions including the first partition, the output subsets together including all elements of the first partition, selection of each of the output subsets being made, at least in part, using a maximum score of previously computed subset scores, the maximum score being a largest score of all subset scores of the next partition and previously generated partitions including the first partition; and
  
  generating a visualization report including graphical objects indicating an output partition containing the output subsets, the output subsets of the output partition being associated with the nodes, each subset of the output partition containing nodes being exclusive of other subsets of the output partition.
13. The method of claim 12, wherein the scoring function is a modularity function.

14. The method of claim 12, wherein the data set includes data points.

15. The method of claim 12, wherein the similarity function is a density function.

16. The method of claim 12, wherein the output subsets of the report identify particular data of the input data associated with elements of a particular output subset.

17. The method of claim 12, wherein the max score function compares a second subset score of a particular subset of the next partition to a sum of subset scores of related subsets from the first partition to determine a maximum.

18. The method of claim 12, wherein each subset of the first partition includes a single element.

19. The method of claim 12, further comprising generating a second partition after the first partition is generated, the second partition including all of the elements of the first partition, the second partition including at least one subset that includes the elements of two or more subsets of the first partition, each particular subset of the second partition being related to one or more subsets of the first partition if that particular subset shares membership of at least one element within the one or more subsets of the first partition.

20. The method of claim 19, wherein the next partition includes at least one subset that includes the elements of two or more subsets of the second partition, each particular subset of the next partition being related to one or more subsets of the second partition if that particular subset shares membership of at least one element within the one or more subsets of the second partition.

21. The method of claim 12, wherein a set of less than all possible partitions of the data set are generated.
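Claims 18 to 20 describe a sequence of partitions: an atomic first partition with one element per subset, then coarser partitions whose subsets are related to earlier subsets by shared membership. The claims do not fix a linkage rule, so this sketch assumes naive single-linkage agglomerative clustering on one-dimensional values, merging the two closest subsets per step. The names `single_linkage_partitions` and `related` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Partition = List[FrozenSet[int]]


def single_linkage_partitions(values: Sequence[float]) -> List[Partition]:
    """Agglomerative single-linkage clustering of 1-D values.

    Starts from the atomic partition (one element per subset) and merges the
    two closest subsets at each step, returning every partition, finest first.
    """
    partition: Partition = [frozenset([i]) for i in range(len(values))]
    history = [list(partition)]

    def gap(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(abs(values[i] - values[j]) for i in a for j in b)

    while len(partition) > 1:
        pairs = [(i, j) for i in range(len(partition))
                 for j in range(i + 1, len(partition))]
        i, j = min(pairs, key=lambda p: gap(partition[p[0]], partition[p[1]]))
        merged = partition[i] | partition[j]
        partition = [s for k, s in enumerate(partition) if k not in (i, j)] + [merged]
        history.append(list(partition))
    return history


def related(subset: FrozenSet[int], previous: Partition) -> Partition:
    """Subsets of the previous partition that share at least one member with `subset`."""
    return [s for s in previous if s & subset]


history = single_linkage_partitions([0.0, 1.0, 5.0])
for level, part in enumerate(history):
    print(level, [sorted(s) for s in part])
print(related(frozenset({0, 1}), history[0]))
```

For the values 0.0, 1.0, and 5.0, the first merge joins elements 0 and 1, and the merged subset is related to the two singletons it absorbed. Real data would call for a proper clustering library rather than this cubic-time loop.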

22. A system comprising:
  a processor; and
  
  memory storing instructions that, when executed by the processor, cause the processor to:
  
  receive input data;
  
  perform a similarity function on the input data to map the input data into a reference space to create reference data in the reference space, wherein the similarity function includes a distance function;
  
  identify groupings of the reference data in the reference space using a resolution function;
  
  identify nodes using a metric of the input data associated with groupings of the reference data, each node including at least some of the input data;
  
  build a first partition of subsets of the input data by hierarchical clustering creating a set of data trees, each subset of the first partition containing one or more nodes that are exclusive of other subsets of the first partition;
  
  compute a first subset score for each subset of the first partition using a scoring function;
  
  identify a next partition from the hierarchical clustering including all of the nodes of the first partition, the next partition including at least one subset that includes all of the nodes of two or more subsets of the first partition, each particular subset of the next partition being related to one or more subsets of a previously generated partition if that particular subset shares membership of at least one node with the one or more subsets of the previously generated partition;
  
  compute a second subset score for each subset of the next partition using the scoring function;
  
  define a max score for each particular subset of the next partition using a max score function, each max score being based on maximal subset scores of that particular subset of the next partition and at least the subsets of the first partition related to that particular subset;
  
  select output subsets from all subsets of the next partition and the previously generated partitions including the first partition, the output subsets together including all elements of the first partition, selection of each of the output subsets being made, at least in part, using a maximum score of previously computed subset scores, the maximum score being a largest score of all subset scores of the next partition and previously generated partitions including the first partition; and
  
  generate a visualization report including graphical objects indicating an output partition containing the output subsets, the output subsets of the output partition being associated with the nodes, each subset of the output partition containing nodes being exclusive of other subsets of the output partition.
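The first steps of the claims (mapping input data into a reference space with a similarity function that includes a distance function, grouping the reference data with a resolution function, and forming nodes from those groupings) resemble a Mapper-style construction from topological data analysis. The claims do not specify these functions. This sketch assumes the following hypothetical choices: the reference space is each point's Euclidean distance to a chosen reference point; the resolution function covers that range with `n_intervals` overlapping intervals, widened by the fraction `overlap`; and nodes are single-linkage clusters (points linked when closer than `eps`) within each interval. All function names and parameters are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lens(points: Sequence[Point], ref: Point) -> List[float]:
    """Map each point into a 1-D reference space: its distance to `ref`."""
    return [math.dist(p, ref) for p in points]


def cover(values: List[float], n_intervals: int, overlap: float) -> List[List[int]]:
    """Resolution function: overlapping intervals over [min, max], as index groups."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against all-equal values
    groups = []
    for i in range(n_intervals):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        groups.append([j for j, v in enumerate(values) if a <= v <= b])
    return groups


def cluster(idx: List[int], points: Sequence[Point], eps: float) -> List[List[int]]:
    """Single-linkage clustering: connected components under distance < eps."""
    remaining = set(idx)
    comps = []
    while remaining:
        stack = [remaining.pop()]
        comp = []
        while stack:
            j = stack.pop()
            comp.append(j)
            near = [k for k in remaining if math.dist(points[j], points[k]) < eps]
            for k in near:
                remaining.remove(k)
                stack.append(k)
        comps.append(sorted(comp))
    return comps


def build_nodes(points: Sequence[Point], ref: Point, n_intervals: int = 3,
                overlap: float = 0.1, eps: float = 1.5) -> List[List[int]]:
    """Each node is a list of indices into `points`, i.e. some of the input data."""
    values = lens(points, ref)
    nodes: List[List[int]] = []
    for group in cover(values, n_intervals, overlap):
        nodes.extend(cluster(group, points, eps))
    return nodes


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
nodes = build_nodes(points, ref=(0.0, 0.0), n_intervals=3, overlap=0.1, eps=1.5)
print(nodes)
```

In this toy input the middle interval is empty, so the two well-separated runs of points become two nodes. With a larger `overlap`, a point can fall into two intervals and belong to two nodes, which is how Mapper-style graphs get their edges.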

Current Assignee: SymphonyAI Sensa LLC (Fortive Corp.)
Original Assignee: Ayasdi AI LLC
Inventors: Sexton, Harlan
Primary Examiner(s): Le, Miranda
Application Number: US15/069,797
Publication Number: US 20160246863A1
Time in Patent Office: 1,471 Days

CPC Class Codes

G06F 16/00   Information retrieval; Data...

G06F 16/24578   using ranking

G06F 16/26   Visual data mining; Browsin...

G06F 16/285   Clustering or classification

G06F 16/287   Visualization; Browsing

G06F 16/35   Clustering; Classification

G06F 16/904   Browsing; Visualisation the...

G06F 16/955   using information identifie...

G06F 17/18   for evaluating statistical ...

G06F 18/23   Clustering techniques

G06F 18/40   Software arrangements speci...

G06Q 10/00   Administration; Management

G06T 11/206   Drawing of charts or graphs

G06V 10/762   using clustering, e.g. of s...
