Fast algorithms and metrics for comparing hierarchical clustering information trees and numerical vectors
Abstract
In various embodiments, a method for determining a similarity between two data sets is disclosed, the steps of which include determining a first list of data clusters for a first hierarchically-organized data set, determining a second list of data clusters for a second hierarchically-organized data set, and determining a similarity between the first and second data sets by calculating a maximum flow between the first list of data clusters and the second list of data clusters.
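The abstract does not say how the flow network between the two cluster lists is built. The Python sketch below is one plausible construction, offered purely as an assumption. Each internal node of a tree contributes one cluster (the set of leaves beneath it). A source feeds every cluster of the first list with capacity 1, every cluster of the second list drains to a sink with capacity 1, and each pair of clusters is joined by an edge whose capacity is their Jaccard overlap. It also applies claim 1's step of dropping the master (root) cluster when both trees cover the same elements. The names `leaves`, `cluster_list`, `max_flow` and `cluster_flow`, the nested-tuple tree format, the Jaccard capacities, and the use of Edmonds-Karp are all hypothetical choices, not the patented implementation.

```python
from collections import defaultdict, deque
from fractions import Fraction


def leaves(node):
    """Set of leaf elements under a node; a tree is a nested tuple of labels."""
    if isinstance(node, tuple):
        return set().union(*(leaves(child) for child in node))
    return {node}


def cluster_list(tree):
    """One cluster (frozenset of leaves) per internal node, root first."""
    clusters = []

    def walk(node):
        if isinstance(node, tuple):
            clusters.append(frozenset(leaves(node)))
            for child in node:
                walk(child)

    walk(tree)
    return clusters


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path (BFS)."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink to collect the path, then push its bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(capacity[u][v] for u, v in path)
        for u, v in path:
            capacity[u][v] -= push
            capacity[v][u] = capacity[v].get(u, 0) + push
        flow += push


def cluster_flow(tree_a, tree_b):
    """Max flow between the two cluster lists (hypothetical construction)."""
    a, b = cluster_list(tree_a), cluster_list(tree_b)
    if leaves(tree_a) == leaves(tree_b):
        a, b = a[1:], b[1:]  # drop the master (root) cluster from both lists
    capacity = defaultdict(dict)
    for i, ca in enumerate(a):
        capacity["s"][("a", i)] = 1
        for j, cb in enumerate(b):
            overlap = len(ca & cb)
            if overlap:
                # Assumed edge capacity: Jaccard overlap, kept exact with Fraction.
                capacity[("a", i)][("b", j)] = Fraction(overlap, len(ca | cb))
    for j in range(len(b)):
        capacity[("b", j)]["t"] = 1
    return max_flow(capacity, "s", "t")
```

For example, with `t1 = (("x", "y"), ("z", "w"))` and `t2 = (("x", "z"), ("y", "w"))`, `cluster_flow(t1, t1)` returns 2 because each cluster is paired with its identical twin. `cluster_flow(t1, t2)` returns `Fraction(4, 3)`, so under this construction the flow decreases as the two hierarchies diverge.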
12 Claims
1. A method for determining a similarity between two data sets, comprising:
determining a first list of data clusters for a first hierarchically-organized data set;
determining a second list of data clusters for a second hierarchically-organized data set;
removing a master cluster from consideration if the first and second data sets have all common elements;
determining a similarity between the first and second data sets by calculating a maximum flow between the first list of data clusters and the second list of data clusters;
determining a maximum number of redundant elements for the first and second data sets; and
dividing the maximum number of redundant elements by the maximum matching flow to arrive at a distance metric.
Dependent claims: 2, 3, 4, 5, 6.
7. An electronic medium capable of being read by a computing device, the electronic medium containing instructions for:
determining a first list of data clusters for a first hierarchically-organized data set;
determining a second list of data clusters for a second hierarchically-organized data set;
removing a master cluster from consideration if the first and second data sets have all common elements;
determining a similarity between the first and second data sets by calculating a maximum flow between the first list of data clusters and the second list of data clusters;
determining a maximum number of redundant elements for the first and second data sets; and
dividing the maximum number of redundant elements by the maximum matching flow to arrive at a distance metric.
Dependent claims: 8, 9, 10, 11, 12.
Specification