SCALABLE TOPOLOGICAL SUMMARY CONSTRUCTION USING LANDMARK POINT SELECTION
First Claim
1. A method comprising:
- receiving a large number of data points;
determining at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points;
transferring each of the plurality of subsets of large number of data points to a respective one of the plurality of computation devices;
for each of the plurality of subsets of data points by an associated computation device of the plurality of computation devices;
selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks;
adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data points comprising;
calculating first data point distances between each non-landmark data point and each landmark;
identifying a shortest data point distance from among the first data point distances for each non-landmark data point;
identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data path distances; and
adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarksuntil the expanded sub-subset of the expanded landmarks reaches a predetermined number of members, repeat adding the non-landmark data points;
creating an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks;
performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space;
generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets;
clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space;
creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and
connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member.
5 Assignments
0 Petitions
Accused Products
Abstract
An example method comprises receiving data points, determining at least one size of a plurality of subsets based on a constraint of at least one computation device or an analysis server, transferring each of the subsets to different computation devices, each computation device selecting a group of data points to generate a first sub-subset of landmarks, add non-landmark data points that have the farthest distance to the closest landmark to create an expanded sub-subset of landmarks, create an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks from different computation devices, perform a similarity function on the analysis landmark set, generate a cover of the mathematical reference space to create overlapping subsets, cluster the mapped landmark points based on the overlapping subsets, create a plurality of nodes, each node being based on the clustering, each landmark point being a member of at least one node.
-
Citations
19 Claims
-
1. A method comprising:
-
receiving a large number of data points; determining at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points; transferring each of the plurality of subsets of large number of data points to a respective one of the plurality of computation devices; for each of the plurality of subsets of data points by an associated computation device of the plurality of computation devices; selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks; adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data points comprising; calculating first data point distances between each non-landmark data point and each landmark; identifying a shortest data point distance from among the first data point distances for each non-landmark data point; identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data path distances; and adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks until the expanded sub-subset of the expanded landmarks reaches a predetermined number of members, repeat adding the non-landmark data points; creating an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks; performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space; generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets; clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space; creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A non-transitory computer readable medium comprising instructions executable by a processor to perform a method, the method comprising:
-
receiving a large number of data points; transferring each of the plurality of subsets of large number of data points to a respective one of the plurality of computation devices, each of an associated computation device of the plurality of computation devices; selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks; adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data points comprising; calculating first data point distances between each non-landmark data point and each landmark; identifying a shortest data point distance from among the first data point distances for each non-landmark data point; identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data path distances; and adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks until the expanded sub-subset of the expanded landmarks reaches a predetermined number of members, repeat adding the non-landmark data points; creating an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks; performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space; generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets; clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space; creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A system comprising:
-
at least one processor; and memory configured to contain instructions to control the processor to; receive a large number of data points; determine at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points; transfer each of the plurality of subsets of large number of data points to a respective one of the plurality of computation devices to enable for each of the plurality of subsets of data points by an associated computation device of the plurality of computation devices to; select, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks; add, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data points comprising; calculate first data point distances between each non-landmark data point and each Landmark; identify a shortest data point distance from among the first data point distances for each non-landmark data point; identify a particular non-landmark data point with a longest first landmark distance of all the shortest data path distances; and add the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks until the expanded sub-subset of the expanded landmarks reaches a predetermined number of members, repeat adding the non-landmark data points; create an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks; perform a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space; generate a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets; cluster the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space; create a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and connect at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member.
-
Specification