Scalable topological summary construction using landmark point selection

  • US 10,216,828 B2
  • Filed: 05/05/2016
  • Issued: 02/26/2019
  • Est. Priority Date: 03/05/2014
  • Status: Active Grant
First Claim
1. A method comprising:

  • receiving a large number of data points;

    determining at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points;

    transferring each of the plurality of subsets of the large number of data points to a respective one of the plurality of computation devices;

    for each of the plurality of subsets of data points by an associated computation device of the plurality of computation devices:

    selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks;

    adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data points comprising:

    calculating first data point distances between each non-landmark data point and each landmark;

    identifying a shortest data point distance from among the first data point distances for each non-landmark data point;

    identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data path distances; and

    adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks;

    repeating the adding the non-landmark data points until the expanded sub-subset of the expanded landmarks reaches a predetermined number of members;

    creating an analysis landmark set based on a combination of the expanded sub-subsets of expanded landmarks;

    performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space;

    generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets;

    clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space;

    creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node;

    connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member; and

    generating a visualization of at least a subset of the plurality of nodes, the visualization including the edge connecting the at least two of the plurality of nodes.
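
The landmark-expansion steps of the claim (compute each non-landmark point's distance to every landmark, take the shortest per point, then add the point whose shortest distance is longest) correspond to what is commonly called max-min or farthest-point selection. The following is a minimal sketch of that loop, not the patented implementation. It assumes Euclidean distance, a random initial group of landmarks, and a sequential loop standing in for the separate computation devices; the names `maxmin_landmarks` and `analysis_landmark_set` are hypothetical.

```python
import numpy as np

def maxmin_landmarks(points, initial_idx, target_size):
    """Expand an initial landmark group by greedy max-min selection.

    Returns landmark indices into `points`. Euclidean distance is assumed.
    """
    points = np.asarray(points, dtype=float)
    landmarks = [int(i) for i in initial_idx]
    # Shortest distance from every point to the current landmark set.
    diffs = points[:, None, :] - points[landmarks][None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    min_dist[landmarks] = -np.inf  # landmarks are never re-selected
    while len(landmarks) < min(target_size, len(points)):
        # Non-landmark point whose shortest landmark distance is longest.
        nxt = int(np.argmax(min_dist))
        landmarks.append(nxt)
        # Only distances to the newly added landmark can shorten min_dist.
        new_d = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_d)
        min_dist[nxt] = -np.inf
    return landmarks

def analysis_landmark_set(subsets, n_seeds, n_landmarks, rng):
    """Run the selection on each subset and combine the results.

    In the claim each subset is handled by its own computation device;
    here the subsets are simply processed one after another.
    """
    combined = []
    for subset in subsets:
        subset = np.asarray(subset, dtype=float)
        seeds = rng.choice(len(subset), size=min(n_seeds, len(subset)),
                           replace=False)
        combined.append(subset[maxmin_landmarks(subset, seeds, n_landmarks)])
    return np.vstack(combined)
```

For example, `maxmin_landmarks([[0], [1], [2], [10]], [0], 3)` starts from point 0, first adds point 3 (the farthest from the landmark set), then point 2. Updating `min_dist` incrementally avoids recomputing every landmark distance on each pass.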

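The remaining steps of the claim (map landmarks to a reference space, cover that space with overlapping subsets, cluster within each subset, create nodes, and connect nodes that share a landmark) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the reference space is taken to be the real line, the cover is a set of equal-width overlapping intervals with `overlap > 0`, and clustering is done by connected components of an `eps`-neighborhood graph. The names `cluster_eps`, `mapper_graph`, and `lens`, and all parameters, are hypothetical. The visualization step is omitted.

```python
import numpy as np
from itertools import combinations

def cluster_eps(points, idx, eps):
    """Connected components of the eps-neighborhood graph on points[idx]."""
    unvisited = set(idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited
                    if np.linalg.norm(points[p] - points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Build nodes (sets of landmark indices) and edges between them."""
    points = np.asarray(points, dtype=float)
    # Map each landmark to the reference space (here: a single real value).
    values = np.array([lens(p) for p in points], dtype=float)
    lo, hi = values.min(), values.max()
    width = (hi - lo) / n_intervals
    if width == 0:
        width = 1.0
    pad = overlap * width  # neighboring intervals overlap by 2 * pad
    nodes = []
    for i in range(n_intervals):
        a, b = lo + i * width - pad, lo + (i + 1) * width + pad
        members = [int(m) for m in np.flatnonzero((values >= a) & (values <= b))]
        # Cluster the landmarks whose mapped value falls in this cover element.
        nodes.extend(cluster_eps(points, members, eps))
    nodes = list(dict.fromkeys(nodes))  # drop identical clusters
    # Connect two nodes when they share at least one landmark.
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges
```

With five collinear points `[[0.0], [1.0], [2.0], [3.0], [4.0]]`, `lens=lambda p: p[0]`, and `n_intervals=2`, the two cover intervals both contain the point at 2.0, so the two resulting nodes share a member and are joined by one edge.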