SCALABLE TOPOLOGICAL SUMMARY CONSTRUCTION USING LANDMARK POINT SELECTION

US 20160246871A1
Filed: 05/05/2016
Published: 08/25/2016
Est. Priority Date: 03/05/2014
Status: Active Grant

Abstract

An example method comprises receiving data points; determining at least one size of a plurality of subsets based on a constraint of at least one computation device or an analysis server; transferring each of the subsets to a different computation device; at each computation device, selecting a group of data points to generate a first sub-subset of landmarks and adding the non-landmark data points that have the farthest distance to their closest landmark to create an expanded sub-subset of landmarks; creating an analysis landmark set based on a combination of the expanded sub-subsets of landmarks from the different computation devices; performing a similarity function on the analysis landmark set to map its landmark points to a mathematical reference space; generating a cover of the mathematical reference space to create overlapping subsets; clustering the mapped landmark points based on the overlapping subsets; and creating a plurality of nodes, each node being based on the clustering and each landmark point being a member of at least one node.
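
The landmark-expansion step in the abstract (repeatedly adding the non-landmark point whose closest landmark is farthest away) is commonly called max-min or farthest-point selection. The following is a minimal Python sketch of that idea for a single subset, not the patented implementation. It assumes Euclidean distance and points given as coordinate tuples; the name `maxmin_landmarks` and the parameters `n_seed`, `initial`, and `rng` are illustrative and do not appear in the patent.

```python
import math
import random


def maxmin_landmarks(points, n_landmarks, n_seed=1, initial=None, rng=None):
    """Greedy max-min landmark selection; returns indices into `points`.

    Assumption: Euclidean distance. `initial` optionally fixes the first
    group of landmarks; otherwise `n_seed` points are drawn at random.
    """
    rng = rng or random.Random(0)
    landmarks = list(initial) if initial is not None else rng.sample(
        range(len(points)), n_seed)
    if not 0 < len(landmarks) <= n_landmarks <= len(points):
        raise ValueError("need 0 < seeds <= n_landmarks <= len(points)")
    chosen = set(landmarks)
    # For every point, the distance to its closest landmark so far.
    nearest = [min(math.dist(p, points[l]) for l in landmarks) for p in points]
    while len(landmarks) < n_landmarks:
        # The non-landmark whose shortest landmark distance is the longest.
        best = max((i for i in range(len(points)) if i not in chosen),
                   key=lambda i: nearest[i])
        landmarks.append(best)
        chosen.add(best)
        # Only the new landmark can shorten a point's closest distance.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[best]))
    return landmarks


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
picked = maxmin_landmarks(points, n_landmarks=3, initial=[0])
```

The sketch keeps a running minimum per point instead of recomputing every point-to-landmark distance on each pass; the selected landmarks are the same, and each pass costs one distance evaluation per point.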

19 Claims

1. A method comprising:
- receiving a large number of data points;
  
  determining at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points;
  
  transferring each of the plurality of subsets of the large number of data points to a respective one of the plurality of computation devices;
  
  for each of the plurality of subsets of data points, by an associated computation device of the plurality of computation devices:
  
  selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks;
  
  adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data point comprising:
  
  calculating first data point distances between each non-landmark data point and each landmark;
  
  identifying a shortest data point distance from among the first data point distances for each non-landmark data point;
  
  identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data point distances; and
  
  adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks, and repeating the adding of non-landmark data points until the expanded sub-subset of expanded landmarks reaches a predetermined number of members;
  
  creating an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks;
  
  performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space;
  
  generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets;
  
  clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space;
  
  creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and
  
  connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member.
  - 2. The method of claim 1 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a closest landmark of the analysis landmark set to that data point;
      
      identifying a node that includes the closest landmark of the analysis landmark set; and
      
      adding that data point as a member of the node that includes the closest landmark of the analysis landmark set.
  - 3. The method of claim 1 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a closest landmark of the analysis landmark set to that data point;
      
      comparing a distance between the closest landmark of the analysis landmark set and that data point to a node threshold; and
      
      if the distance between the closest landmark of the analysis landmark set and that data point is greater than the node threshold, generating a new node including that data point as a member of the new node;
      
      if the distance between the closest landmark of the analysis landmark set and that data point is less than the node threshold, adding that data point as a member of the node that includes the closest landmark of the analysis landmark set.
  - 4. The method of claim 1 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a predetermined number of closest landmarks of the analysis landmark set to that data point;
      
      identifying a node which includes a majority of the predetermined number of closest landmarks of the analysis landmark set as members; and
      
      adding that data point as a member of the node that includes a majority of the predetermined number of closest landmarks of the analysis landmark set as members.
  - 5. The method of claim 1 further comprising generating a visualization of the plurality of nodes and the edge.
  - 6. The method of claim 2 further comprising generating a visualization of the plurality of nodes and the edge.
  - 7. The method of claim 1, further comprising determining the predetermined number of members of the expanded sub-subset of the expanded landmarks based on the constraints of the at least one of a plurality of computation devices or an analysis server.
  - 8. The method of claim 7, wherein the determination of the predetermined number of members of the expanded sub-subset of the expanded landmarks is based, at least in part, on a determination of a predetermined number of members of the analysis landmark set.
  - 9. The method of claim 1, wherein selecting, by the associated computation device, the group of data points from the subset of data points to generate the first sub-subset of landmarks is performed randomly.
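
Claims 2 to 4 describe three ways of placing a data point that is not a landmark into the node graph: by its single closest landmark, by closest landmark subject to a node threshold, and by a majority among a predetermined number of closest landmarks. A hedged Python sketch of the three rules follows. The node layout (a dict of node id to `{"landmarks": set, "points": set}`), the function names, and the use of Euclidean distance are assumptions made for illustration. Where a landmark belongs to several nodes, this sketch adds the point to each of them, which is one possible reading of the claims.

```python
import math


def nearest_landmarks(point, landmarks, k=1):
    """Indices of the k landmarks closest to `point`, nearest first."""
    order = sorted(range(len(landmarks)),
                   key=lambda i: math.dist(point, landmarks[i]))
    return order[:k]


def assign_nearest(point_id, point, landmarks, nodes, threshold=None):
    """Claim 2 rule; with `threshold` set, the claim 3 rule.

    A point farther than `threshold` from its closest landmark starts a
    new node. Distances exactly equal to the threshold are treated as
    "not greater" here, a case the claim leaves open.
    """
    closest = nearest_landmarks(point, landmarks)[0]
    if threshold is not None and math.dist(point, landmarks[closest]) > threshold:
        new_id = max(nodes, default=-1) + 1  # assumes integer node ids
        nodes[new_id] = {"landmarks": set(), "points": {point_id}}
        return [new_id]
    hits = [nid for nid, node in nodes.items() if closest in node["landmarks"]]
    for nid in hits:
        nodes[nid]["points"].add(point_id)
    return hits


def assign_majority(point_id, point, landmarks, nodes, k=3):
    """Claim 4 rule: join nodes holding a majority of the k closest landmarks."""
    near = set(nearest_landmarks(point, landmarks, k))
    hits = [nid for nid, node in nodes.items()
            if len(near & node["landmarks"]) > k / 2]
    for nid in hits:
        nodes[nid]["points"].add(point_id)
    return hits


landmarks = [(0, 0), (10, 0), (11, 0)]
nodes = {0: {"landmarks": {0}, "points": set()},
         1: {"landmarks": {1, 2}, "points": set()}}
a = assign_nearest("a", (1, 0), landmarks, nodes)
b = assign_nearest("b", (50, 0), landmarks, nodes, threshold=5.0)
c = assign_majority("c", (9, 0), landmarks, nodes, k=3)
```

Each call scans all landmarks, so assigning the full data set costs one distance evaluation per point per landmark; a spatial index could replace the scan if that becomes the bottleneck.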

10. A non-transitory computer readable medium comprising instructions executable by a processor to perform a method, the method comprising:
- receiving a large number of data points;
  
  transferring each of the plurality of subsets of the large number of data points to a respective one of the plurality of computation devices, each subset having an associated computation device of the plurality of computation devices;
  
  selecting, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks;
  
  adding, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data point comprising:
  
  calculating first data point distances between each non-landmark data point and each landmark;
  
  identifying a shortest data point distance from among the first data point distances for each non-landmark data point;
  
  identifying a particular non-landmark data point with a longest first landmark distance of all the shortest data point distances; and
  
  adding the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks, and repeating the adding of non-landmark data points until the expanded sub-subset of expanded landmarks reaches a predetermined number of members;
  
  creating an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks;
  
  performing a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space;
  
  generating a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets;
  
  clustering the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space;
  
  creating a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and
  
  connecting at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member.
  - 11. The non-transitory computer readable medium of claim 10 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a closest landmark of the analysis landmark set to that data point;
      
      identifying a node that includes the closest landmark of the analysis landmark set; and
      
      adding that data point as a member of the node that includes the closest landmark of the analysis landmark set.
  - 12. The non-transitory computer readable medium of claim 10 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a closest landmark of the analysis landmark set to that data point;
      
      comparing a distance between the closest landmark of the analysis landmark set and that data point to a node threshold; and
      
      if the distance between the closest landmark of the analysis landmark set and that data point is greater than the node threshold, generating a new node including that data point as a member of the new node;
      
      if the distance between the closest landmark of the analysis landmark set and that data point is less than the node threshold, adding that data point as a member of the node that includes the closest landmark of the analysis landmark set.
  - 13. The non-transitory computer readable medium of claim 10 further comprising:
    - for each data point that is a member of the large data set but is not a member of the analysis landmark set:
      
      determining a distance between that data point and all landmark points of the analysis landmark set;
      
      identifying a predetermined number of closest landmarks of the analysis landmark set to that data point;
      
      identifying a node which includes a majority of the predetermined number of closest landmarks of the analysis landmark set as members; and
      
      adding that data point as a member of the node that includes a majority of the predetermined number of closest landmarks of the analysis landmark set as members.
  - 14. The non-transitory computer readable medium of claim 10 further comprising generating a visualization of the plurality of nodes and the edge.
  - 15. The non-transitory computer readable medium of claim 10 further comprising generating a visualization of the plurality of nodes and the edge.
  - 16. The non-transitory computer readable medium of claim 10 further comprising determining the predetermined number of members of the expanded sub-subset of the expanded landmarks based on the constraints of the at least one of a plurality of computation devices or an analysis server.
  - 17. The non-transitory computer readable medium of claim 16, wherein the determination of the predetermined number of members of the expanded sub-subset of the expanded landmarks is based, at least in part, on a determination of a predetermined number of members of the analysis landmark set.
  - 18. The non-transitory computer readable medium of claim 10, wherein selecting, by the associated computation device, the group of data points from the subset of data points to generate the first sub-subset of landmarks is performed randomly.
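
The claims tie the subset size and the per-device landmark count to device constraints and to the intended size of the analysis landmark set, but do not state a formula. The sketch below shows one simple way this could be planned. The constraint is modelled as a maximum number of points a device can hold (`max_points_per_device`), which is an assumption, as are the round-robin split and all function names.

```python
import math


def plan_partition(n_points, max_points_per_device, n_analysis_landmarks):
    """Return (number of subsets, landmarks to select per subset).

    Assumption: the device constraint is a cap on points per device, and
    the analysis landmark set is split evenly across subsets.
    """
    n_subsets = math.ceil(n_points / max_points_per_device)
    per_subset = math.ceil(n_analysis_landmarks / n_subsets)
    # A subset cannot contribute more landmarks than it has points.
    return n_subsets, min(per_subset, n_points // n_subsets)


def split(points, n_subsets):
    """Round-robin split: every point lands in exactly one subset."""
    return [points[i::n_subsets] for i in range(n_subsets)]


def combine(landmark_sets):
    """Merge per-device landmark lists, dropping duplicates, keeping order.

    Landmarks must be hashable (for example, coordinate tuples or ids).
    """
    seen, merged = set(), []
    for landmarks in landmark_sets:
        for lm in landmarks:
            if lm not in seen:
                seen.add(lm)
                merged.append(lm)
    return merged


plan = plan_partition(n_points=1000, max_points_per_device=300,
                      n_analysis_landmarks=40)
parts = split(list(range(10)), 4)
merged = combine([[1, 2], [2, 3]])
```

With 1000 points and a cap of 300 points per device, the plan uses four subsets of ten landmarks each, so the combined analysis landmark set has at most 40 members.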

19. A system comprising:
- at least one processor; and
  
  memory configured to contain instructions to control the processor to:
  
  receive a large number of data points;
  
  determine at least one size of a plurality of subsets of the large number of data points based on constraints of at least one of a plurality of computation devices or an analysis server, each data point of the large number of data points being a member of at least one of the plurality of subsets of the large number of data points;
  
  transfer each of the plurality of subsets of the large number of data points to a respective one of the plurality of computation devices to enable, for each of the plurality of subsets of data points, an associated computation device of the plurality of computation devices to:
  
  select, by the associated computation device, a group of data points from the subset of data points to generate a first sub-subset of landmarks;
  
  add, by the associated computation device, a non-landmark data point of the subset of data points to the first sub-subset of landmarks to create an expanded sub-subset of landmarks, adding the non-landmark data point comprising:
  
  calculate first data point distances between each non-landmark data point and each landmark;
  
  identify a shortest data point distance from among the first data point distances for each non-landmark data point;
  
  identify a particular non-landmark data point with a longest first landmark distance of all the shortest data point distances; and
  
  add the particular non-landmark data point to the first sub-subset of landmarks to expand the first sub-subset of landmarks to generate an expanded set of landmarks, and repeat the adding of non-landmark data points until the expanded sub-subset of expanded landmarks reaches a predetermined number of members;
  
  create an analysis landmark set based on a combination of expanded sub-subsets of expanded landmarks;
  
  perform a similarity function on the analysis landmark set to map landmark points of the analysis landmark set to a mathematical reference space;
  
  generate a cover of the mathematical reference space to divide the mathematical reference space into overlapping subsets;
  
  cluster the mapped landmark points of the analysis landmark set based on the overlapping subsets of the cover in the mathematical reference space;
  
  create a plurality of nodes, each of the plurality of nodes being based on the clustering of the mapped landmark points of the analysis landmark set, each landmark point of the analysis landmark set being a member of at least one node; and
  
  connect at least two of the plurality of nodes with an edge if the at least two of the plurality of nodes share at least one landmark point of the analysis landmark set as a member.
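
The last five steps of each independent claim (map the landmarks to a reference space, cover that space with overlapping subsets, cluster within each subset, turn clusters into nodes, and connect nodes that share a landmark) can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes a one-dimensional reference space, a cover made of equal-width overlapping intervals, and single-linkage clustering with a distance cutoff `eps`. The names `interval_cover`, `cluster`, `mapper_graph`, and `lens` are hypothetical.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-width intervals over [lo, hi], each padded so neighbours overlap."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]


def cluster(indices, points, eps):
    """Single-linkage clusters: connected components of the 'within eps' graph."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.5, eps=1.0):
    """Return (nodes, edges); nodes are frozensets of landmark indices."""
    values = [lens(p) for p in points]  # map landmarks to the reference space
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(members, points, eps))
    # Two nodes are joined by an edge when they share at least one landmark.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


landmarks = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
nodes, edges = mapper_graph(landmarks, lens=lambda p: p[0],
                            n_intervals=2, overlap=0.5, eps=1.0)
```

In this toy run the two intervals overlap around the value 2, so the landmark at index 2 falls into both clusters and produces the single edge between the two nodes.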

Current Assignee: SymphonyAI Sensa LLC (Fortive Corp.)
Original Assignee: Ayasdi, Inc. (Fortive Corp.)
Inventors: Hsu, Ryan; Singh, Gurjeet; Spracklen, Lawrence
Granted Patent: US 10,216,828 B2
US Class Current: 1/1
CPC Class Codes

G06F 16/285   Clustering or classification

G06F 16/29   Geographical information da...

G06F 16/9024   Graphs; Linked lists G06F16...

G06Q 10/08355   Routing methods

G16B 40/00   ICT specially adapted for b...

G16B 5/00   ICT specially adapted for m...
