Systems and methods for automatic clustering and canonical designation of related data in various data structures

  • US 10,127,289 B2
  • Filed: 08/10/2016
  • Issued: 11/13/2018
  • Est. Priority Date: 08/19/2015
  • Status: Active Grant
First Claim

1. A system comprising:

  • a data store configured to store computer-executable instructions and a plurality of records, wherein each record of the plurality of records is associated with a respective entity and comprises one or more fields;

    a computing device including a processor in communication with the data store, the processor configured to execute the computer-executable instructions to at least:

    identify, based at least in part on a first field of the one or more fields, a first group of the plurality of records;

    determine that a distribution of sizes of groups including the first group satisfies a distribution rule;

    generate one or more record pairs from the first group, each of the one or more record pairs comprising a respective first record and second record, wherein at least one field of the first record differs from a corresponding field in the second record;

    determine, for each of the one or more record pairs, a respective match score, the respective match scores comprising probabilities that the respective first record and second record of the respective record pair are associated with a respective same entity;

    identify a plurality of clusters of record pairs, wherein each pair in each cluster has a record in common with at least one other pair in the cluster, and wherein each pair in each cluster has a respective match score above a threshold;

    determine, for each of the plurality of clusters, that a diameter of the cluster satisfies a diameter criterion;

    determine, for each of the plurality of clusters, that an entropy of the cluster satisfies an entropy criterion;

    determine, based at least in part on the distribution of sizes of groups, the respective match scores, the diameter criterion, and the entropy criterion, that each of the plurality of clusters corresponds to a respective entity;

    determine, for each of the plurality of clusters, a geographical location associated with the cluster, the geographic location corresponding to the respective entity;

    generate, based at least in part on the geographical location associated with each cluster and a number of record pairs in each cluster, a heat map for display on a client computing device, wherein the heat map enables identification of suitable locations for providing coverage of the geographical locations associated with the clusters, wherein the heat map overlays information regarding the number of record pairs in each cluster on the geographic location associated with the cluster, and wherein the heat map displays information regarding the at least one field of individual records in each cluster as a color, symbol, shading, or other representation; and

    cause the client computing device to display the heat map.
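The grouping, pairing, scoring, clustering, diameter, and entropy steps recited above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the toy match score (the share of fields that agree, standing in for a real match probability), the reading of "diameter" as the longest shortest path in the cluster graph, and the reading of "entropy" as the Shannon entropy of one field's values across the cluster's records. The distribution rule and the heat map are not covered.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations
from math import log2


def block_records(records, field):
    """Group record ids by the value of one field (hypothetical blocking step)."""
    groups = defaultdict(list)
    for rec_id, rec in records.items():
        groups[rec[field]].append(rec_id)
    return groups


def candidate_pairs(group, records):
    """Pairs within a group whose records differ in at least one field."""
    return [(a, b) for a, b in combinations(sorted(group), 2)
            if records[a] != records[b]]


def toy_match_score(rec_a, rec_b):
    """Stand-in score: fraction of shared fields with equal values (assumption)."""
    fields = rec_a.keys() & rec_b.keys()
    return sum(rec_a[f] == rec_b[f] for f in fields) / len(fields)


def cluster_pairs(scored_pairs, threshold):
    """Connected components over pairs scoring above the threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = [pair for pair, score in scored_pairs.items() if score > threshold]
    for a, b in kept:
        parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for a, b in kept:
        clusters[find(a)].append((a, b))
    # Keep clusters of two or more pairs, so that every pair shares a record
    # with at least one other pair in its cluster.
    return [pairs for pairs in clusters.values() if len(pairs) >= 2]


def diameter(pairs):
    """Longest shortest path, in hops, between any two records of a cluster."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from each record
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def field_entropy(pairs, records, field):
    """Shannon entropy (bits) of one field's values across a cluster's records."""
    ids = {rec_id for pair in pairs for rec_id in pair}
    counts = Counter(records[i][field] for i in ids)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())
```

Under these assumptions, a cluster would be accepted as a single entity when its diameter and entropy each fall within some chosen bound; the claim does not state what those bounds are, so the sketch leaves the comparison to the caller.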
