System for clustering and aggregating data from multiple sources

  • US 10,026,114 B2
  • Filed: 01/12/2015
  • Issued: 07/17/2018
  • Est. Priority Date: 01/10/2014
  • Status: Active Grant
First Claim

1. A method of aggregating entity data from a plurality of sources, the method comprising:

  • obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities that have not been previously identified, wherein samples from multiple data sources correspond to a same entity;

    processing the samples to identify a plurality of fields corresponding to each sample;

    identifying a first partial cluster of the samples based on a first set of rules that relies on a geographical indicator, the first partial cluster comprising a first subset of fields from a first sample for comparison with a second sample on the first subset of fields to determine whether the samples correspond with a same entity, wherein identifying the first partial cluster includes:

    determining whether the second sample is in the first partial cluster by:

    determining a first field distance between a first field of the first subset of fields of the first sample and a first field of the first subset of fields of the second sample;

    calculating a first metric based on the first field distance; and

    adding the second sample to the first metric when the first metric is within a first threshold;

    identifying a second partial cluster of the samples based on a second set of rules that relies on a user identifier, the second partial cluster comprising a second subset of fields from the first sample for comparison with the second sample on the second subset of fields to determine whether the samples correspond with the same entity, wherein identifying the second partial cluster includes:

    determining whether the second sample is in the second partial cluster by:

    determining a second field distance between a second field of the second subset of fields of the first sample and a second field of the second subset of fields of the second sample;

    calculating a second metric based on the second field distance; and

    adding the second sample to the second metric when the second metric is within a second threshold;

    initiating an aggregation process that determines that the second partial cluster of the samples corresponds to the same entity as the first partial cluster of the samples, wherein the aggregation process generates a full entity cluster that corresponds to the same entity having the user identifier for the first field and the geographical indicator for the second field; and

    storing the user identifier and the geographical indicator of the full entity cluster into a first record of a database.
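
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the `Sample` fields, the use of Levenshtein edit distance as the "field distance", the length-normalized "metric", the threshold values, the rule that two partial clusters describe the same entity when they share at least one sample, and the SQLite table layout.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; field names are illustrative only."""
    source: str
    user_id: str   # stands in for the claim's "user identifier"
    zip_code: str  # stands in for the claim's "geographical indicator"


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed field distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def partial_cluster(first, samples, field, threshold):
    """Collect samples whose metric on one field is within the threshold."""
    cluster = [first]
    ref = getattr(first, field)
    for s in samples:
        if s is first:
            continue
        val = getattr(s, field)
        # Assumed metric: edit distance normalized by the longer field length.
        metric = edit_distance(ref, val) / max(len(ref), len(val), 1)
        if metric <= threshold:
            cluster.append(s)
    return cluster


def aggregate(geo_cluster, id_cluster):
    """Merge the partial clusters if they overlap (assumed aggregation rule)."""
    if not set(geo_cluster) & set(id_cluster):
        return None
    return list(dict.fromkeys(geo_cluster + id_cluster))  # ordered union


samples = [
    Sample("crm", "jsmith", "94105"),
    Sample("web", "jsmith", "94107"),
    Sample("app", "j.smith", "94105"),
    Sample("web", "akhan", "10001"),
]
first = samples[0]

geo = partial_cluster(first, samples, "zip_code", threshold=0.0)  # exact match
ids = partial_cluster(first, samples, "user_id", threshold=0.2)   # fuzzy match
full = aggregate(geo, ids)

# Store the user identifier and geographical indicator of the full cluster.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entity (user_id TEXT, zip_code TEXT)")
if full is not None:
    db.execute("INSERT INTO entity VALUES (?, ?)", (first.user_id, first.zip_code))
    db.commit()
record = db.execute("SELECT user_id, zip_code FROM entity").fetchone()
```

With the sample data above, the geographic rule groups the `crm` and `app` samples, the identifier rule groups `crm`, `web`, and `app`, and because the two partial clusters overlap they are merged into one full entity cluster whose identifier and ZIP code are written as a single row.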

  • 6 Assignments