System for clustering and aggregating data from multiple sources
First Claim
1. A method of aggregating entity data from a plurality of sources, the method comprising:
obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities that have not been previously identified, wherein samples from multiple data sources correspond to a same entity;
processing the samples to identify a plurality of fields corresponding to each sample;
identifying a first partial cluster of the samples based on a first set of rules that relies on a geographical indicator, the first partial cluster comprising a first subset of fields from a first sample for comparison with a second sample on the first subset of fields to determine whether the samples correspond with a same entity, wherein identifying the first partial cluster includes;
determining whether the second sample is in the first partial cluster by;
determining a first field distance between a first field of the first subset of fields of the first sample and a first field of the first subset of fields of the second sample;
calculating a first metric based on the first field distance; and
adding the second sample to the first metric when the first metric is within a first threshold;
identifying a second partial cluster of the samples based on a second set of rules that relies on a user identifier, the second partial cluster comprising a second subset of fields from the first sample for comparison with the second sample on the second subset of fields to determine whether the samples correspond with the same entity, wherein identifying the second partial cluster includes;
determining whether the second sample is in the second partial cluster by;
determining a second field distance between a second field of the second subset of fields of the first sample and a second field of the second subset of fields of the second sample;
calculating a second metric based on the second field distance; and
adding the second sample to the second metric when the second metric is within a second threshold;
initiating an aggregation process that determines that the second partial cluster of the samples corresponds to the same entity as the first partial cluster of the samples, wherein the aggregation process generates a full entity cluster that corresponds to the same entity having the user identifier for the first field and the geographical indicator for the second field; and
storing the user identifier and the geographical indicator of the full entity cluster into a first record of a database.
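The two partial-cluster steps recited above follow the same pattern: compare a subset of fields between a first sample and each other sample, reduce the field distances to a metric, and keep the sample when the metric is within a threshold. The following is a minimal sketch of that pattern only, not the patented implementation. The names `field_distance` and `partial_cluster`, the field names `zip` and `user_id`, the string-similarity distance, the mean as the metric, and the threshold of 0.2 are all assumptions chosen for illustration; the claim does not specify any of them. The sketch also reads the "adding" step as adding the second sample to the partial cluster.

```python
from difflib import SequenceMatcher


def field_distance(a: str, b: str) -> float:
    """Hypothetical distance in [0, 1]: 0.0 for identical strings (ignoring case)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def partial_cluster(first, samples, fields, threshold):
    """Collect samples whose metric against `first` is within `threshold`.

    The metric here is the mean field distance over `fields` (an assumption).
    """
    cluster = [first]
    for sample in samples:
        if sample is first:
            continue
        distances = [field_distance(first[f], sample[f]) for f in fields]
        metric = sum(distances) / len(distances)
        if metric <= threshold:
            cluster.append(sample)
    return cluster


# Illustrative samples from three hypothetical data sources.
samples = [
    {"source": "A", "name": "Dr. Jane Smith", "zip": "94110", "user_id": "jsmith01"},
    {"source": "B", "name": "Jane Smith MD", "zip": "94110", "user_id": "jsmith01"},
    {"source": "C", "name": "John Doe", "zip": "10001", "user_id": "jdoe77"},
]

# First rule set relies on a geographical indicator, second on a user identifier.
geo_cluster = partial_cluster(samples[0], samples, ["zip"], threshold=0.2)
id_cluster = partial_cluster(samples[0], samples, ["user_id"], threshold=0.2)

print([s["source"] for s in geo_cluster])
print([s["source"] for s in id_cluster])
```

In this toy data, the samples from sources A and B fall into both partial clusters, while the sample from source C is excluded from each because its field distance exceeds the threshold.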
Abstract
Systems and methods are provided for receiving, aggregating, and analyzing data to develop caregiver rankings, recommendations, and other information that care seekers may use to connect with caregivers for services, or for caregivers to use to connect with care seekers. Sample data can be obtained from a plurality of data sources, processed to form data clusters, aggregated to form data records, and provided to a care seeker searching for a caregiver or medical facility.
25 Claims
1. A method of aggregating entity data from a plurality of sources, the method comprising:
obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities that have not been previously identified, wherein samples from multiple data sources correspond to a same entity;
processing the samples to identify a plurality of fields corresponding to each sample;
identifying a first partial cluster of the samples based on a first set of rules that relies on a geographical indicator, the first partial cluster comprising a first subset of fields from a first sample for comparison with a second sample on the first subset of fields to determine whether the samples correspond with a same entity, wherein identifying the first partial cluster includes;
determining whether the second sample is in the first partial cluster by;
determining a first field distance between a first field of the first subset of fields of the first sample and a first field of the first subset of fields of the second sample;
calculating a first metric based on the first field distance; and
adding the second sample to the first metric when the first metric is within a first threshold;
identifying a second partial cluster of the samples based on a second set of rules that relies on a user identifier, the second partial cluster comprising a second subset of fields from the first sample for comparison with the second sample on the second subset of fields to determine whether the samples correspond with the same entity, wherein identifying the second partial cluster includes;
determining whether the second sample is in the second partial cluster by;
determining a second field distance between a second field of the second subset of fields of the first sample and a second field of the second subset of fields of the second sample;
calculating a second metric based on the second field distance; and
adding the second sample to the second metric when the second metric is within a second threshold;
initiating an aggregation process that determines that the second partial cluster of the samples corresponds to the same entity as the first partial cluster of the samples, wherein the aggregation process generates a full entity cluster that corresponds to the same entity having the user identifier for the first field and the geographical indicator for the second field; and
storing the user identifier and the geographical indicator of the full entity cluster into a first record of a database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A computer product comprising a non-transitory computer readable medium embodying thereon a set of instructions, which when executed by a computer system cause the computer system to perform the steps of:
obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities that have not been previously identified, wherein samples from multiple data sources correspond to a same entity;
processing the samples to identify a plurality of fields corresponding to each sample;
identifying a first partial cluster of the samples based on a first set of rules that relies on a geographical indicator, the first partial cluster comprising a first subset of fields from a first sample for comparison with a second sample on the first subset of fields to determine whether the samples correspond with a same entity, wherein identifying the first partial cluster includes;
determining whether the second sample is in the first partial cluster by;
determining a first field distance between a first field of the first subset of fields of the first sample and a first field of the first subset of fields of the second sample;
calculating a first metric based on the first field distance; and
adding the second sample to the first metric when the first metric is within a first threshold;
identifying a second partial cluster of the samples based on a second set of rules that relies on a user identifier, the second partial cluster comprising a second subset of fields from the first sample for comparison with the second sample on the second subset of fields to determine whether the samples correspond with the same entity, wherein identifying the second partial cluster includes;
determining whether the second sample is in the second partial cluster by;
determining a second field distance between a second field of the second subset of fields of the first sample and a second field of the second subset of fields of the second sample;
calculating a second metric based on the second field distance; and
adding the second sample to the second metric when the second metric is within a second threshold;
initiating an aggregation process that determines that the second partial cluster of the samples corresponds to the same entity as the first partial cluster of the samples, wherein the aggregation process generates a full entity cluster that corresponds to the same entity having a user identifier for the first field and the geographical indicator for the second field; and
storing the user identifier and the geographical indicator of the full entity cluster into a first record of a database.
Dependent claims: 23, 24, 25
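The final two steps of claims 1 and 22, aggregating the partial clusters into a full entity cluster and storing its user identifier and geographical indicator in a database record, might be pictured as follows. This is a hedged sketch under stated assumptions, not the method the patent discloses: the claims do not say how the aggregation process decides that two partial clusters correspond to the same entity, so the sketch assumes they do when they share at least one sample. The function names `aggregate` and `store`, the table name `entities`, and the field names `zip` and `user_id` are hypothetical.

```python
import sqlite3


def aggregate(geo_cluster, id_cluster):
    """Return a full entity cluster if the partial clusters share a sample, else None.

    Assumption: a shared (source, name) pair means both clusters describe one entity.
    """
    geo_keys = {(s["source"], s["name"]) for s in geo_cluster}
    shared = [s for s in id_cluster if (s["source"], s["name"]) in geo_keys]
    if not shared:
        return None
    return {
        "user_id": shared[0]["user_id"],
        "zip": shared[0]["zip"],
        "sources": sorted({s["source"] for s in geo_cluster + id_cluster}),
    }


def store(conn, entity):
    """Write the user identifier and geographical indicator into one record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities (user_id TEXT PRIMARY KEY, zip TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO entities (user_id, zip) VALUES (?, ?)",
        (entity["user_id"], entity["zip"]),
    )
    conn.commit()


# Illustrative partial clusters, as might result from the two rule sets.
sample_a = {"source": "A", "name": "Dr. Jane Smith", "zip": "94110", "user_id": "jsmith01"}
sample_b = {"source": "B", "name": "Jane Smith MD", "zip": "94110", "user_id": "jsmith01"}
geo_cluster = [sample_a, sample_b]
id_cluster = [sample_a, sample_b]

conn = sqlite3.connect(":memory:")
entity = aggregate(geo_cluster, id_cluster)
if entity is not None:
    store(conn, entity)

rows = conn.execute("SELECT user_id, zip FROM entities").fetchall()
print(rows)
```

Using the user identifier as the primary key is a design choice of the sketch: it keeps a single record per entity when the same entity is aggregated again from later samples.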
Specification