SYSTEM FOR CLUSTERING AND AGGREGATING DATA FROM MULTIPLE SOURCES
First Claim
1. A method of aggregating entity data from a plurality of sources, the method comprising:
- obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities, wherein samples from multiple data sources correspond to a same entity;
- processing the samples to identify a plurality of fields corresponding to each sample, the fields including a name and a geographical indicator;
- identifying a first cluster of the samples as corresponding to a first entity based on a first set of rules, the first cluster including a first sample, wherein identifying the first cluster includes:
  - determining whether a second sample is in the first cluster by:
    - determining a first field distance between a first field of the first sample and the first field of the second sample;
    - calculating a first metric based on the first field distance; and
    - adding the second sample to the first cluster when the first metric is within a first threshold;
- comparing the fields of at least a portion of the samples in the first cluster to determine the name and the geographical indicator for the first entity; and
- storing the name and the geographical indicator of the first entity into a first record of a database.
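The cluster-membership test recited in the claim (field distance, metric, threshold, add to cluster) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the "first field" is the name, uses a normalized Levenshtein edit distance as the field distance, takes the metric to be that distance directly, and uses an arbitrary threshold of 0.25. The clustering loop is a plain greedy single pass that compares each new sample with a cluster's first (seed) sample, substituted here for the patent's unspecified "first set of rules". The names `field_distance`, `metric`, and `assign_to_clusters` are hypothetical.

```python
from typing import Dict, List

Sample = Dict[str, str]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def field_distance(first: Sample, second: Sample, field: str) -> float:
    """Hypothetical field distance: edit distance normalized to [0, 1], case-insensitive."""
    a = first[field].strip().lower()
    b = second[field].strip().lower()
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest

def metric(first: Sample, second: Sample) -> float:
    """Assumed metric: just the name distance; a fuller rule set could weight several fields."""
    return field_distance(first, second, "name")

def assign_to_clusters(samples: List[Sample], threshold: float = 0.25) -> List[List[Sample]]:
    """Greedy single pass: join the first cluster whose seed sample is within the threshold."""
    clusters: List[List[Sample]] = []
    for sample in samples:
        for cluster in clusters:
            if metric(cluster[0], sample) <= threshold:
                cluster.append(sample)  # metric within threshold: add sample to this cluster
                break
        else:
            clusters.append([sample])  # no match: the sample seeds a new cluster
    return clusters

samples = [
    {"name": "Dr. Jane Smith", "zip": "94110"},
    {"name": "Dr Jane Smith", "zip": "94110"},
    {"name": "Bay Family Clinic", "zip": "94607"},
]
print([len(c) for c in assign_to_clusters(samples)])  # [2, 1]
```

Because this metric looks only at the name, two different providers with the same name in different places would be merged; adding a distance on the geographical indicator (for example, requiring matching ZIP codes) is one obvious way to tighten the rule.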
6 Assignments
0 Petitions
Abstract
Systems and methods are provided for receiving, aggregating, and analyzing data to develop caregiver rankings, recommendations, and other information that care seekers can use to connect with caregivers for services, or that caregivers can use to connect with care seekers. Sample data can be obtained from a plurality of data sources, processed to form data clusters, aggregated to form data records, and provided to a care seeker searching for a caregiver or medical facility.
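The step of aggregating a cluster into a data record can be sketched as follows. This is a hedged illustration, not the patent's method: it assumes each sample is a dict with `name` and `zip` keys (ZIP code standing in for the geographical indicator), resolves each field by simple majority vote over the cluster's samples (substituted for whatever comparison rules the patent actually uses), and writes the result to a hypothetical in-memory SQLite table named `entities`. The function names `consensus` and `store_entity` are likewise hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Optional

Sample = Dict[str, str]

def consensus(cluster: List[Sample], field: str) -> Optional[str]:
    """Most common non-empty value of `field` in the cluster; ties go to the first value seen."""
    values = [s[field].strip() for s in cluster if s.get(field, "").strip()]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

def store_entity(conn: sqlite3.Connection, cluster: List[Sample]) -> int:
    """Insert one record holding the consensus name and ZIP; return its row id."""
    cur = conn.execute(
        "INSERT INTO entities (name, zip, n_samples) VALUES (?, ?, ?)",
        (consensus(cluster, "name"), consensus(cluster, "zip"), len(cluster)),
    )
    conn.commit()
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, zip TEXT, n_samples INTEGER)"
)
cluster = [
    {"name": "Jane Smith, MD", "zip": "94110"},
    {"name": "Jane Smith, MD", "zip": "94110"},
    {"name": "J. Smith", "zip": "94103"},
]
row_id = store_entity(conn, cluster)
print(conn.execute("SELECT name, zip, n_samples FROM entities WHERE id = ?", (row_id,)).fetchone())
# ('Jane Smith, MD', '94110', 3)
```

Storing `n_samples` alongside the resolved fields is an optional extra; it records how many source samples support the record, which a ranking step could later use as a rough confidence signal.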
58 Citations
20 Claims
1. A method of aggregating entity data from a plurality of sources, comprising the steps recited under First Claim above. Dependent claims: 2 through 16.
17. A computer product comprising a non-transitory computer readable medium embodying thereon a set of instructions, which when executed by a computer system cause the computer system to perform the steps of:
- obtaining sample data from a plurality of data sources, the sample data corresponding to a plurality of entities, wherein samples from multiple data sources correspond to a same entity;
- processing the samples to identify a plurality of fields corresponding to each sample, the fields including a name and a geographical indicator;
- identifying a first cluster of the samples as corresponding to a first entity based on a first set of rules, the first cluster including a first sample, wherein identifying the first cluster includes:
  - determining whether a second sample is in the first cluster by:
    - determining a first field distance between a first field of the first sample and the first field of the second sample;
    - calculating a first metric based on the first field distance; and
    - adding the second sample to the first cluster when the first metric is within a first threshold;
- comparing the fields of at least a portion of the samples in the first cluster to determine the name and the geographical indicator for the first entity; and
- storing the name and the geographical indicator of the first entity into a first record of a database.

Dependent claims: 18, 19, 20.
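The step of processing samples from differently formatted sources into a common set of fields could look roughly like this. It is a sketch under loudly stated assumptions: the two source names (`source_a`, `source_b`), their raw keys, and the `FIELD_MAPS` table are all hypothetical; the geographical indicator is assumed to be a US 5-digit ZIP code, taken either from a dedicated field or extracted from a free-text address with a regular expression.

```python
import re
from typing import Dict, Optional

# Hypothetical per-source mappings from raw keys to common field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"provider_name": "name", "postal_code": "zip"},
    "source_b": {"full_name": "name", "address": "address"},
}

# Matches a 5-digit ZIP, optionally followed by a ZIP+4 suffix (assumes US addresses).
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

def extract_zip(text: str) -> Optional[str]:
    """Return the first 5-digit ZIP code found in free text, or None."""
    match = ZIP_RE.search(text)
    return match.group(1) if match else None

def process_sample(source: str, raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map one raw record onto the common fields: name and geographical indicator (ZIP)."""
    mapping = FIELD_MAPS[source]
    mapped = {mapping[k]: v for k, v in raw.items() if k in mapping}
    name = " ".join(mapped.get("name", "").split())  # collapse repeated whitespace
    zip_code = mapped.get("zip") or extract_zip(mapped.get("address", ""))
    return {"source": source, "name": name, "zip": zip_code}

raw_samples = [
    ("source_a", {"provider_name": "Jane  Smith, MD", "postal_code": "94110"}),
    ("source_b", {"full_name": "Jane Smith, MD",
                  "address": "100 Main St, San Francisco, CA 94110-1234"}),
]
for source, raw in raw_samples:
    print(process_sample(source, raw))
```

Normalizing whitespace and reducing ZIP+4 to five digits before clustering means that records from the two sources describing the same provider end up with identical field values, which keeps the later field distances small.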
Specification