DATA CLUSTERING BASED ON CANDIDATE QUERIES
Abstract
Received data records, each including one or more values in one or more fields, are processed to identify a matched data cluster. The processing includes: for selected data records, generating a query from one or more values; identifying one or more candidate data records from the received data records using the query; determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
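The flow summarized above can be illustrated with a minimal sketch. It is not the patented implementation. Several choices are assumptions made only for illustration: the query is the set of lowercase tokens drawn from a record's field values, the cluster membership criterion is a Jaccard similarity threshold against the closest cluster member, and the growth criterion is a cap on cluster size. The names make_query, jaccard, cluster_records, threshold, and max_size are hypothetical.

```python
from collections import defaultdict


def make_query(record):
    """Build a query from a record's values (assumption: a set of lowercase tokens)."""
    tokens = set()
    for value in record.values():
        tokens.update(str(value).lower().split())
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records, threshold=0.5, max_size=3):
    """Return a cluster id for each record, in input order.

    Hypothetical criteria, chosen only for this sketch:
      membership: best Jaccard score against any cluster member >= threshold
      growth:     the cluster currently holds fewer than max_size records
    """
    token_index = defaultdict(set)  # token -> indices of already processed records
    queries = []                    # query of each processed record
    assignment = []                 # cluster id of each processed record
    clusters = []                   # cluster id -> list of member record indices

    for i, record in enumerate(records):
        query = make_query(record)

        # Use the query to find candidate records sharing at least one token.
        candidates = set()
        for token in query:
            candidates |= token_index.get(token, set())

        # Existing clusters that contain the candidate records.
        candidate_clusters = {assignment[j] for j in candidates}

        best_id, best_score = None, 0.0
        for cid in sorted(candidate_clusters):
            score = max(jaccard(query, queries[j]) for j in clusters[cid])
            meets_membership = score >= threshold
            may_grow = len(clusters[cid]) < max_size
            if meets_membership and may_grow and score > best_score:
                best_id, best_score = cid, score

        # No cluster passed both criteria: initialize a new cluster with this record.
        if best_id is None:
            best_id = len(clusters)
            clusters.append([])

        clusters[best_id].append(i)
        assignment.append(best_id)
        queries.append(query)
        for token in query:
            token_index[token].add(i)

    return assignment
```

In this sketch a record that matches an existing cluster still starts a new cluster when the size cap is reached, which mirrors the "based on a result of the growth criterion" branch of the claim language. The inverted token index keeps the candidate lookup from comparing every record against every other record.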
87 Citations
22 Claims
1. A method, including:
receiving data records, the received data records each including one or more values in one or more fields; and
processing the received data records to identify a matched data cluster to associate with each received data record, the processing including:
for selected data records from the received data records, generating a query from the one or more values included in the selected data record;
identifying one or more candidate data records from the received data records using the query;
determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and
selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
Dependent claims: 2-19.
20. A computer program stored on a computer-readable storage medium, the computer program including instructions for causing a computing system to:
receive data records, the received data records each including one or more values in one or more fields; and
process the received data records to identify a matched data cluster to associate with each received data record, the processing including:
for selected data records from the received data records, generating a query from the one or more values included in the selected data record;
identifying one or more candidate data records from the received data records using the query;
determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and
selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
21. A computing system, including:
an input device or port configured to receive data records, the received data records each including one or more values in one or more fields; and
at least one processor configured to process the received data records to identify a matched data cluster to associate with each received data record, the processing including:
for selected data records from the received data records, generating a query from the one or more values included in the selected data record;
identifying one or more candidate data records from the received data records using the query;
determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and
selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
22. A computing system, including:
means for receiving data records, the received data records each including one or more values in one or more fields; and
means for processing the received data records to identify a matched data cluster to associate with each received data record, the processing including:
for selected data records from the received data records, generating a query from the one or more values included in the selected data record;
identifying one or more candidate data records from the received data records using the query;
determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and
selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
Specification