
Data clustering based on candidate queries

  • US 9,361,355 B2
  • Filed: 11/15/2012
  • Issued: 06/07/2016
  • Est. Priority Date: 11/15/2011
  • Status: Active Grant
First Claim

1. A method, including:

    receiving data records, the received data records each including one or more values in one or more fields; and

    processing the received data records to identify at least one matched data cluster to associate with each received data record, the processing including:

    for at least one selected data record from the received data records, generating a query from the one or more values included in the selected data record and performing at least a first comparison, a second comparison, and a third comparison using the generated query;

    identifying, in the first comparison, one or more candidate data records from the received data records using the query and an approximate distance measure;

    determining, in the second comparison performed after the first comparison, whether or not the selected data record satisfies a growth criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records, wherein the growth criterion is different from any cluster membership criterion for any candidate data cluster and uses the query and a first threshold associated with a boundary around a respective predetermined member of a candidate data cluster;

    determining, in the third comparison performed after the second comparison, whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records using the query and a second threshold associated with a detailed distance measure more accurate than the approximate distance measure; and

    selecting the matched data cluster from among one or more candidate data clusters if the selected data record satisfies both the cluster membership criterion and the growth criterion for the matched data cluster, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy the growth criterion for any of the existing data clusters or if the selected data record does satisfy the growth criterion for at least one of the existing data clusters but does not satisfy a cluster membership criterion for any of the existing data clusters.
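The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation. Every concrete choice here is hypothetical: the query (a record's field values, lowercased and space-joined), the approximate distance (word-set Jaccard distance), the detailed distance (normalized Levenshtein distance), the use of a cluster's founding record as its "predetermined member", the restriction of candidates to already-processed records, the tie-break by smallest detailed distance, and all threshold values. The functions `make_query`, `approx_distance`, `detailed_distance`, and `cluster_records` are names invented for this sketch.

```python
from dataclasses import dataclass, field


def make_query(record: tuple[str, ...]) -> str:
    """Hypothetical query: non-empty field values, lowercased and space-joined."""
    return " ".join(v.strip().lower() for v in record if v.strip())


def approx_distance(q1: str, q2: str) -> float:
    """Assumed cheap approximate distance: 1 - Jaccard similarity of the word sets."""
    a, b = set(q1.split()), set(q2.split())
    if not (a | b):
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def detailed_distance(q1: str, q2: str) -> float:
    """Assumed detailed distance: Levenshtein distance / length of the longer string."""
    prev = list(range(len(q2) + 1))
    for i, c1 in enumerate(q1, 1):
        cur = [i]
        for j, c2 in enumerate(q2, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (c1 != c2)))
        prev = cur
    return prev[-1] / (max(len(q1), len(q2)) or 1)


@dataclass
class Cluster:
    representative: str  # assumed "predetermined member": the record that started the cluster
    members: list[str] = field(default_factory=list)


def cluster_records(records, candidate_cutoff=0.9, growth_threshold=0.7, member_threshold=0.25):
    """Assign each record to a cluster; the thresholds are illustrative, not from the patent."""
    clusters: list[Cluster] = []
    queries: list[str] = []     # queries of the records processed so far
    assignment: list[int] = []  # assignment[i] is the cluster index of record i
    for record in records:
        query = make_query(record)
        # First comparison: cheap search for candidate records, then the clusters holding them.
        candidate_ids = {assignment[i] for i, q in enumerate(queries)
                         if approx_distance(query, q) <= candidate_cutoff}
        best, best_dist = None, None
        for c in sorted(candidate_ids):
            cluster = clusters[c]
            # Second comparison: growth criterion, a boundary around the representative.
            if approx_distance(query, cluster.representative) > growth_threshold:
                continue
            # Third comparison: membership criterion using the detailed distance.
            dist = min(detailed_distance(query, m) for m in cluster.members)
            if dist <= member_threshold and (best is None or dist < best_dist):
                best, best_dist = c, dist
        if best is None:
            # No cluster satisfies both criteria: initialize a new cluster with this record.
            clusters.append(Cluster(representative=query, members=[query]))
            best = len(clusters) - 1
        else:
            clusters[best].members.append(query)
        queries.append(query)
        assignment.append(best)
    return assignment, clusters


records = [("John", "Smith"), ("Jon", "Smith"), ("Mary", "Jones"),
           ("John", "Smyth"), ("Jon", "Smith", "Jr")]
assignment, clusters = cluster_records(records)
print(assignment)  # [0, 0, 1, 0, 2]
```

In this example, "jon smith jr" is within the membership threshold of the existing member "jon smith" (detailed distance 0.25), but it fails the growth criterion against the representative "john smith" (approximate distance 0.75 > 0.7). It therefore starts its own cluster. This illustrates how a growth criterion that is separate from the membership test can keep a cluster from drifting away from its predetermined member through a chain of near matches.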

  • 3 Assignments