
Data clustering based on candidate queries

  • US 10,572,511 B2
  • Filed: 06/02/2016
  • Issued: 02/25/2020
  • Est. Priority Date: 11/15/2011
  • Status: Active Grant
First Claim

1. A method, including:

  • receiving data records, the received data records each including one or more values in one or more fields; and

    processing the received data records to identify a matched data cluster to associate with each received data record, the processing including:

    for at least one selected data record from the received data records, generating a first query from a first set of one or more values included in the selected data record including identifying tokens that each include a representation of at least one value or fragment of a value in a field or a combination of fields of the selected record and generating a second query from a second set of one or more values included in the selected data record, where the second set of one or more values is different from the first set of one or more values;

    identifying a first set of one or more candidate data records from the received data records using the first query;

    identifying a second set of one or more candidate data records from the received data records using the second query, the second set of one or more candidate data records partially overlapping the first set of one or more candidate data records;

    determining a third set of one or more candidate data records as a Boolean combination of the first set of one or more candidate data records and the second set of one or more candidate data records;

    determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing one or more candidate data records from at least one of the first set of one or more candidate data records or the second set of one or more candidate data records, the determining including applying the cluster membership criterion to the third set of one or more candidate data records; and

    selecting the matched data cluster from among one or more candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters.
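The claim does not fix the tokenizer, the query semantics, the Boolean operator, or the cluster membership criterion. The Python sketch below shows one possible reading, and every concrete choice in it is an assumption: tokens are lowercased words tagged with their field name; a query returns every record that shares at least one token with it, looked up through an inverted index; the third candidate set is the intersection (Boolean AND) of the two query results; and the membership criterion is a Jaccard similarity of at least 0.5 to some already-clustered record in that third set. The function names (tokens, build_index, run_query, cluster_records) and the sample records are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# ASSUMPTION: a record is a flat mapping from field name to string value.
Record = Dict[str, str]


def tokens(record: Record, fields: List[str]) -> Set[str]:
    """Hypothetical tokenizer: each lowercased word, tagged with its field name."""
    return {f"{f}:{w}" for f in fields for w in record.get(f, "").lower().split()}


def build_index(records: List[Record], fields: List[str]) -> Dict[str, Set[int]]:
    """Inverted index mapping each token to the ids of the records containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rid, rec in enumerate(records):
        for tok in tokens(rec, fields):
            index[tok].add(rid)
    return index


def run_query(query: Set[str], index: Dict[str, Set[int]]) -> Set[int]:
    """ASSUMPTION: a candidate is any record sharing at least one query token."""
    hits: Set[int] = set()
    for tok in query:
        hits |= index.get(tok, set())
    return hits


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records: List[Record], first_fields: List[str],
                    second_fields: List[str], threshold: float = 0.5) -> List[int]:
    all_fields = first_fields + second_fields
    index = build_index(records, all_fields)
    cluster_of: Dict[int, int] = {}  # record id -> cluster id
    next_cluster = 0
    for rid, rec in enumerate(records):
        # First and second queries come from different sets of values.
        first = run_query(tokens(rec, first_fields), index)
        second = run_query(tokens(rec, second_fields), index)
        # ASSUMPTION: the Boolean combination is AND; exclude the record itself.
        third = (first & second) - {rid}
        rec_toks = tokens(rec, all_fields)
        best_cluster: Optional[int] = None
        best_score = -1.0
        for cand in sorted(third):
            if cand not in cluster_of:  # candidate is not yet in any cluster
                continue
            # ASSUMPTION: membership criterion is Jaccard similarity >= threshold.
            score = jaccard(rec_toks, tokens(records[cand], all_fields))
            if score >= threshold and score > best_score:
                best_cluster, best_score = cluster_of[cand], score
        if best_cluster is None:  # no existing cluster matched: start a new one
            best_cluster = next_cluster
            next_cluster += 1
        cluster_of[rid] = best_cluster
    return [cluster_of[rid] for rid in range(len(records))]


# Hypothetical example records.
records = [
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jane Doe", "city": "Denver"},
    {"name": "Jon Smith", "city": "Boston MA"},
    {"name": "Jon Smith", "city": "Austin"},
]
assignments = cluster_records(records, ["name"], ["city"])
print(assignments)  # [0, 0, 1, 0, 2]
```

With AND as the combination, the last record ("Jon Smith", Austin) is returned by the name query but not by the city query, so it is never compared against the Boston cluster and starts its own cluster. A union (OR) would instead surface it as a candidate, and its Jaccard score of 0.5 against the first record would then place it in cluster 0. The choice of operator therefore trades recall against precision.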

  • 3 Assignments