
Data clustering based on variant token networks

  • US 9,037,589 B2
  • Filed: 11/15/2012
  • Issued: 05/19/2015
  • Est. Priority Date: 11/15/2011
  • Status: Active Grant
First Claim

1. A method, including:

  • receiving data records, the received data records each including one or more values in one or more fields; and

    processing the received data records to identify one or more data clusters of two or more data records, where the data clusters are identified based on candidate data records that are identified based on a network representing identified tokens, the processing including:

    identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the received data records;

    generating the network representing the identified tokens, with nodes of the network representing individual tokens and edges of the network each representing a variant relationship between tokens;

    identifying, for each received data record to be associated with a data cluster, a corresponding set of candidate data records, such that candidate data records that are in the same set each include one or more tokens from the same group of tokens represented by a subset of connected nodes in the generated network; and

    for at least one candidate data record in the set of candidate data records corresponding to a received data record, determining whether or not the received data record satisfies a cluster association criterion for a candidate data cluster to which the candidate data record belongs.
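The token, network, and candidate-identification steps of claim 1 can be illustrated with a small Python sketch. It rests on assumptions that the claim text does not specify: tokens are lowercase whitespace-separated words taken from the chosen fields, two tokens are treated as variants when their Levenshtein edit distance is at most `max_distance` (1 by default), each connected component of the variant network forms one "group of tokens", and a record's candidate set is every other record that contains a token from a group the record also touches. All function names (`tokenize`, `build_variant_network`, `token_groups`, `candidate_sets`) are hypothetical, and this is not presented as the patented implementation.

```python
from collections import defaultdict, deque
from itertools import combinations


def tokenize(record, fields):
    """Assumed tokenization: lowercase words from the values of the chosen fields."""
    tokens = set()
    for field in fields:
        tokens.update(str(record.get(field, "")).lower().split())
    return tokens


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def build_variant_network(tokens, max_distance=1):
    """Nodes are tokens; an edge links two tokens within max_distance edits (assumed variant rule)."""
    graph = {t: set() for t in tokens}
    for a, b in combinations(sorted(tokens), 2):
        if edit_distance(a, b) <= max_distance:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def token_groups(graph):
    """Label each token with the id of its connected component (breadth-first search)."""
    group_of, next_id = {}, 0
    for start in sorted(graph):
        if start in group_of:
            continue
        group_of[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in group_of:
                    group_of[nbr] = next_id
                    queue.append(nbr)
        next_id += 1
    return group_of


def candidate_sets(records, fields, max_distance=1):
    """Map each record index to the other records sharing at least one token group with it."""
    record_tokens = [tokenize(r, fields) for r in records]
    all_tokens = set().union(*record_tokens)
    group_of = token_groups(build_variant_network(all_tokens, max_distance))
    # Inverted index: token group id -> indices of records holding a token in that group.
    members = defaultdict(set)
    for idx, toks in enumerate(record_tokens):
        for t in toks:
            members[group_of[t]].add(idx)
    candidates = {}
    for idx, toks in enumerate(record_tokens):
        cands = set()
        for t in toks:
            cands |= members[group_of[t]]
        cands.discard(idx)  # a record is not its own candidate
        candidates[idx] = cands
    return candidates
```

With the records `{"name": "John Smith"}`, `{"name": "Jon Smyth"}`, and `{"name": "Mary Jones"}`, the network links "john"/"jon" and "smith"/"smyth", so the first two records become each other's candidates while the third has none. Grouping by connected components rather than by direct edges means a chain of single-edit variants all land in one group; a real system would likely bound that growth, which this sketch does not attempt.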
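The final step of claim 1, testing whether a received record satisfies a cluster association criterion for the cluster of one of its candidates, can be sketched as incremental clustering. The claim does not define the criterion, so this sketch assumes one: Jaccard similarity between the record's token set and a representative token set of the candidate's cluster (here, hypothetically, the cluster's first record) must reach a `threshold`. Records with no qualifying candidate start a new cluster. The names `jaccard` and `assign_clusters` and the input shapes (a list of token sets plus a dict of candidate indices) are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed scoring function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_clusters(record_tokens, candidates, threshold=0.5):
    """Assign each record, in order, to a cluster.

    record_tokens: list of token sets, one per record.
    candidates: dict mapping record index -> set of candidate record indices.
    Returns a list mapping record index -> cluster id.
    """
    cluster_of = [None] * len(record_tokens)
    representative = []  # cluster id -> token set of the cluster's first record (hypothetical)
    for idx, toks in enumerate(record_tokens):
        best_id, best_score = None, -1.0
        for cand in sorted(candidates.get(idx, ())):
            cid = cluster_of[cand]
            if cid is None:  # candidate has not been clustered yet
                continue
            score = jaccard(toks, representative[cid])
            # Association criterion: similarity to the cluster must reach the threshold.
            if score >= threshold and score > best_score:
                best_id, best_score = cid, score
        if best_id is None:  # no candidate cluster qualifies: open a new one
            best_id = len(representative)
            representative.append(toks)
        cluster_of[idx] = best_id
    return cluster_of
```

For example, with token sets `{"john", "smith", "boston"}`, `{"john", "smith", "nyc"}`, `{"mary", "jones"}`, and `{"john", "smyth", "boston"}`, and candidates `{0: {1, 3}, 1: {0, 3}, 2: set(), 3: {0, 1}}`, a threshold of 0.5 places records 0, 1, and 3 in one cluster and record 2 in its own. Comparing only against clusters reached through candidates, rather than against every cluster, is what keeps the number of criterion evaluations small.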

  • 3 Assignments