
Internal linking co-convergence using clustering with hierarchy

  • US 9,037,606 B2
  • Filed: 09/17/2013
  • Issued: 05/19/2015
  • Est. Priority Date: 02/04/2003
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • clustering hierarchical database records into a first set of clusters having corresponding first cluster identifications (IDs), each hierarchical database record comprising one or more field values, the clustering based at least in part on determining similarity among corresponding field values of the hierarchical database records;

    determining parent-child hierarchical relationships among the hierarchical database records;

    associating related hierarchical database records by:

    determining highest compelling linkages among the hierarchical database records, the determining comprising:

    identifying mutually preferred pairs of records from the hierarchical database records, each mutually preferred pair of records consisting of a first record and a second record, the first record consisting of a preferred record associated with the second record and the second record consisting of a preferred record associated with the first record, wherein the mutually preferred pairs of records each has a match score that meets pre-specified match criteria;

    assigning, for each record from the hierarchical database records, at least one associated preferred record, wherein a match value assigned to a given record together with its associated preferred record is at least as great as a match value assigned to the record together with any other record in the database records; and

    forming and storing a plurality of entity representations in the database, each entity representation of the plurality of entity representations comprising at least one linked pair of mutually preferred records;

    applying a hierarchal directional linking process, the hierarchal directional linking process comprising selecting and applying at least an upward process based on the determined parent-child hierarchical relationship wherein the upward process comprises:

    determining, from the parent-child hierarchical relationships, similarity among a plurality of child records having initial separate parent records;

    in response to determining a threshold similarity among the plurality of child records, inferring that the initial separate parent records correspond to the same entity; and

    linking, responsive to the inferring, the initial separate parent records as inferred common parent records;

    re-clustering at least a portion of the database records into a second set of clusters having corresponding second cluster IDs, the re-clustering based at least in part on the associating related hierarchical database records and on the determining similarity among corresponding field values of the database records; and

    outputting database record information, based at least in part on the re-clustering.
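
The linking steps recited above (mutually preferred pairs, the upward process, and re-clustering) can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the score dictionary, the threshold value, and the use of union-find for cluster IDs are all assumptions made for illustration, and the re-clustering here uses links only, omitting the field-value similarity the claim also recites.

```python
def mutually_preferred_pairs(scores, threshold):
    """Return pairs whose members are each other's best match.

    scores: dict mapping (record_a, record_b) -> match score (hypothetical shape).
    threshold: stands in for the claim's "pre-specified match criteria".
    """
    best = {}  # record -> (best score seen, partner giving that score)
    for (a, b), score in scores.items():
        for rec, other in ((a, b), (b, a)):
            if rec not in best or score > best[rec][0]:
                best[rec] = (score, other)
    pairs = set()
    for rec, (score, other) in best.items():
        # Mutual preference: rec prefers other AND other prefers rec.
        if score >= threshold and best[other][1] == rec:
            pairs.add(tuple(sorted((rec, other))))
    return pairs


def link_parents_upward(parent_of, similar_children):
    """Upward process: similar children with separate parents imply
    that those parents correspond to the same entity."""
    links = set()
    for a, b in similar_children:
        pa, pb = parent_of.get(a), parent_of.get(b)
        if pa is not None and pb is not None and pa != pb:
            links.add(tuple(sorted((pa, pb))))
    return links


def recluster(records, links):
    """Assign cluster IDs so that linked records share one ID (union-find)."""
    root = {r: r for r in records}

    def find(r):
        while root[r] != r:
            root[r] = root[root[r]]  # path halving
            r = root[r]
        return r

    for a, b in links:
        root[find(a)] = find(b)
    return {r: find(r) for r in records}


# Illustrative data only: four child records under three parent records.
scores = {
    ("c1", "c2"): 0.95,
    ("c1", "c3"): 0.60,
    ("c2", "c3"): 0.70,
    ("c3", "c4"): 0.65,
}
parent_of = {"c1": "p1", "c2": "p2", "c3": "p3", "c4": "p3"}

pairs = mutually_preferred_pairs(scores, threshold=0.9)
parent_links = link_parents_upward(parent_of, pairs)
second_ids = recluster(["p1", "p2", "p3"], parent_links)
```

In this toy data, c1 and c2 prefer each other above the threshold, so their separate parents p1 and p2 are inferred to be the same entity and receive a shared second cluster ID, while p3 keeps its own.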
