Technique for recycling match weight calculations

  • US 8,639,705 B2
  • Filed: 07/02/2009
  • Issued: 01/28/2014
  • Est. Priority Date: 07/02/2008
  • Status: Active Grant
First Claim

1. A method comprising:

  • determining a number of linking operations to perform in a database prior to performing a search operation in the database, wherein the number of linking operations is determined such that field value weights do not change more than a predetermined percentage with further iterations of the linking operations, the database comprising a plurality of entity representations, each entity representation comprising a plurality of linked records, each record comprising a plurality of fields, each field comprising a field value, each field value associated with a field value weight, the determining comprising;

    calculating a logarithm of a number of the plurality of entity representations in the database;

    performing a linking operation, the linking operation comprising;

    (i) calculating field value weights for the plurality of fields of records in the database, wherein each field value weight comprises a logarithm of a field probability, wherein the field probability is associated with a particular field in a record chosen at random from the database and wherein the field probability represents a probability that another randomly selected record will share a same field value in the particular field; and

    (ii) linking entity representations in the database based on match scores calculated from a plurality of field value weights and a confidence level based on the number of the plurality of entity representations in the database;

    repeating the calculating field value weights and the linking entity representations a number of times, wherein repeating the calculating the number of times is determined by the calculated logarithm of the number of the plurality of entity representations in the database;

    receiving a plurality of search criteria field values;

    performing a search operation after the repeating, the search operation comprising;

    (i) determining a highest ranked entity representation according to summed field value weights for field values matching the plurality of search criteria field values;

    (ii) calculating a confidence level reflecting a likelihood that the highest ranked entity representation corresponds to the plurality of search criteria field values; and

    (iii) outputting, if the confidence level exceeds a predetermined threshold, an identifier for the highest ranked entity representation; and

    repeating the calculating field value weights and the linking entity representations after the performing the search operation until the field value weights for the plurality of fields of records in the database have substantially stabilized such that the field value weights do not change more than 10%.
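
The quantities named in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the base-2 logarithm, the negated sign of the weight, the rounding of the iteration count, and the relative-change test are all assumptions made for illustration. The linking step and the confidence-level calculation are omitted, since the claim does not give formulas for them.

```python
import math
from collections import Counter


def field_value_weights(records, field):
    """Weight per value: -log2 of the share of records carrying that value.

    Assumption: the field probability is approximated by relative frequency,
    and the sign is flipped so that rarer values receive larger weights.
    """
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    total = sum(counts.values())
    return {value: -math.log2(n / total) for value, n in counts.items()}


def planned_iterations(num_entities):
    """Number of linking passes, assumed here to be ceil(log2(entities))."""
    if num_entities <= 1:
        return 1
    return max(1, math.ceil(math.log2(num_entities)))


def weights_stabilized(old, new, tolerance=0.10):
    """True when no weight moved by more than `tolerance` (relative change)."""
    for value, w_new in new.items():
        w_old = old.get(value)
        if w_old is None:
            return False  # a new value appeared, so weights are still moving
        if w_old == 0:
            if w_new != 0:
                return False
            continue
        if abs(w_new - w_old) / abs(w_old) > tolerance:
            return False
    return True


def best_entity(entities, criteria, weights):
    """Return the entity id with the highest summed weight of matching values.

    `entities` maps an id to its linked records; `weights` maps a field name
    to the per-value weights for that field.
    """
    def score(records):
        total = 0.0
        for field, wanted in criteria.items():
            if any(r.get(field) == wanted for r in records):
                total += weights[field].get(wanted, 0.0)
        return total

    return max(entities, key=lambda eid: score(entities[eid]))


records = [
    {"last": "Smith", "city": "Boca"},
    {"last": "Smith", "city": "Miami"},
    {"last": "Smith", "city": "Miami"},
    {"last": "Zuniga", "city": "Miami"},
]
weights = {f: field_value_weights(records, f) for f in ("last", "city")}
entities = {"E1": [records[0]], "E2": [records[3]]}
match = best_entity(entities, {"last": "Zuniga", "city": "Miami"}, weights)
```

In this sketch a rare surname such as "Zuniga" contributes more to a match score than a common one such as "Smith", which is the intuition behind weighting field values by the logarithm of their probability.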

  • 2 Assignments