Technique for recycling match weight calculations
Abstract
Disclosed are a system and method for recycling field value weights computed for database linking so that they can be reused in a search operation. In some embodiments, the weights may be used for a search operation before their values have stabilized during an iterative linking operation.
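A minimal Python sketch of the recycling idea follows. It assumes each record is a dictionary mapping field names to values, and that a field value weight is -log2 of the fraction of records carrying that value in that field. The negative sign, the base 2, and this frequency estimate of the field probability are assumptions. The weights that drive linking are then reused unchanged to rank entity representations for a search. The names `field_value_weights` and `search` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: a record maps field name -> field value.
Record = Dict[str, str]
FieldWeights = Dict[str, Dict[str, float]]


def field_value_weights(records: List[Record]) -> FieldWeights:
    """Weight of value v in field f, taken here as -log2(count(v) / n): the log
    of the chance that a randomly selected record shares v in field f.
    The sign and base are assumptions chosen so that rarer values weigh more."""
    n = len(records)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        for field, value in rec.items():
            counts[field][value] += 1
    return {field: {value: -math.log2(c / n) for value, c in counter.items()}
            for field, counter in counts.items()}


def search(entities: Dict[str, List[Record]], weights: FieldWeights,
           criteria: Record) -> List[Tuple[str, float]]:
    """Rank entities by the summed weights of criteria values found among
    their linked records, reusing the weights computed for linking."""
    ranked = []
    for entity_id, recs in entities.items():
        # Each criterion counts at most once per entity, however many of its
        # linked records carry the matching value.
        score = sum(weights.get(field, {}).get(wanted, 0.0)
                    for field, wanted in criteria.items()
                    if any(r.get(field) == wanted for r in recs))
        ranked.append((entity_id, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Because the search reads the weights directly, it can run against whatever weights the most recent linking pass produced, whether or not they have stabilized.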
6 Claims
1. A method comprising:
determining a number of linking operations to perform in a database prior to performing a search operation in the database, wherein the number of linking operations is determined such that field value weights do not change more than a predetermined percentage with further iterations of the linking operations, the database comprising a plurality of entity representations, each entity representation comprising a plurality of linked records, each record comprising a plurality of fields, each field comprising a field value, each field value associated with a field value weight, the determining comprising:
calculating a logarithm of a number of the plurality of entity representations in the database;
performing a linking operation, the linking operation comprising:
(i) calculating field value weights for the plurality of fields of records in the database, wherein each field value weight comprises a logarithm of a field probability, wherein the field probability is associated with a particular field in a record chosen at random from the database and wherein the field probability represents a probability that another randomly selected record will share a same field value in the particular field; and
(ii) linking entity representations in the database based on match scores calculated from a plurality of field value weights and a confidence level based on the number of the plurality of entity representations in the database;
repeating the calculating field value weights and the linking entity representations a number of times, wherein repeating the calculating the number of times is determined by the calculated logarithm of the number of the plurality of entity representations in the database;
receiving a plurality of search criteria field values;
performing a search operation after the repeating, the search operation comprising:
(i) determining a highest ranked entity representation according to summed field value weights for field values matching the plurality of search criteria field values;
(ii) calculating a confidence level reflecting a likelihood that the highest ranked entity representation corresponds to the plurality of search criteria field values; and
(iii) outputting, if the confidence level exceeds a predetermined threshold, an identifier for the highest ranked entity representation; and
repeating the calculating field value weights and the linking entity representations after the performing the search operation until the field value weights for the plurality of fields of records in the database have substantially stabilized such that the field value weights do not change more than 10%.
4. A system for determining a number of linking operations to perform in a database prior to performing a search operation in the database, wherein the number of linking operations is determined such that field value weights do not change more than a predetermined percentage with further iterations of the linking operations, the database comprising a plurality of entity representations, each entity representation comprising a plurality of linked records, each record comprising a plurality of fields, each field comprising a field value, each field value associated with a field value weight, the system comprising:
an electronic database comprising a plurality of entity representations, each entity representation comprising a plurality of linked records, each record comprising a plurality of fields, each field comprising a field value, each field value associated with a field value weight;
a processor programmed to calculate a logarithm of a number of the plurality of entity representations in the database;
a computer system configured to repeatedly perform a linking operation a number of times determined by the logarithm of a number of entity representations in the database, the linking operation comprising:
(i) calculating field value weights for the plurality of fields of records in the database, wherein each field value weight comprises a logarithm of a field probability, wherein the field probability is associated with a particular field in a record chosen at random from the database and wherein the field probability represents a probability that another randomly selected record will share a same field value in the particular field; and
(ii) linking entity representations in the database based on match scores calculated from a plurality of field value weights and a confidence level based on the number of the plurality of entity representations in the database;
an electronic memory storing a plurality of search criteria field values; and
a computer system configured to perform a search operation, the search operation comprising:
(i) determining a highest ranked entity representation according to summed field value weights for field values matching the plurality of search criteria field values;
(ii) calculating a confidence level reflecting a likelihood that the highest ranked entity representation corresponds to the plurality of search criteria field values; and
(iii) outputting, if the confidence level exceeds a predetermined threshold, an identifier for the highest ranked entity representation; and
a computer system configured to repeat the calculating field value weights and the linking entity representations after the performing the search operation until the field value weights for the plurality of fields of records in the database have substantially stabilized such that the field value weights do not change more than 10%.
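The iteration schedule in the claims can be sketched in Python as follows. This is a hypothetical illustration, not the patented implementation. It assumes the initial pass count is ceil(log2 N) for N entity representations; the claims say only that the count is determined by the logarithm. It measures stabilization as the largest relative change of any weight between consecutive passes, and it substitutes the top entity's score margin over the runner-up for the unspecified confidence level. `link_pass` and `search` are caller-supplied stand-ins for the linking and search operations, not part of any real API.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical flat layout: (field, value) -> field value weight.
Weights = Dict[Tuple[str, str], float]


def initial_passes(num_entities: int) -> int:
    """Linking passes before the first search; ceil(log2 N) is an assumed
    concrete choice for 'determined by the logarithm of N'."""
    return max(1, math.ceil(math.log2(max(num_entities, 2))))


def max_relative_change(old: Weights, new: Weights) -> float:
    """Largest relative change of any nonzero weight present in both passes."""
    changes = [abs(new[k] - old[k]) / abs(old[k])
               for k in old.keys() & new.keys() if old[k] != 0]
    return max(changes, default=0.0)


def confident_match(ranked: List[Tuple[str, float]],
                    threshold: float) -> Optional[str]:
    """Return the top entity only if its score margin over the runner-up (an
    assumed stand-in for the claimed confidence level) exceeds the threshold."""
    if not ranked:
        return None
    top_id, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_id if top_score - runner_up > threshold else None


def link_search_and_recycle(link_pass: Callable[[], Weights],
                            num_entities: int,
                            search: Callable[[Weights], None],
                            tolerance: float = 0.10,
                            max_extra_passes: int = 100) -> Weights:
    """link_pass recalculates weights, relinks entities, and returns the new
    weights; search consumes the weights before they have fully stabilized."""
    weights = link_pass()
    for _ in range(initial_passes(num_entities) - 1):
        weights = link_pass()
    search(weights)  # recycle the current, possibly unstable, weights
    for _ in range(max_extra_passes):
        new = link_pass()
        if max_relative_change(weights, new) <= tolerance:
            return new  # no weight moved by more than the tolerance
        weights = new
    return weights
```

For example, with N = 8 and a `link_pass` whose single weight halves its distance to 1.0 on each pass starting from 8.0, the search runs on a weight of 1.875 after the three initial passes, and linking stops three passes later at 1.109375, the first pass that changes the weight by no more than 10%.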
Specification