Method and system for processing and linking data records
Abstract
Various exemplary systems and methods for linking entity references and identifying associations are presented. In particular, a method is provided for linking a plurality of entity references to at least one entity. The method comprises the steps of evaluating a probability of a match between a first entity reference and a second entity reference based at least in part on a statistical significance of one or more field values being common to both the first entity reference and the second entity reference, wherein field value statistical significance is inversely related to a number of field value occurrences in some or all of the plurality of entity references, and linking the first entity reference with the second entity reference when the probability is greater than or equal to a match threshold.
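As an illustration of the idea summarized in the abstract, the following is a minimal Python sketch, not the patented implementation. It assumes entity references are plain dictionaries of field values, measures the significance of a field value as log2(total references / occurrences), and converts the summed significance of the shared values into a score via 1 - 2^(-bits). The function names, the sample data, the scoring formula, and the 0.9 threshold are all assumptions chosen for the example.

```python
import itertools
import math
from collections import Counter


def field_value_weights(references):
    """Weight each (field, value) pair so that rarer values weigh more."""
    total = len(references)
    counts = Counter(
        (field, value) for ref in references for field, value in ref.items()
    )
    # Assumed formula: weight in bits, inversely related to the occurrence count.
    return {key: math.log2(total / n) for key, n in counts.items()}


def match_score(ref_a, ref_b, weights):
    """Score a pair from the summed weights of the field values they share."""
    shared_bits = sum(
        weights.get((field, value), 0.0)
        for field, value in ref_a.items()
        if ref_b.get(field) == value
    )
    # Assumed squashing: 0.0 with nothing in common, approaching 1.0 as the
    # shared values become rarer.
    return 1.0 - 2.0 ** (-shared_bits)


def link_pairs(references, threshold):
    """Return index pairs whose score is greater than or equal to the threshold."""
    weights = field_value_weights(references)
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(references)), 2)
        if match_score(references[i], references[j], weights) >= threshold
    ]


# Hypothetical sample data, invented for this sketch.
references = [
    {"first": "john", "last": "smith", "city": "dayton"},
    {"first": "john", "last": "smith", "city": "miami"},
    {"first": "mary", "last": "smith", "city": "dayton"},
    {"first": "linda", "last": "jones", "city": "dayton"},
    {"first": "john", "last": "jones", "city": "miami"},
    {"first": "ann", "last": "zuckerkandl", "city": "tampa"},
    {"first": "a", "last": "zuckerkandl", "city": "tampa"},
    {"first": "john", "last": "smith", "city": "tampa"},
]

links = link_pairs(references, threshold=0.9)
```

With this data, sharing a rare surname plus a city clears the threshold, while sharing a common first name and surname does not, which is the behavior the abstract describes.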
523 Citations
36 Claims
1. A method for linking a plurality of entity references to at least one entity stored in a computer comprising the steps of:
evaluating, using a computing matrix, a probability of a match between a first entity reference and a second entity reference based at least in part on a statistical significance of a combination of a plurality of field values common to both the first entity reference and the second entity reference, wherein the statistical significance is inversely related to a number of entity references having the combination of plurality of field values from among some or all entity references, wherein the first entity reference is a member of the at least one entity stored in the computer;
determining, during the evaluating of the probability of a match between the first entity reference and the second entity reference, whether a field value of one of the two entity references is a probable synonym of a field value of the other entity reference;
substituting the probable synonym into the field value of the other entity reference and repeating the evaluating of the probability of a match, upon determining that the field value of one of the two entity references is a probable synonym of the field value of the other entity reference; and
linking, using the computing matrix, the first entity reference with the second entity reference when the probability is greater than or equal to a match threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

13. A method for linking a plurality of entity references to at least one entity stored in a computer, the entity references comprising a plurality of common data fields, the method comprising the steps of:
determining, using a computing matrix, a number of occurrences of entity references having a combination of a plurality of field values, wherein the plurality of field values comprises a particular field value;
determining, using the computing matrix, a content weight, for the particular field value, wherein the content weight is inversely related to the number of entity references having the combination of the plurality of field values;
determining, using the computing matrix, a match probability between a first entity reference and a second entity reference, wherein the first entity reference is a member of the at least one entity stored in a computer, and wherein the probability is related to the content weight for the particular field value;
determining, during the determining of the match probability between the first entity reference and the second entity reference, whether a field value of one of the two entity references is a probable synonym of a field value of the other entity reference;
substituting the probable synonym into the field value of the other entity reference and repeating the determining of the match probability, upon determining that the field value of one of the two entity references is a probable synonym of the field value of the other entity reference; and
linking, using the computing matrix, the first entity reference and the second entity reference when the match probability is greater than or equal to a match threshold.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
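The first two steps of claim 13 (counting the entity references that have a given combination of field values, then deriving a content weight inversely related to that count) can be sketched as follows. This is only a sketch under stated assumptions: references are dictionaries, the weight is taken to be log2(total / count), and a combination never seen in the data is treated as if it occurred once. The names `combination_counts` and `content_weight` and the sample data are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def combination_counts(references, fields):
    """Count how many references carry each combination of values for `fields`."""
    return Counter(tuple(ref.get(f) for f in fields) for ref in references)


def content_weight(references, fields, reference):
    """Weight for the combination found in `reference`; rarer means heavier."""
    counts = combination_counts(references, fields)
    combo = tuple(reference.get(f) for f in fields)
    # Assumption: an unseen combination is treated like a single occurrence.
    occurrences = max(counts.get(combo, 0), 1)
    return math.log2(len(references) / occurrences)


# Hypothetical sample data, invented for this sketch.
references = [
    {"first": "john", "last": "smith"},
    {"first": "john", "last": "smith"},
    {"first": "john", "last": "smith"},
    {"first": "mary", "last": "smith"},
    {"first": "john", "last": "jones"},
    {"first": "mary", "last": "jones"},
    {"first": "ann", "last": "zuckerkandl"},
    {"first": "john", "last": "garcia"},
]

fields = ("first", "last")
common = content_weight(references, fields, {"first": "john", "last": "smith"})
rare = content_weight(references, fields, {"first": "ann", "last": "zuckerkandl"})
```

Counting the combination rather than each field separately matters when fields are correlated: here the combination ("john", "smith") occurs in three of eight references, which is more often than the individual frequencies of "john" and "smith" would predict if they were independent.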

25. A method for linking a plurality of entity references to at least one entity stored in a computer comprising the steps of:
determining, using a computing matrix, a match probability between a first entity reference and a second entity reference based at least in part on a statistical significance of a combination of a plurality of data field values common to both the first entity reference and the second entity reference, wherein the statistical significance is inversely related to a number of occurrences of entity references having the combination of plurality of field values from among some or all entity references, wherein the first entity reference is a member of the at least one entity stored in the computer;
determining, during the determining of the match probability between the first entity reference and the second entity reference, whether a field value of one of the two entity references is a probable synonym of a field value of the other entity reference;
substituting the probable synonym into the field value of the other entity reference and repeating the determining of the match probability, upon determining that the field value of one of the two entity references is a probable synonym of the field value of the other entity reference; and
linking, using a computing matrix, the first entity reference with the second entity reference when the match probability is greater than or equal to a match threshold.
Dependent claims: 26, 27, 28, 29, 30

31. A computer readable medium comprising a set of executable instructions being adapted to manipulate a processor to:
evaluate a probability of a match between a first entity reference and a second entity reference based at least in part on a statistical significance of a combination of a plurality of field values common to both the first entity reference and the second entity reference, wherein the statistical significance is inversely related to a number of entity references having the combination of plurality of field values from among some or all entity references, wherein the first entity reference is a member of at least one entity stored in a computer;
determine, during the evaluating of the probability of a match between the first entity reference and the second entity reference, whether a field value of one of the two entity references is a probable synonym of a field value of the other entity reference;
substitute the probable synonym into the field value of the other entity reference and repeating the evaluating of the probability of a match, upon determining that the field value of one of the two entity references is a probable synonym of the field value of the other entity reference; and
link the first entity reference with the second entity reference when the probability is greater than or equal to a match threshold.
Dependent claims: 32, 33

34. A system for linking a plurality of entity references to at least one entity stored in a computer, the system comprising:
memory;
a processor operably connected to the memory; and
a set of executable instructions stored in the memory and being adapted to manipulate the processor to;
evaluate a probability of a match between a first entity reference and a second entity reference based at least in part on a statistical significance of a combination of a plurality of field values common to both the first entity reference and the second entity reference, wherein the statistical significance is inversely related to a number of occurrences of the combination of plurality of field values occurring in some or all of the plurality of entity references, wherein each of the occurrences comprises an entity reference containing the same combination of a plurality of field values common to both the first entity reference and the second entity reference, and wherein the first entity reference is a member of the at least one entity stored in a computer;
determine, during the evaluating of the probability of a match between the first entity reference and the second entity reference, whether a field value of one of the two entity references is a probable synonym of a field value of the other entity reference;
substitute the probable synonym into the field value of the other entity reference and repeating the evaluating of the probability of a match, upon determining that the field value of one of the two entity references is a probable synonym of the field value of the other entity reference; and
link the first entity reference with the second entity reference when the probability is greater than or equal to a match threshold.
Dependent claims: 35, 36
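The synonym step shared by the independent claims above (detect a probable synonym, substitute it into the other reference, repeat the evaluation, then link on the threshold) might look like the sketch below. Everything specific here is an assumption made for illustration: the `SYNONYMS` table, the fixed integer field weights in `FIELD_WEIGHTS`, and the `simple_score` function, which is a deliberately simplified stand-in and does not reproduce the rarity-based significance the claims recite.

```python
# Hypothetical synonym table: unordered pairs of interchangeable field values.
SYNONYMS = {
    frozenset({"bob", "robert"}),
    frozenset({"bill", "william"}),
}

# Hypothetical fixed field weights standing in for a rarity-based weighting.
FIELD_WEIGHTS = {"first": 3, "last": 5, "city": 2}


def is_probable_synonym(value_a, value_b):
    """True when two different values are listed as synonyms of each other."""
    return frozenset({value_a, value_b}) in SYNONYMS


def simple_score(ref_a, ref_b):
    """Stand-in score: weighted share of the fields on which both references agree."""
    matched = sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if field in ref_a and ref_a.get(field) == ref_b.get(field)
    )
    return matched / sum(FIELD_WEIGHTS.values())


def score_with_synonyms(ref_a, ref_b):
    """Evaluate the pair, then re-evaluate after substituting probable synonyms."""
    score = simple_score(ref_a, ref_b)
    substituted = dict(ref_b)
    changed = False
    for field, value_a in ref_a.items():
        value_b = ref_b.get(field)
        if value_b is not None and is_probable_synonym(value_a, value_b):
            substituted[field] = value_a  # substitute into the other reference
            changed = True
    if changed:
        score = max(score, simple_score(ref_a, substituted))  # repeat the evaluation
    return score


def should_link(ref_a, ref_b, threshold):
    """Link when the (possibly re-evaluated) score meets or exceeds the threshold."""
    return score_with_synonyms(ref_a, ref_b) >= threshold


# Hypothetical sample references, invented for this sketch.
robert = {"first": "robert", "last": "zuckerkandl", "city": "tampa"}
bob = {"first": "bob", "last": "zuckerkandl", "city": "tampa"}

linked_without_synonyms = simple_score(robert, bob) >= 0.8
linked_with_synonyms = should_link(robert, bob, threshold=0.8)
```

In this example the pair falls below the assumed 0.8 threshold on a literal comparison because "bob" and "robert" differ, and clears it once the synonym is substituted and the score is recomputed.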
Specification