Using vertex self-information scores for vertices in an entity graph to determine whether to perform entity resolution on the vertices in the entity graph
First Claim
1. A computer program product for entity resolution of records in a database, the computer program product comprising a non-transitory computer readable storage medium having computer readable program code to perform operations, the operations comprising:
determining pairs of records in the database having a relationship value satisfying a threshold;
generating an entity relationship graph having a vertex for each of the records of the pairs and an edge for each of the determined pairs between two vertices representing records in one of the determined pairs, wherein each vertex is associated with a self-information score based on content in the record represented by the vertex and is assigned an initial unique entity identifier and an entity information score; and
determining whether to update the entity information score and entity identifier for each subject vertex of the vertices by performing for each subject vertex of the vertices:
determining a target vertex directly connected to the subject vertex that has a highest entity information score of at least one vertex directly connected to the subject vertex that has an entity information score greater than the entity information score of the subject vertex; and
setting the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex in response to a target vertex self-information score satisfying a criteria to perform entity resolution for the record represented by the subject vertex.
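The per-vertex update recited in the last two steps of the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Vertex` class, the function names, the single in-place pass, and the choice of a simple minimum threshold (`min_self_info`) as the "criteria" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    record_id: str
    self_info: float    # self-information score of the record
    entity_id: str      # initially unique per vertex
    entity_info: float  # entity information score


def make_vertices(scores):
    """Hypothetical helper: each vertex starts as its own entity, with the
    entity information score initialised to its self-information score."""
    return {rid: Vertex(rid, s, rid, s) for rid, s in scores.items()}


def find_target(subject, neighbors):
    """Among directly connected vertices whose entity information score is
    greater than the subject's, return the one with the highest score."""
    better = [n for n in neighbors if n.entity_info > subject.entity_info]
    return max(better, key=lambda n: n.entity_info) if better else None


def update_vertex(subject, neighbors, min_self_info):
    """Adopt the target's entity identifier and entity information score,
    but only if the target's self-information score meets the criterion
    (assumed here to be a minimum threshold)."""
    target = find_target(subject, neighbors)
    if target is None or target.self_info < min_self_info:
        return False
    subject.entity_id = target.entity_id
    subject.entity_info = target.entity_info
    return True


def resolve_once(vertices, adjacency, min_self_info):
    """One in-place pass over every subject vertex (assumed iteration order:
    insertion order of `vertices`)."""
    for rid, subject in vertices.items():
        neighbors = [vertices[n] for n in adjacency.get(rid, ())]
        update_vertex(subject, neighbors, min_self_info)
    return {rid: v.entity_id for rid, v in vertices.items()}


# Chain a - b - c with self-information scores 12, 5 and 2.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = {"a": 12.0, "b": 5.0, "c": 2.0}
print(resolve_once(make_vertices(scores), adjacency, min_self_info=4.0))
print(resolve_once(make_vertices(scores), adjacency, min_self_info=6.0))
```

With the lower threshold, all three records end up under entity `a`. With the higher threshold, `c` keeps its own identifier: its best neighbor `b` carries a high entity information score inherited from `a`, but `b`'s own self-information score (5) is too low to justify the merge. That is the role the self-information check plays in this sketch.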
Abstract
Provided are a computer program product, system, and method to determine whether to perform entity resolution on vertices in an entity graph. A determination is made of pairs of records in a database having a relationship value satisfying a threshold. An entity relationship graph has a vertex for each of the records of the pairs and an edge between two vertices. Each vertex has a self-information score based on content in the record, an initial unique entity identifier, and an entity information score. For each subject vertex of the vertices, a determination is made of a target vertex directly connected to the subject vertex that has a highest entity information score and whether to set the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex based on the target vertex self-information score.
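The graph construction summarized in the abstract might look like the sketch below. The abstract does not say how the relationship value or the self-information score is computed, so both are assumptions here: `relationship_value` is a hypothetical field-match ratio, and `self_information` uses the information-theoretic definition, summing -log2(p) over the record's field values with p estimated from value frequencies in the database. Initialising the entity information score to the self-information score is likewise an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def self_information(record, value_counts, total):
    """Assumed scoring: sum of -log2(p) over the record's (field, value)
    pairs, where p is the value's relative frequency in the database."""
    return sum(
        -math.log2(value_counts[(f, v)] / total) for f, v in record.items()
    )


def relationship_value(r1, r2):
    """Assumed similarity: fraction of shared fields holding equal values."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    return sum(r1[f] == r2[f] for f in shared) / len(shared)


def build_entity_graph(records, threshold):
    """Return (vertices, edges). Only records that appear in at least one
    pair satisfying the threshold become vertices."""
    total = len(records)
    value_counts = Counter(
        (f, v) for rec in records.values() for f, v in rec.items()
    )
    edges = [
        (a, b)
        for a, b in combinations(records, 2)
        if relationship_value(records[a], records[b]) >= threshold
    ]
    vertices = {}
    for rid in {r for edge in edges for r in edge}:
        score = self_information(records[rid], value_counts, total)
        # Each vertex starts with its own unique entity identifier.
        vertices[rid] = {
            "entity_id": rid,
            "self_info": score,
            "entity_info": score,
        }
    return vertices, edges


records = {
    "r1": {"name": "Ann Lee", "city": "Oslo"},
    "r2": {"name": "Ann Lee", "city": "Oslo"},
    "r3": {"name": "Bo Chen", "city": "Rome"},
}
vertices, edges = build_entity_graph(records, threshold=0.5)
print(edges)
print(sorted(vertices))
```

In this toy database only `r1` and `r2` are related closely enough to form an edge, so `r3` never enters the graph.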
18 Claims
1. A computer program product for entity resolution of records in a database, the computer program product comprising a non-transitory computer readable storage medium having computer readable program code to perform operations, the operations comprising:
determining pairs of records in the database having a relationship value satisfying a threshold;
generating an entity relationship graph having a vertex for each of the records of the pairs and an edge for each of the determined pairs between two vertices representing records in one of the determined pairs, wherein each vertex is associated with a self-information score based on content in the record represented by the vertex and is assigned an initial unique entity identifier and an entity information score; and
determining whether to update the entity information score and entity identifier for each subject vertex of the vertices by performing for each subject vertex of the vertices:
determining a target vertex directly connected to the subject vertex that has a highest entity information score of at least one vertex directly connected to the subject vertex that has an entity information score greater than the entity information score of the subject vertex; and
setting the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex in response to a target vertex self-information score satisfying a criteria to perform entity resolution for the record represented by the subject vertex.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for entity resolution of records in a database, comprising:
a processor circuitry;
a non-transitory computer readable storage medium having computer readable program code embodied therein that when executed by the processor circuitry performs operations, the operations comprising:
determining pairs of records in the database having a relationship value satisfying a threshold;
generating an entity relationship graph having a vertex for each of the records of the pairs and an edge for each of the determined pairs between two vertices representing records in one of the determined pairs, wherein each vertex is associated with a self-information score based on content in the record represented by the vertex and is assigned an initial unique entity identifier and an entity information score, which is initially set to the entity information score of the vertex; and
determining whether to update the entity information score and entity identifier for each subject vertex of the vertices by performing for each subject vertex of the vertices:
determining a target vertex directly connected to the subject vertex that has a highest entity information score of at least one vertex directly connected to the subject vertex that has an entity information score greater than the entity information score of the subject vertex; and
setting the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex in response to a target vertex self-information score satisfying a criteria to perform entity resolution for the record represented by the subject vertex.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification