Using vertex self-information scores for vertices in an entity graph to determine whether to perform entity resolution on the vertices in the entity graph
First Claim
1. A method performed by a computer program executed by a processor to perform entity resolution of records in a database implemented in a computer storage device, comprising:
determining pairs of records in the database having a relationship value satisfying a threshold;
generating an entity relationship graph having a vertex for each of the records of the pairs and an edge for each of the determined pairs between two vertices representing records in one of the determined pairs, wherein each vertex is associated with a self-information score based on content in the record represented by the vertex and is assigned an initial unique entity identifier and an entity information score, which is initially set to the self-information score of the vertex; and
determining whether to update the entity information score and entity identifier for each subject vertex of the vertices by performing for each subject vertex of the vertices:
determining a target vertex directly connected to the subject vertex that has a highest entity information score of at least one vertex directly connected to the subject vertex that has an entity information score greater than the entity information score of the subject vertex; and
setting the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex in response to a target vertex self-information score satisfying a criteria to perform entity resolution for the record represented by the subject vertex.
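The steps recited in claim 1 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation: the function name `resolve_entities`, the parameters `pair_threshold` and `min_target_self_info`, and the input shapes are hypothetical. Two points are assumptions made only for the sketch: the "criteria" is modeled as the target's self-information score meeting a minimum value, and the per-vertex step is repeated until no vertex changes, which the claim text does not require.

```python
def resolve_entities(self_info, scored_pairs, pair_threshold, min_target_self_info):
    """Sketch of claim 1.

    self_info: dict mapping record id -> self-information score (hypothetical input).
    scored_pairs: iterable of (record_a, record_b, relationship_value).
    Returns (entity_id, entity_info) dicts keyed by record id.
    """
    # Determine pairs whose relationship value satisfies the threshold.
    pairs = [(a, b) for a, b, value in scored_pairs if value >= pair_threshold]

    # Build the graph: one vertex per record in a kept pair, one edge per pair.
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each vertex starts as its own entity; its entity information score
    # is initially its self-information score.
    entity_id = {v: v for v in neighbors}
    entity_info = {v: self_info[v] for v in neighbors}

    changed = True
    while changed:  # Assumption: repeat until stable; the claim recites one pass.
        changed = False
        for subject in sorted(neighbors):
            # Directly connected vertices with a higher entity information score.
            better = [n for n in sorted(neighbors[subject])
                      if entity_info[n] > entity_info[subject]]
            if not better:
                continue
            # Target is the neighbor with the highest entity information score.
            target = max(better, key=lambda n: entity_info[n])
            # Assumed criterion: the target's own self-information is high enough.
            if self_info[target] >= min_target_self_info:
                entity_id[subject] = entity_id[target]
                entity_info[subject] = entity_info[target]
                changed = True
    return entity_id, entity_info
```

Because a vertex's entity information score only ever increases and is bounded by the largest self-information score in the graph, the loop in this sketch terminates. Note that the criterion is checked against the target's self-information score rather than its (possibly inherited) entity information score, so a record with little content of its own does not pull its neighbors into an entity.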
Abstract
Provided are a computer program product, system, and method to determine whether to perform entity resolution on vertices in an entity graph. A determination is made of pairs of records in a database having a relationship value satisfying a threshold. An entity relationship graph has a vertex for each of the records of the pairs and an edge between two vertices. Each vertex has a self-information score based on content in the record, an initial unique entity identifier, and an entity information score. For each subject vertex of the vertices, a determination is made of a target vertex directly connected to the subject vertex that has a highest entity information score and whether to set the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex based on the target vertex self-information score.
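The abstract says only that the self-information score is "based on content in the record" and does not give a formula here. One plausible reading, offered purely as an assumption, is Shannon self-information: rarer field values carry more information. The sketch below sums -log2(p) over a record's field values, where p is the fraction of records sharing that value. The function name `self_information_scores` and the record layout are hypothetical.

```python
import math
from collections import Counter


def self_information_scores(records):
    """records: dict mapping record id -> dict of field name -> value.

    Assumed scoring: sum of Shannon self-information, -log2(p), over the
    record's fields, with p estimated as (records sharing the value) / (all records).
    """
    total = len(records)
    # Count how many records carry each (field, value) combination.
    counts = Counter(
        (field, value)
        for fields in records.values()
        for field, value in fields.items()
    )
    return {
        record_id: sum(
            -math.log2(counts[(field, value)] / total)
            for field, value in fields.items()
        )
        for record_id, fields in records.items()
    }
```

Under this assumed scoring, a record with several uncommon values scores higher than a sparse record with only common values, which matches the role the score plays in deciding whether a vertex is a suitable target.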
42 Citations
9 Claims
Specification