Using relationships in candidate discovery
First Claim
1. A computer-implemented method of resolving entities in an entity resolution system storing identity records related to a plurality of entities, the method comprising:
receiving a new identity record;
identifying a set of candidate entities, from the plurality of entities, based upon a match between an attribute of the new identity record and corresponding attributes of one or more of the plurality of entities;
identifying, from the plurality of entities not included in the set of candidate entities, a set of first-degree entities having a likeness score satisfying a threshold, wherein the likeness score for each first-degree entity is determined relative to a respective candidate entity;
by operation of one or more processors, identifying, from the plurality of entities not included in the set of candidate entities and not included in the set of first-degree entities, a set of second-degree entities having a likeness score satisfying the threshold, wherein the likeness score for each second-degree entity is determined relative to a respective first-degree entity, wherein the threshold is based on a count of degrees of separation from a respective candidate entity, such that the threshold to be satisfied by the set of second-degree entities is stricter than the threshold to be satisfied by the set of first-degree entities;
adding, to the set of candidate entities, the set of first-degree entities and the set of second-degree entities; and
upon determining that the new identity record refers to a candidate entity in the set of candidate entities, including any added entities, conjoining the new identity record and the candidate entity to form a first conjoined entity, wherein the first conjoined entity is further conjoinable with a different entity of the plurality of entities to resolve an instance of data ambiguity.
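The candidate-expansion steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Entity` class, the Jaccard-style `likeness` function, the `threshold_for` schedule, and its numeric values are all assumptions chosen for illustration, since the claim names no particular scoring function or threshold values. The final determining and conjoining step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    attributes: frozenset  # (attribute name, value) pairs


def likeness(a, b):
    """Jaccard overlap of attribute pairs (an assumed scoring function)."""
    union = a.attributes | b.attributes
    return len(a.attributes & b.attributes) / len(union) if union else 0.0


def threshold_for(degree, base=0.3, step=0.2):
    """Threshold grows stricter with each degree of separation (assumed values)."""
    return base + step * (degree - 1)


def discover_candidates(record_attrs, entities, key, max_degree=2):
    # Initial candidates: entities that share the record's value for `key`.
    wanted = {(k, v) for k, v in record_attrs if k == key}
    found = [e for e in entities if wanted & e.attributes]
    seen = {e.entity_id for e in found}
    frontier = found
    for degree in range(1, max_degree + 1):
        limit = threshold_for(degree)
        # Score each remaining entity against the previous degree's entities only.
        next_frontier = [
            e for e in entities
            if e.entity_id not in seen
            and any(likeness(e, f) >= limit for f in frontier)
        ]
        seen.update(e.entity_id for e in next_frontier)
        found = found + next_frontier
        frontier = next_frontier
    return found
```

In this sketch each degree is scored only against the entities found at the previous degree, so a second-degree entity is compared with first-degree entities rather than with the original candidates, and it must clear the higher threshold returned by `threshold_for(2)`.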
Abstract
Techniques are disclosed for adding entities to a group of entity resolution candidates by selecting entities that have a minimum threshold of similarity to a candidate, allowing a greater number of resolutions in an entity resolution system. To resolve an incoming identity record, an initial group of candidates may be selected from known entities by identifying entities that match a candidate building attribute of the incoming identity record. Additional candidates may be selected by identifying entities with some information that is similar to one of the candidate entities.
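As an illustration of the initial selection described in the abstract, the following Python sketch looks up known entities through an inverted index on one candidate-building attribute. The data layout (a dict of attribute name to list of values) and the names `build_index` and `initial_candidates` are hypothetical; the abstract does not prescribe an index structure.

```python
from collections import defaultdict


def build_index(entities, key):
    """Map each value of attribute `key` to the ids of entities carrying it."""
    index = defaultdict(set)
    for entity_id, attrs in entities.items():
        for value in attrs.get(key, ()):
            index[value].add(entity_id)
    return index


def initial_candidates(record, index, key):
    """Return ids of entities sharing any value of `key` with the record."""
    ids = set()
    for value in record.get(key, ()):
        ids |= index.get(value, set())
    return ids
```

For example, with an index built on a hypothetical `"phone"` attribute, an incoming record carrying one phone number selects every known entity that lists that number, and those entities form the initial candidate group.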
21 Claims
1. A computer-implemented method of resolving entities in an entity resolution system storing identity records related to a plurality of entities, the method comprising:
receiving a new identity record;
identifying a set of candidate entities, from the plurality of entities, based upon a match between an attribute of the new identity record and corresponding attributes of one or more of the plurality of entities;
identifying, from the plurality of entities not included in the set of candidate entities, a set of first-degree entities having a likeness score satisfying a threshold, wherein the likeness score for each first-degree entity is determined relative to a respective candidate entity;
by operation of one or more processors, identifying, from the plurality of entities not included in the set of candidate entities and not included in the set of first-degree entities, a set of second-degree entities having a likeness score satisfying the threshold, wherein the likeness score for each second-degree entity is determined relative to a respective first-degree entity, wherein the threshold is based on a count of degrees of separation from a respective candidate entity, such that the threshold to be satisfied by the set of second-degree entities is stricter than the threshold to be satisfied by the set of first-degree entities;
adding, to the set of candidate entities, the set of first-degree entities and the set of second-degree entities; and
upon determining that the new identity record refers to a candidate entity in the set of candidate entities, including any added entities, conjoining the new identity record and the candidate entity to form a first conjoined entity, wherein the first conjoined entity is further conjoinable with a different entity of the plurality of entities to resolve an instance of data ambiguity.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product comprising a computer useable storage medium having a computer readable program, wherein the computer readable program, when executed on a computer causes the computer to perform an operation for resolving entities in an entity resolution system storing identity records related to a plurality of entities, the operation comprising:
receiving a new identity record;
identifying a set of candidate entities, from the plurality of entities, based upon a match between an attribute of the new identity record and corresponding attributes of one or more of the plurality of entities;
identifying, from the plurality of entities not included in the set of candidate entities, a set of first-degree entities having a likeness score satisfying a threshold, wherein the likeness score for each first-degree entity is determined relative to a respective candidate entity;
by operation of one or more computer processors of the computer when executing the computer readable program, identifying, from the plurality of entities not included in the set of candidate entities and not included in the set of first-degree entities, a set of second-degree entities having a likeness score satisfying the threshold, wherein the likeness score for each second-degree entity is determined relative to a respective first-degree entity, wherein the threshold is based on a count of degrees of separation from a respective candidate entity, such that the threshold to be satisfied by the set of second-degree entities is stricter than the threshold to be satisfied by the set of first-degree entities;
adding, to the set of candidate entities, the set of first-degree entities and the set of second-degree entities; and
upon determining that the new identity record refers to a candidate entity in the set of candidate entities, including any added entities, conjoining the new identity record and the candidate entity to form a first conjoined entity, wherein the first conjoined entity is further conjoinable with a different entity of the plurality of entities to resolve an instance of data ambiguity.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system, comprising:
one or more computer processors; and
a memory containing a program, which when executed by the one or more computer processors, performs an operation for resolving entities in an entity resolution system storing identity records related to a plurality of entities by performing the steps of;
receiving a new identity record;
identifying a set of candidate entities, from the plurality of entities, based upon a match between an attribute of the new identity record and corresponding attributes of one or more of the plurality of entities;
identifying, from the plurality of entities not included in the set of candidate entities, a set of first-degree entities having a likeness score satisfying a threshold, wherein the likeness score for each first-degree entity is determined relative to a respective candidate entity;
identifying, from the plurality of entities not included in the set of candidate entities and not included in the set of first-degree entities, a set of second-degree entities having a likeness score satisfying the threshold, wherein the likeness score for each second-degree entity is determined relative to a respective first-degree entity, wherein the threshold is based on a count of degrees of separation from a respective candidate entity, such that the threshold to be satisfied by the set of second-degree entities is stricter than the threshold to be satisfied by the set of first-degree entities;
adding, to the set of candidate entities, the set of first-degree entities and the set of second-degree entities; and
upon determining that the new identity record refers to a candidate entity in the set of candidate entities, including any added entities, conjoining the new identity record and the candidate entity to form a first conjoined entity, wherein the first conjoined entity is further conjoinable with a different entity of the plurality of entities to resolve an instance of data ambiguity.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification