Method and system for processing data records

  • US 7,403,942 B1
  • Filed: 02/04/2003
  • Issued: 07/22/2008
  • Est. Priority Date: 02/04/2003
  • Status: Active Grant
First Claim

1. A method for identifying an entity from a plurality of entity references, each entity reference being linked with a separate ghost entity, the method comprising the steps of:

  • comparing an entity reference of a first ghost entity with an entity reference of a second ghost entity to determine a match probability between the entity reference of the first ghost entity and the entity reference of the second ghost entity, wherein the match probability between the entity reference of the first ghost entity and the entity reference of the second ghost entity comprises a content weight Wc,i for a particular data field i, wherein the content weight Wc,i for a particular data field i is inversely related to a sum of a total number of occurrences of a data field value and a cautiousness value;

  • assigning a definitive identifier to an entity reference based on the match probability;

  • linking the entity reference of the first ghost entity additionally with the second ghost entity and the entity reference of the second ghost entity additionally with the first ghost entity when the match probability is greater than or equal to a match threshold;

  • repeating steps of the comparing and the linking for one or more ghost entity pairings possible from the ghost entities;

  • determining, for one or more entity references linked to a ghost entity, a score for an entity reference based at least in part on a match probability between an entity reference and the entity and a match probability between the entity reference and the ghost entity; and

  • identifying the ghost entity as an actual entity based at least in part on one or more scores for the one or more entity references linked to the ghost entity, wherein the identifying the ghost entity as an actual entity comprises associating a same definitive identifier with the one or more entity references linked to the ghost entity.
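
The comparing, linking, and identifier steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim says only that the content weight is "inversely related to" the sum of a value's occurrence count and a cautiousness value, so the reciprocal form `1 / (occurrences + cautiousness)` is an assumption, as is the way field weights are combined into a match probability. The names `content_weight`, `match_probability`, `link_ghosts`, and `assign_identifiers` are hypothetical, and the per-reference scoring step is omitted: every reference linked to a ghost simply receives the same identifier.

```python
from collections import Counter
from itertools import combinations


def content_weight(occurrences: int, cautiousness: float) -> float:
    # Assumed form: rarer field values get larger weights, and the
    # cautiousness value damps the weight of very rare values.
    return 1.0 / (occurrences + cautiousness)


def match_probability(a: dict, b: dict, counts: dict, cautiousness: float) -> float:
    # Illustrative combination (an assumption): weight of agreeing fields
    # divided by the weight of all fields present in both references.
    agree = 0.0
    total = 0.0
    for field in a.keys() & b.keys():
        w_a = content_weight(counts[field][a[field]], cautiousness)
        w_b = content_weight(counts[field][b[field]], cautiousness)
        total += max(w_a, w_b)
        if a[field] == b[field]:
            agree += w_a
    return agree / total if total else 0.0


def link_ghosts(refs: dict, threshold: float, cautiousness: float) -> dict:
    # Count how often each value occurs in each data field.
    counts = {}
    for record in refs.values():
        for field, value in record.items():
            counts.setdefault(field, Counter())[value] += 1
    # Each entity reference starts out linked to its own ghost entity.
    ghosts = {ref_id: {ref_id} for ref_id in refs}
    # Compare every ghost pairing and cross-link those at or above the threshold.
    for x, y in combinations(refs, 2):
        if match_probability(refs[x], refs[y], counts, cautiousness) >= threshold:
            ghosts[x].add(y)
            ghosts[y].add(x)
    return ghosts


def assign_identifiers(ghosts: dict) -> dict:
    # Simplification: references linked to the same ghost share one
    # identifier, here the smallest reference id linked to that ghost.
    return {ref_id: min(members) for ref_id, members in ghosts.items()}


REFS = {
    "r1": {"name": "Ann Lee", "city": "Boca Raton"},
    "r2": {"name": "Ann Lee", "city": "Boca Raton"},
    "r3": {"name": "Bob Roy", "city": "Boca Raton"},
}
GHOSTS = link_ghosts(REFS, threshold=0.8, cautiousness=1.0)
IDS = assign_identifiers(GHOSTS)
```

With these made-up records, `r1` and `r2` agree on every field and are cross-linked, while `r3` shares only the common city value, which carries a low weight, and stays alone in its own ghost.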
