
Method and system for extracting consistent disjoint set membership from multiple inconsistent data sources

  • US 7,752,179 B1
  • Filed: 02/24/2006
  • Issued: 07/06/2010
  • Est. Priority Date: 02/24/2006
  • Status: Active Grant
First Claim

1. A computer implemented method for extracting a consistent set of entity identifiers comprising:

    obtaining a first set of entity identifiers comprising a first plurality of groups of entity identifiers using a grouping process;

    obtaining a second set of entity identifiers comprising a second plurality of groups of entity identifiers using the grouping process;

    subsequent to the grouping process, removing an entity identifier associated with a set violation from the first set of entity identifiers and the second set of entity identifiers, wherein the set violation corresponds to the first and second set of entity identifiers pertaining to a same entity, wherein the first and second set of entity identifiers are not identical and each contain at least two entity identifiers;

    associating each entity identifier from the first set of entity identifiers with at least one of a first plurality of GroupIDs, wherein each of the first plurality of GroupIDs is an entity identifier designated from among the first set of entity identifiers;

    associating each entity identifier from the second set of entity identifiers with at least one of a second plurality of GroupIDs, wherein each of the second plurality of GroupIDs is an entity identifier designated from among the second set of entity identifiers;

    linking at least one of the first plurality of GroupIDs having a common entity identifier with at least one of the second plurality of GroupIDs having the common entity identifier;

    during the linking, identifying a contaminated entity identifier associated with the at least one of the first and second plurality of GroupIDs based on a violation of a linking policy, wherein the linking policy is violated when one of the first plurality of GroupIDs is linked to more than one of the second plurality of GroupIDs;

    removing the contaminated entity identifier from the first and second set of entity identifiers based on the violation of the linking policy; and

    combining the first set of entity identifiers with the second set of entity identifiers following the removal of the contaminated entity identifier to extract the consistent set of entity identifiers.
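The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices in it are assumptions that the claim does not fix: a "set violation" is read as an identifier that appears in more than one group of the same source, the GroupID of a group is taken to be its smallest identifier, the "contaminated" identifiers are taken to be the shared identifiers behind every link of a first-source GroupID that links to more than one second-source GroupID, and the final combination is done with a union-find merge. All function names (`drop_overlapping`, `assign_group_ids`, `find_contaminated`, `combine`, `extract_consistent`) are hypothetical.

```python
from collections import defaultdict


def drop_overlapping(groups):
    """Assumed reading of 'set violation': drop any identifier that
    occurs in more than one group of the same source."""
    counts = defaultdict(int)
    for group in groups:
        for ident in set(group):
            counts[ident] += 1
    cleaned = ({i for i in group if counts[i] == 1} for group in groups)
    return [g for g in cleaned if g]


def assign_group_ids(groups):
    """Map each identifier to a GroupID designated from its own group
    (assumption: the smallest identifier in the group)."""
    return {ident: min(group) for group in groups for ident in group}


def find_contaminated(first_ids, second_ids):
    """Link GroupIDs through common identifiers and collect the identifiers
    behind links where one first GroupID reaches several second GroupIDs."""
    links = defaultdict(lambda: defaultdict(set))
    for ident in first_ids.keys() & second_ids.keys():
        links[first_ids[ident]][second_ids[ident]].add(ident)
    contaminated = set()
    for targets in links.values():
        if len(targets) > 1:  # linking policy violated
            for shared in targets.values():
                contaminated |= shared
    return contaminated


def combine(groups):
    """Merge groups that share an identifier (union-find) into disjoint sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in groups:
        members = sorted(group)
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root
    merged = defaultdict(set)
    for ident in list(parent):
        merged[find(ident)].add(ident)
    return sorted(sorted(g) for g in merged.values())


def extract_consistent(first_groups, second_groups):
    first = drop_overlapping(first_groups)
    second = drop_overlapping(second_groups)
    bad = find_contaminated(assign_group_ids(first), assign_group_ids(second))
    remaining = [g - bad for g in first + second]
    return combine([g for g in remaining if g])


if __name__ == "__main__":
    first = [{"a", "b", "c"}, {"d", "e"}, {"f", "q"}, {"g", "q"}]
    second = [{"a", "b", "x"}, {"c", "y"}, {"d", "e", "z"}]
    print(extract_consistent(first, second))
```

In this example, `q` is dropped from the first source because it sits in two groups, and `a`, `b`, and `c` are dropped from both sources because the first-source group `{a, b, c}` links to two different second-source groups. The sketch applies the linking policy only in the direction the claim states (one first GroupID linked to more than one second GroupID).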
