Method and system for extracting consistent disjoint set membership from multiple inconsistent data sources
Abstract
A method for extracting a consistent set of entity identifiers including associating each of a first plurality of entity identifiers with at least one of a first plurality of GroupIDs, associating each of a second plurality of entity identifiers with at least one of a second plurality of GroupIDs, combining the first plurality of entity identifiers with the second plurality of entity identifiers to generate a third plurality of entity identifiers, where each of the third plurality of entity identifiers is associated with at least one of a third plurality of GroupIDs, linking at least one of the third plurality of GroupIDs with at least one of the first plurality of GroupIDs based on the third plurality of entity identifiers and the first plurality of entity identifiers, and removing a contaminated entity identifier from the third plurality of entity identifiers to extract the consistent set of entity identifiers.
34 Claims
1. A computer implemented method for extracting a consistent set of entity identifiers comprising:
obtaining a first set of entity identifiers comprising a first plurality of groups of entity identifiers using a grouping process;
obtaining a second set of entity identifiers comprising a second plurality of groups of entity identifiers using the grouping process;
subsequent to the grouping process, removing an entity identifier associated with a set violation from the first set of entity identifiers and the second set of entity identifiers, wherein the set violation corresponds to the first and second set of entity identifiers pertaining to a same entity, wherein the first and second set of entity identifiers are not identical and each contain at least two entity identifiers;
associating each entity identifier from the first set of entity identifiers with at least one of a first plurality of GroupIDs, wherein each of the first plurality of GroupIDs is an entity identifier designated from among the first set of entity identifiers;
associating each entity identifier from the second set of entity identifiers with at least one of a second plurality of GroupIDs, wherein each of the second plurality of GroupIDs is an entity identifier designated from among the second set of entity identifiers;
linking at least one of the first plurality of GroupIDs having a common entity identifier with at least one of the second plurality of GroupIDs having the common entity identifier;
during the linking, identifying a contaminated entity identifier associated with the at least one of the first and second plurality of GroupIDs based on a violation of a linking policy, wherein the linking policy is violated when one of the first plurality of GroupIDs is linked to more than one of the second plurality of GroupIDs;
removing the contaminated entity identifier from the first and second set of entity identifiers based on the violation of the linking policy; and
combining the first set of entity identifiers with the second set of entity identifiers following the removal of the contaminated entity identifier to extract the consistent set of entity identifiers.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
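The associating, linking, removing, and combining steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not fix: the function names are hypothetical, the GroupID of a group is taken to be its smallest identifier, every identifier in a group involved in a linking-policy violation is treated as contaminated, and "combining" is read as merging groups that share an identifier. The earlier set-violation removal step is omitted.

```python
from collections import defaultdict


def assign_group_ids(groups):
    """Map each identifier to a GroupID drawn from its own group.

    Assumption: the designated GroupID is the smallest identifier in the group.
    """
    return {ident: min(group) for group in groups for ident in group}


def find_contaminated(first, second):
    """Return identifiers whose groups violate the linking policy."""
    links = defaultdict(set)  # first GroupID -> linked second GroupIDs
    for ident in first.keys() & second.keys():  # identifiers common to both sets
        links[first[ident]].add(second[ident])
    bad_first, bad_second = set(), set()
    for g1, g2s in links.items():
        if len(g2s) > 1:  # one first GroupID linked to several second GroupIDs
            bad_first.add(g1)
            bad_second |= g2s
    # Assumption: every member of an offending group counts as contaminated.
    return ({i for i, g in first.items() if g in bad_first}
            | {i for i, g in second.items() if g in bad_second})


def combine(groups):
    """Merge groups that share an identifier into disjoint sets."""
    merged = []
    for group in groups:
        group = set(group)
        overlapping = [m for m in merged if m & group]
        merged = [m for m in merged if not (m & group)]
        for m in overlapping:
            group |= m
        merged.append(group)
    return merged


def extract_consistent(first_groups, second_groups):
    """Associate, link, remove contaminated identifiers, then combine."""
    first = assign_group_ids(first_groups)
    second = assign_group_ids(second_groups)
    contaminated = find_contaminated(first, second)
    cleaned = [set(g) - contaminated
               for g in list(first_groups) + list(second_groups)]
    return combine(g for g in cleaned if g)
```

With first groups `{a, b}`, `{c, d}`, `{e, f}` and second groups `{a, b, x}`, `{c}`, `{d, y}`, `{e, f, g}`, the first GroupID `c` links to two second GroupIDs (`c` and `d`), so under the assumptions above `c`, `d`, and `y` are dropped and the remaining groups merge into `{a, b, x}` and `{e, f, g}`.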
9. A computer implemented method for extracting a consistent set of entity identifiers comprising:
obtaining a first set of entity identifiers comprising a first plurality of groups of entity identifiers using a grouping process;
obtaining a second set of entity identifiers comprising a second plurality of groups of entity identifiers using the grouping process;
subsequent to the grouping process, removing an entity identifier associated with a set violation from the first set of entity identifiers and the second set of entity identifiers, wherein the set violation corresponds to the first and second set of entity identifiers pertaining to a same entity, wherein the first and second set of entity identifiers are not identical and each contain at least two entity identifiers;
associating each entity identifier from the first set of entity identifiers with at least one of a first plurality of GroupIDs, wherein each of the first plurality of GroupIDs is an entity identifier designated from among the first set of entity identifiers;
associating each entity identifier from the second set of entity identifiers with at least one of a second plurality of GroupIDs, wherein each of the second plurality of GroupIDs is an entity identifier designated from among the second set of entity identifiers;
combining the first set of entity identifiers with the second set of entity identifiers to generate a third set of entity identifiers, wherein each of the third set of entity identifiers is associated with at least one of a third plurality of GroupIDs;
linking at least one of the third plurality of GroupIDs having a common entity identifier with at least one of the first plurality of GroupIDs having the common entity identifier;
during the linking, identifying a contaminated entity identifier associated with the at least one of the first and third plurality of GroupIDs based on a violation of a linking policy, wherein the linking policy is violated when one of the first plurality of GroupIDs is linked to more than one of the third plurality of GroupIDs;
removing the contaminated entity identifier from the first and third set of entity identifiers based on the violation of the linking policy; and
combining the first set of entity identifiers with the third set of entity identifiers following the removal of the contaminated entity identifier to extract the consistent set of entity identifiers.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
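Claim 9 differs from claim 1 in that the first set is linked against a combined third set rather than against the second set. The claim does not say how the third plurality of GroupIDs is produced, so the sketch below simply takes the third grouping as an input. The function names are hypothetical, the GroupID is again assumed to be the smallest identifier in a group, and all members of an offending third-set group are assumed to be contaminated.

```python
from collections import defaultdict


def group_ids(groups):
    # Assumed designation rule: GroupID = smallest identifier in the group.
    return {ident: min(group) for group in groups for ident in group}


def prune_third_set(first_groups, third_groups):
    """Return (contaminated identifiers, surviving third-set groups)."""
    first = group_ids(first_groups)
    third = group_ids(third_groups)
    links = defaultdict(set)  # first GroupID -> third GroupIDs sharing an identifier
    for ident in first.keys() & third.keys():
        links[first[ident]].add(third[ident])
    bad_third = set()
    for g3s in links.values():
        if len(g3s) > 1:  # linking policy violated for this first GroupID
            bad_third |= g3s
    contaminated = {i for i, g in third.items() if g in bad_third}
    survivors = [set(g) - contaminated for g in third_groups]
    return contaminated, [g for g in survivors if g]
```

For a first grouping `{p1, p2, p3}`, `{q1, q2}` and a third grouping `{p1, p2}`, `{p3, z}`, `{q1, q2, r}`, the first GroupID `p1` links to two third GroupIDs, so only `{q1, q2, r}` survives. A caller would also remove the returned contaminated identifiers from the first set before the final combining step.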
18. A computer readable storage medium containing instructions for extracting a consistent set of entity identifiers, the instructions comprising functionality to:
obtain a first set of entity identifiers comprising a first plurality of groups of entity identifiers using a grouping process;
obtain a second set of entity identifiers comprising a second plurality of groups of entity identifiers using the grouping process;
subsequent to the grouping process, removing an entity identifier associated with a set violation from the first set of entity identifiers and the second set of entity identifiers, wherein the set violation corresponds to the first and second set of entity identifiers pertaining to a same entity, wherein the first and second set of entity identifiers are not identical and each contain at least two entity identifiers;
associate each entity identifier from the first set of entity identifiers with at least one of a first plurality of GroupIDs, wherein each of the first plurality of GroupIDs is an entity identifier designated from among the first set of entity identifiers;
associate each entity identifier from the second set of entity identifiers with at least one of a second plurality of GroupIDs, wherein each of the second plurality of GroupIDs is an entity identifier designated from among the second set of entity identifiers;
link at least one of the first plurality of GroupIDs having a common entity identifier with at least one of the second plurality of GroupIDs having the common entity identifier;
during the linking, identify a contaminated entity identifier associated with the at least one of the first and second plurality of GroupIDs based on a violation of a linking policy, wherein the linking policy is violated when one of the first plurality of GroupIDs is linked to more than one of the third plurality of GroupIDs;
remove the contaminated entity identifier from the first and second set of entity identifiers based on the violation of the linking policy; and
combine the first set of entity identifiers with the second set of entity identifiers following the removal of the contaminated entity identifier to extract the consistent set of entity identifiers.
Dependent claims: 19, 20, 21, 22, 23, 24, 25
26. A computer readable storage medium for extracting a consistent set of entity identifiers, the instructions comprising functionality to:
obtain a first set of entity identifiers comprising a first plurality of groups of entity identifiers using a grouping process;
obtain a second set of entity identifiers comprising a second plurality of groups of entity identifiers using the grouping process;
subsequent to the grouping process, removing an entity identifier associated with a set violation from the first set of entity identifiers and the second set of entity identifiers, wherein the set violation corresponds to the first and second set of entity identifiers pertaining to a same entity, wherein the first and second set of entity identifiers are not identical and each contain at least two entity identifiers;
associate each entity identifier from the first set of entity identifiers with at least one of a first plurality of GroupIDs following the removal of the entity identifier associated with the set violation, wherein each of the first plurality of GroupIDs is an entity identifier designated from among the first set of entity identifiers;
associate each entity identifier from a second set of entity identifiers with at least one of a second plurality of GroupIDs, wherein each of the second plurality of GroupIDs is an entity identifier designated from among the second set of entity identifiers;
combine the first set of entity identifiers with the second set of entity identifiers to generate a third set of entity identifiers, wherein each of the third set of entity identifiers is associated with at least one of a third plurality of GroupIDs;
link at least one of the third plurality of GroupIDs having a common entity identifier with at least one of the first plurality of GroupIDs having a common entity identifier;
during the linking, identify a contaminated entity identifier associated with the at least one of the first and third plurality of GroupIDs based on a violation of a linking policy, wherein the linking policy is violated when one of the first plurality of GroupIDs is linked to more than one of the third plurality of GroupIDs;
remove the contaminated entity identifier from the first and third set of entity identifiers based on the violation of the linking policy; and
combine the first set of entity identifiers with the third set of entity identifiers following the removal of the contaminated entity identifier to extract the consistent set of entity identifiers.
Dependent claims: 27, 28, 29, 30, 31, 32, 33, 34
Specification