System of and method for entity representation splitting without the need for human interaction
Abstract
Disclosed is a system for, and method of, determining whether records and entity representations should be delinked. The system and method need no human interaction to calculate the parameters and apply the formulas used for the delinking decisions.
28 Claims
1. A computer-implemented process for delinking, based on a bloat index formula, entity representations in an electronic database associated with a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the process comprising:
calculating a field inconsistency weight for a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation in the electronic database includes records with different field values in the corresponding field;
selecting an entity representation in the electronic database;
calculating, for the selected entity representation, a bloat index reflecting a sum of field inconsistency weights over a plurality of fields common to a plurality of linked electronic records of the selected entity representation;
responsive to a field or record being added to the electronic database, determining, based on the bloat index and a known or expected size of the population of individuals associated with the electronic database, whether there is a sufficiently high confidence level that the plurality of linked electronic records of the selected entity representation do not correspond to the respective same individual; and
delinking, by the processor, in the electronic database, each of the plurality of linked electronic records of the selected entity representation based on the determining;
wherein an individual is at least one of a natural person and company.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
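The claim language leaves the exact formulas open, so the following is only a minimal sketch of how field inconsistency probabilities, weights, and a bloat index might be computed. It assumes that a field's inconsistency probability is estimated as the fraction of entity representations whose records disagree in that field, that the weight is the negative base-2 logarithm of that probability, and that the bloat index sums the weights of the fields in which the selected entity's records actually disagree. The function names and the dictionary-based record layout are hypothetical and do not come from the patent.

```python
import math


def field_inconsistency_probabilities(entities, fields):
    """Estimate, per field, the share of entity representations whose
    linked records hold more than one distinct non-empty value.

    `entities` maps an entity id to a list of records (dicts).
    """
    probs = {}
    for field in fields:
        inconsistent = 0
        for records in entities.values():
            values = {r[field] for r in records if r.get(field) is not None}
            if len(values) > 1:
                inconsistent += 1
        probs[field] = inconsistent / len(entities)
    return probs


def field_inconsistency_weights(probs):
    """Assumed weighting: -log2(p). Fields that are rarely inconsistent
    get large weights. Fields with p == 0 are skipped, since no entity
    in the data is inconsistent in them."""
    return {f: -math.log2(p) for f, p in probs.items() if p > 0}


def bloat_index(records, weights):
    """Assumed reading of the claim: sum the weights of every weighted
    field in which the entity's linked records disagree."""
    total = 0.0
    for field, weight in weights.items():
        values = {r[field] for r in records if r.get(field) is not None}
        if len(values) > 1:
            total += weight
    return total
```

Under these assumptions, an entity representation whose records disagree in several fields that are almost never inconsistent elsewhere accumulates a high bloat index, while an entity with fully consistent records scores zero.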
9. A system for delinking, based on a bloat index formula, entity representations in an electronic database representing a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the system comprising:
a processor;
a memory operatively coupled to the at least one processor and configured for storing data and instructions that, when executed by the processor, cause the system to perform a process comprising:
calculating a field inconsistency weight for a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation includes records with different field values in the corresponding field;
selecting an entity representation in the electronic database;
calculating, for the selected entity representation, a bloat index reflecting a sum of field inconsistency weights over a plurality of fields common to a plurality of linked electronic records of the selected entity representation;
responsive to a field or record being added to the electronic database, determining, based on the bloat index and a known or expected number of individuals in the electronic database, whether there is a sufficiently high confidence level that the plurality of linked electronic records of the selected entity representation do not correspond to the respective same individual;
delinking, by the processor, in the electronic database, each of the plurality of linked electronic records of the selected entity representation based on the determining;
wherein an individual is at least one of a natural person and a company.
Dependent claims: 10, 11, 12, 13, 14, 15
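The determining and delinking steps can be illustrated with a short sketch. The claims do not say how the bloat index and the population size combine into a confidence level, so the rule below is an assumption: if the bloat index is a sum of negative base-2 logarithms of probabilities, then `population_size * 2 ** -bloat` approximates how many correctly linked entities would show the same inconsistencies by chance, and the entity is delinked when that expected count falls below a tolerance. The names `should_delink`, `delink_all`, and `max_expected_false_splits` are hypothetical.

```python
import math


def should_delink(bloat, population_size, max_expected_false_splits=0.05):
    """Return True when the bloat index exceeds an assumed threshold.

    Assumed rule: delink when population_size * 2**(-bloat) is below
    max_expected_false_splits, which is equivalent to
    bloat > log2(population_size / max_expected_false_splits).
    """
    threshold = math.log2(population_size / max_expected_false_splits)
    return bloat > threshold


def delink_all(entity_id, records):
    """Delink each linked record of the entity, so that every record
    becomes its own single-record entity representation."""
    return {f"{entity_id}-{i}": [record] for i, record in enumerate(records)}
```

In this sketch the threshold grows with the logarithm of the population size, so a larger database requires stronger evidence of inconsistency before an entity representation is broken apart. A caller would run the check whenever a field or record is added, in line with the "responsive to" language of the claim.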
16. A computer-implemented process for delinking, based on cleave points, entity representations in an electronic database associated with a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the process comprising:
calculating a field inconsistency weight for each of a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation in the electronic database includes records with different field values in the corresponding field;
selecting a first subset of fields of the plurality of fields, wherein a sum of field inconsistency weights of the selected subset of fields exceeds a threshold wherein the threshold comprises a quantity derived from a threshold probability and the known or expected number of entity representations in the electronic database;
identifying an entity representation in the electronic database having inconsistent field values, between two records of the entity representation, in each field of the selected subset of fields; and
delinking the identified entity representation along a first cleave point between the two records, wherein two separate entity representations are formed from the corresponding delinking.
Dependent claims: 17, 18, 19, 20, 21, 22
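The cleave-point process can be sketched as follows. Several details are assumptions rather than the patented method: the threshold is taken as `log2(entity_count / threshold_probability)`, the field subset is chosen greedily by descending weight, and records other than the two conflicting ones are assigned to whichever side they agree with on more of the selected fields. All function names are hypothetical.

```python
import math
from itertools import combinations


def cleave_threshold(threshold_probability, entity_count):
    """Assumed threshold, on the same log2 scale as -log2(p) weights."""
    return math.log2(entity_count / threshold_probability)


def select_field_subset(weights, threshold):
    """Greedily take the heaviest fields until their weights sum past
    the threshold. Returns None if no subset is heavy enough."""
    chosen, total = [], 0.0
    for field, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(field)
        total += weight
        if total > threshold:
            return chosen
    return None


def find_cleave_pair(records, subset):
    """Find two records that hold different non-empty values in every
    field of the subset. Returns their indices, or None."""
    for a, b in combinations(range(len(records)), 2):
        if all(
            records[a].get(f) is not None
            and records[b].get(f) is not None
            and records[a][f] != records[b][f]
            for f in subset
        ):
            return a, b
    return None


def cleave(records, subset):
    """Split one entity representation into two along the cleave point.

    Assumption: each remaining record joins the side whose seed record
    it matches on more subset fields; ties go to the first side.
    """
    pair = find_cleave_pair(records, subset)
    if pair is None:
        return [records]
    a, b = pair
    left, right = [records[a]], [records[b]]
    for i, record in enumerate(records):
        if i in pair:
            continue
        agree_a = sum(record.get(f) is not None and record.get(f) == records[a][f] for f in subset)
        agree_b = sum(record.get(f) is not None and record.get(f) == records[b][f] for f in subset)
        (left if agree_a >= agree_b else right).append(record)
    return [left, right]
```

The greedy subset selection is only one option; any subset whose summed weight exceeds the threshold would satisfy the claim wording, and the claim itself does not say how records other than the two conflicting ones are distributed.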
23. A system for delinking, based on cleave points, entity representations in an electronic database associated with a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the system comprising:
a processor;
a memory operatively coupled to the at least one processor and configured for storing data and instructions that, when executed by the processor, cause the system to perform a process comprising:
calculating a field inconsistency weight for each of a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation in the electronic database includes records with different field values in the corresponding field;
selecting a first subset of fields of the plurality of fields, wherein a sum of field inconsistency weights of the selected subset of fields exceeds a threshold, wherein the threshold comprises a quantity derived from a threshold probability and a known or expected size of the population of individuals associated with the electronic database;
identifying an entity representation in the electronic database having inconsistent field values, between two records of the entity representation, in each field in the selected subset of fields; and
delinking the identified entity representation along a first cleave point between the two records, wherein two separate entity representations are formed from the corresponding delinking.
Dependent claims: 24, 25, 26, 27, 28
Specification