System of and method for entity representation splitting without the need for human interaction
Abstract
Disclosed is a system for, and method of, determining whether records and entity representations should be delinked. The system and method need no human interaction in order to calculate parameters and utilize formulas used for the delinking decisions.
15 Claims
1. A computer-implemented process for delinking, based on a bloat index formula, entity representations in an electronic database associated with a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the process comprising:
calculating a field inconsistency weight for a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation in the electronic database includes records with different field values in the corresponding field;
selecting an entity representation in the electronic database;
calculating, for the selected entity representation, a bloat index reflecting a sum of field inconsistency weights over a plurality of fields common to a plurality of linked electronic records of the selected entity representation;
responsive to a field or record being added to the electronic database, determining, based on the bloat index and a known or expected size of the population of individuals associated with the electronic database, whether there is a sufficiently high confidence level that the plurality of linked electronic records of the selected entity representation do not correspond to the respective same individual; and
delinking, by the processor, in the electronic database, each of the plurality of linked electronic records of the selected entity representation based on the determining;
wherein an individual is at least one of a natural person, a body of work, an institution, and a company.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
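The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not state how a weight is derived from a probability, which fields contribute to the sum, or how the population size enters the decision, so the choices below are assumptions made only for illustration: the weight is taken as `-log2(p)`, the bloat index sums the weights of fields on which the linked records disagree, and delinking is triggered when the bloat index exceeds `log2(population_size)` plus a hypothetical `margin_bits`. All function and parameter names are likewise hypothetical.

```python
import math


def _is_inconsistent(records, field):
    """True if the records hold more than one distinct non-empty value in `field`."""
    values = {r[field] for r in records if r.get(field) is not None}
    return len(values) > 1


def field_inconsistency_probabilities(entities, fields):
    """Estimate, per field, the share of entity representations whose
    linked records disagree on that field."""
    probs = {}
    for field in fields:
        inconsistent = sum(
            1 for records in entities.values() if _is_inconsistent(records, field)
        )
        probs[field] = inconsistent / len(entities)
    return probs


def field_inconsistency_weights(probs, floor=1e-9):
    """ASSUMPTION: weight = -log2(probability); the claim only says 'derived from'.
    `floor` avoids log(0) for fields that are never inconsistent."""
    return {field: -math.log2(max(p, floor)) for field, p in probs.items()}


def bloat_index(records, weights):
    """ASSUMPTION: sum the weights of the fields on which the records disagree."""
    return sum(w for field, w in weights.items() if _is_inconsistent(records, field))


def should_delink(bloat, population_size, margin_bits=0.0):
    """ASSUMPTION: delink when the bloat index exceeds log2(population size)
    plus a confidence margin expressed in bits."""
    return bloat > math.log2(population_size) + margin_bits


def delink(entities, entity_id):
    """Replace one entity representation with one singleton entity per record."""
    records = entities.pop(entity_id)
    for i, record in enumerate(records):
        entities[f"{entity_id}#{i}"] = [record]


if __name__ == "__main__":
    # Toy database: entity id -> list of linked records.
    db = {
        "e1": [{"name": "Ann", "dob": "1980"}, {"name": "Ann", "dob": "1980"}],
        "e2": [{"name": "Dee", "dob": "1960"}, {"name": "Di", "dob": "1961"}],
    }
    w = field_inconsistency_weights(
        field_inconsistency_probabilities(db, ["name", "dob"])
    )
    for eid in list(db):
        if should_delink(bloat_index(db[eid], w), population_size=2):
            delink(db, eid)
    print(sorted(db))
```

Under these assumptions, a field that is rarely inconsistent carries a large weight, so disagreement on several such fields pushes the bloat index past the population-dependent threshold.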
9. A system for delinking, based on a bloat index formula, entity representations in an electronic database representing a population of individuals, the electronic database stored at least partially in a memory and comprising a plurality of entity representations, each entity representation comprising a plurality of linked electronic records that likely refer to a same individual of the population of individuals, each electronic record comprising a plurality of fields, each field capable of containing a field value, the system comprising:
a processor;
a memory operatively coupled to the at least one processor and configured for storing data and instructions that, when executed by the processor, cause the system to perform a process comprising:
calculating a field inconsistency weight for a plurality of fields in the electronic database, wherein each field inconsistency weight is derived from a field inconsistency probability associated with the corresponding field and each field inconsistency probability reflects a likelihood that an arbitrary entity representation in the electronic database includes records with different field values in the corresponding field;
selecting an entity representation in the electronic database;
calculating, for the selected entity representation, a bloat index reflecting a sum of field inconsistency weights over a plurality of fields common to a plurality of linked electronic records of the selected entity representation;
responsive to a field or record being added to the electronic database, determining, based on the bloat index and a known or expected number of individuals in the electronic database, whether there is a sufficiently high confidence level that the plurality of linked electronic records of the selected entity representation do not correspond to the respective same individual; and
delinking, by the processor, in the electronic database, each of the plurality of linked electronic records of the selected entity representation based on the determining;
wherein an individual is at least one of a natural person, a body of work, an institution, and a company.
Dependent claims: 10, 11, 12, 13, 14, 15
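The "responsive to a field or record being added" trigger in the system claim can be sketched as a small in-memory store that re-checks an entity representation each time a record is linked to it. This is only an illustration under stated assumptions: the `EntityStore` class, its methods, and the `margin_bits` parameter are hypothetical, the field inconsistency weights are supplied precomputed, and the decision rule (bloat index greater than `log2(population_size) + margin_bits`) is an assumed stand-in for the confidence determination, which the claim does not spell out.

```python
import math


class EntityStore:
    """Hypothetical in-memory database: entity id -> list of linked records."""

    def __init__(self, weights, population_size, margin_bits=0.0):
        self.weights = weights  # precomputed field inconsistency weights
        # ASSUMPTION: threshold in bits derived from the population size.
        self.threshold = math.log2(population_size) + margin_bits
        self.entities = {}
        self._next_id = 0

    def create_entity(self, records):
        """Store a new entity representation and return its id."""
        eid = self._next_id
        self._next_id += 1
        self.entities[eid] = list(records)
        return eid

    def bloat_index(self, eid):
        """ASSUMPTION: sum of weights over fields whose values disagree."""
        records = self.entities[eid]
        total = 0.0
        for field, weight in self.weights.items():
            values = {r[field] for r in records if r.get(field) is not None}
            if len(values) > 1:
                total += weight
        return total

    def add_record(self, eid, record):
        """Link a record, then re-evaluate the entity.

        Returns the ids that hold the records afterwards: [eid] if the
        entity is kept, or the ids of the new singleton entities if it
        was delinked.
        """
        self.entities[eid].append(record)
        if self.bloat_index(eid) <= self.threshold:
            return [eid]
        records = self.entities.pop(eid)
        return [self.create_entity([r]) for r in records]


if __name__ == "__main__":
    store = EntityStore({"name": 1.0, "dob": 2.0}, population_size=4, margin_bits=0.5)
    eid = store.create_entity([{"name": "Dee", "dob": "1960"}])
    print(store.add_record(eid, {"name": "Dee", "dob": "1960"}))
    print(store.add_record(eid, {"name": "Di", "dob": "1961"}))
```

Running the check inside `add_record` keeps the evaluation incremental: only the entity representation that received the new record is re-scored, rather than the whole database.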
Specification