AUTOMATED SELECTION OF GENERIC BLOCKING CRITERIA
Abstract
Field probabilities associated with fields in a database may be used to create one or more blocking criteria. The blocking criteria may be a set of fields whose values should be equal among two or more records in the database, so that a search of the records according to the blocking criteria yields a subset of records whose size is approximately equal to or less than a specified maximum block size. Generic blocking criteria may also be created. The generic blocking criteria may be used for a batch comparison or batch linking operation within the records of the database.
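The batch-linking use described in the abstract can be illustrated with a minimal Python sketch. It assumes records are dicts and that a blocking criterion is a tuple of field names. Records that agree on every criterion field fall into the same block, and only pairs inside a block are emitted as comparison candidates. The function names `block_records` and `candidate_pairs` are hypothetical, and skipping records that lack a criterion value is an assumption not stated in the source.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, criterion):
    """Partition record indices into blocks keyed by their criterion values.

    `records` is a list of dicts; `criterion` is a tuple of field names.
    Assumption (not from the source): a record missing any criterion
    field is left out of every block.
    """
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec.get(f) for f in criterion)
        if None in key:
            continue
        blocks[key].append(idx)
    return dict(blocks)


def candidate_pairs(records, criterion):
    """Yield index pairs that share a block, i.e. agree on every criterion field."""
    for members in block_records(records, criterion).values():
        yield from combinations(members, 2)


records = [
    {"last": "Smith", "zip": "02139", "first": "Ann"},
    {"last": "Smith", "zip": "02139", "first": "Anne"},
    {"last": "Smith", "zip": "10001", "first": "Bob"},
    {"last": "Jones", "zip": "02139", "first": "Cy"},
]
print(list(candidate_pairs(records, ("last", "zip"))))
```

Blocking on ("last", "zip") here yields three blocks and a single candidate pair, (0, 1). In general the number of comparisons drops from N(N-1)/2 over the whole database to the sum of n_b(n_b-1)/2 over the blocks.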
31 Claims
1. A computer implemented method of identifying a set of fields applicable to partition a plurality of records in an electronic database into one or more blocks based on a desired block size and independent of specific queries against the database, the method comprising:
receiving a desired block size;
calculating field probabilities for a plurality of fields in the database, wherein each field probability represents an average cohort size for a field, each of the field probabilities associated with one of the fields in the database;
determining a set of fields wherein a product of the associated field probabilities and the number of records in the database is approximately equal to the desired block size; and
outputting the set of fields, the set of fields independent of specific queries against the database.
(Dependent claims 2-14)
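The steps of claim 1 can be read as the following Python sketch, offered as an illustration under stated assumptions rather than as the patented implementation. It assumes a field probability is the chance that two randomly drawn records share a value on that field, the sum over values of (n_v/N)^2, so that N times the probability equals the record-weighted average cohort size. It also assumes fields are independent, so the expected block size of a field set is N times the product of its probabilities. The exhaustive search over up to `max_fields` fields, the log-scale closeness measure, and the names `field_probability` and `select_fields` are all hypothetical choices.

```python
import math
from collections import Counter
from itertools import combinations


def field_probability(records, field):
    """Probability that two randomly drawn records share a value on `field`.

    Assumption: p = sum_v (n_v / N)^2, so N * p is the record-weighted
    average cohort size. A missing value (None) counts as its own value.
    """
    n = len(records)
    counts = Counter(rec.get(field) for rec in records)
    return sum((c / n) ** 2 for c in counts.values())


def select_fields(records, fields, desired_block_size, max_fields=3):
    """Return the field set whose N * prod(p_f) is closest to the desired size.

    Closeness is measured on a log scale and fields are treated as
    independent; both are assumptions of this sketch.
    """
    n = len(records)
    probs = {f: field_probability(records, f) for f in fields}
    best, best_err = None, math.inf
    for k in range(1, max_fields + 1):
        for combo in combinations(fields, k):
            expected = n * math.prod(probs[f] for f in combo)
            err = abs(math.log(expected) - math.log(desired_block_size))
            if err < best_err:
                best, best_err = combo, err
    return best, probs


rows = [
    ("M", "MA", "02139"), ("F", "MA", "02139"),
    ("M", "MA", "02140"), ("F", "MA", "02140"),
    ("M", "MA", "10001"), ("F", "MA", "10001"),
    ("M", "NY", "10002"), ("F", "NY", "10002"),
]
records = [dict(zip(("gender", "state", "zip"), r)) for r in rows]
print(select_fields(records, ["gender", "state", "zip"], 1)[0])
```

In this example `gender` splits 4/4 (p = 0.5), `state` splits 6/2 (p = 0.625) and `zip` has four values of two records each (p = 0.25). A desired block size of 2 selects ("zip",), while a desired block size of 1 selects ("gender", "zip"), since 8 * 0.5 * 0.25 = 1. Exhaustive search is only practical for a handful of fields; a greedy search would be a natural alternative for wide tables.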
15. A computer implemented method of creating blocking criteria based on a desired block size, the method comprising:
calculating, using a programmed computer, one or more field probabilities for one or more fields in an electronic database, wherein each field probability represents an average cohort size for a field, each of the field probabilities associated with one of the fields in the database;
determining, using a programmed computer, one or more fields wherein a product of the associated field probabilities and a number of records in the database is approximately equal to the desired block size;
grouping, using a programmed computer, the one or more fields into one or more blocking criteria;
outputting the one or more blocking criteria; and
applying, using a programmed computer, at least one of the one or more blocking criteria to the records of the database to create a smaller group of records in the database.
(Dependent claims 16-23)
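The grouping and applying steps of claim 15 can be sketched in Python as follows, purely as an illustration. The sketch assumes the field probabilities are already known and passed in as a dict, treats fields as independent, and keeps as a blocking criterion every combination of fields whose expected block size lies within a factor `tolerance` of the desired size. The tolerance rule and the names `blocking_criteria` and `apply_criterion` are hypothetical. Applying a criterion here means selecting the records that agree with a given probe record on every field of that criterion.

```python
import math
from itertools import combinations


def blocking_criteria(probs, n_records, desired_block_size,
                      tolerance=2.0, max_fields=3):
    """Group fields into blocking criteria near the desired block size.

    `probs` maps field name -> field probability. A combination is kept
    when n_records * prod(p_f) lies within a factor `tolerance` of the
    desired size (an assumed acceptance rule, not from the source).
    """
    low = desired_block_size / tolerance
    high = desired_block_size * tolerance
    criteria = []
    fields = sorted(probs)
    for k in range(1, max_fields + 1):
        for combo in combinations(fields, k):
            expected = n_records * math.prod(probs[f] for f in combo)
            if low <= expected <= high:
                criteria.append(combo)
    return criteria


def apply_criterion(records, criterion, probe):
    """Return the smaller group of records agreeing with `probe` on every criterion field."""
    return [r for r in records
            if all(r.get(f) == probe.get(f) for f in criterion)]


probs = {"gender": 0.5, "state": 0.625, "zip": 0.25}
print(blocking_criteria(probs, n_records=8, desired_block_size=1))
```

With eight records and a desired block size of 1, this keeps ("zip",), ("gender", "zip"), ("state", "zip") and ("gender", "state", "zip"), whose expected sizes are 2, 1, 1.25 and 0.625. Returning several criteria is one way to produce the "one or more blocking criteria" of the claim, and each can then be applied independently.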
24. A system for identifying a set of fields applicable to partition a plurality of records in an electronic database into one or more blocks based on a desired block size and independent of specific queries against the database, comprising:
an electronic processor configured to receive a desired block size;
an electronic processor configured to calculate field probabilities for a plurality of fields in the database, wherein each field probability represents an average cohort size for a field, each of the field probabilities associated with one of the fields in the database;
an electronic processor configured to determine a set of fields wherein a product of the associated field probabilities and the number of records in the database is approximately equal to the desired block size; and
an electronic processor configured to output the set of fields, the set of fields independent of specific queries against the database.
(Dependent claims 25-31)
Specification