Automated database blocking and record matching
First Claim
1. A method for dynamically constructing or selecting a query Q from a set of field-value pairs S such that Q will return the maximum number of records from a database D that potentially match S while satisfying a user-defined speed constraint, said method comprising:
- generating blocking sets by evaluating expected record count associated with queries against said predetermined maximum.
4 Assignments
0 Petitions
Accused Products
Abstract
An automated blocking technique is used as a first step to find approximate matches in a database. The technique builds a blocking set to be as liberal as possible in retrieving records that match on individual fields or sets of fields while avoiding selection criteria that are predicted to return more than the maximum number of records defining a particular special requirement. The ability to do blocking without extensive manual setup at low cost is highly advantageous especially when using a machine learning based second-stage matching algorithm.
73 Citations
8 Claims
-
1. A method for dynamically constructing or selecting a query Q from a set of field-value pairs S such that Q will return the maximum number of records from a database D that potentially match S while satisfying a user-defined speed constraint, said method comprising:
generating blocking sets by evaluating expected record count associated with queries against said predetermined maximum.
-
2. The method of 1 where Q is composed of a set of subqueries qi
-
3. The method of 2, where each subquery qi is constructed so that it is estimated to return less than a user-defined maximum number of records c.
-
4. The method of 3, where S is first augmented with zero or more field-value pairs which are functions of S to form S′
- and then each subquery qi selects records in D that match a subset of S′
.
- and then each subquery qi selects records in D that match a subset of S′
-
5. The method of 4 that includes a method for eliminating redundant queries.
-
6. The method of 5 that includes a lookup to a data structure containing counts of a subset of the field-value pairs in D.
-
7. The method of 1 where Q is executed against D and then the records, W, returned by Q are fed to a matching algorithm that assigns to each w in W a decision on whether w matches, does not match, or is a possible match to S.
-
8. The method of 4 where Q is formed by a method for searching the universe of all possible subsets of S′
- .
Specification