Automated database blocking and record matching
Abstract
An automated blocking technique is used as a first step to find approximate matches in a database. The technique builds a blocking set to be as liberal as possible in retrieving records that match on individual fields or sets of fields while avoiding selection criteria that are predicted to return more than the maximum number of records defining a particular special requirement. The ability to do blocking without extensive manual setup at low cost is highly advantageous especially when using a machine learning based second-stage matching algorithm.
14 Claims
1. In a system including a database stored in at least one computer's data storage and a data record stored in computer memory, said database comprising plural records, said plural database records comprising data fields, said data record stored in computer memory including data fields, there being a correspondence between at least a subset of data fields of said data record stored in computer memory and at least a subset of the data fields of said records of said database, a method for identifying records in said database which are similar enough to said data record stored in computer memory that they might describe the same person or thing as that described by said data record stored in computer memory, said method comprising:
a. inputting a value which will be used to limit the number of records similar to said data record stored in computer memory to be identified within said database;
b. creating a set of sets of fields in said data record stored in computer memory, where the ith set of said set of sets of fields is obtained by
  i. selecting at least one field in said data record, such that the number of records in said database that store the same values as said data record, in all of the corresponding fields of said database is estimated to be fewer than said inputted value, and
  ii. setting the ith set of said set of sets of fields equal to the set of said selected field(s) in said data record;
c. selecting or constructing a database query to retrieve from said database the records which store the same values as said data record stored in computer memory in all the corresponding fields in at least one set of said set of sets of fields, and
d. executing said database query to retrieve said retrieved records from said database into computer memory.
Dependent claims: 2, 3, 4, 5, 6, 14
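Steps a through d of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the table `people`, its columns, the sample rows, and the helpers `count_matches`, `build_blocking_sets`, and `build_query` are all hypothetical names chosen for the example. The sketch also assumes an exact `COUNT(*)` as a stand-in for the estimate the claim refers to, and it enumerates field combinations from smallest to largest, which is only one of many possible selection strategies.

```python
import sqlite3
from itertools import combinations


def count_matches(conn, table, record, fields):
    """Count rows agreeing with `record` on every field in `fields`.

    Assumption: an exact COUNT(*) stands in for the claim's estimate.
    Table and column names are trusted identifiers in this sketch.
    """
    where = " AND ".join(f"{f} = ?" for f in fields)
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(sql, [record[f] for f in fields]).fetchone()[0]


def build_blocking_sets(conn, table, record, limit):
    """Step b: collect field sets whose match count is below `limit`."""
    accepted = []
    fields = sorted(record)
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            # A superset of an accepted set retrieves nothing new.
            if any(set(a) <= set(combo) for a in accepted):
                continue
            if count_matches(conn, table, record, combo) < limit:
                accepted.append(combo)
    return accepted


def build_query(table, record, blocking_sets):
    """Step c: OR together one AND-clause per accepted field set."""
    clauses, params = [], []
    for fields in blocking_sets:
        clauses.append("(" + " AND ".join(f"{f} = ?" for f in fields) + ")")
        params.extend(record[f] for f in fields)
    where = " OR ".join(clauses) or "0"  # no usable set: match nothing
    return f"SELECT * FROM {table} WHERE {where}", params


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, given_name TEXT, surname TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO people (given_name, surname, zip) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "02139"),
        ("Ann", "Lee", "10001"),
        ("Ann", "Kim", "02139"),
        ("Bob", "Lee", "02140"),
        ("Ann", "Ray", "94105"),
        ("Cal", "Lee", "60601"),
    ],
)

record = {"given_name": "Ann", "surname": "Lee", "zip": "02139"}
limit = 3  # step a: the inputted value

blocking_sets = build_blocking_sets(conn, "people", record, limit)
sql, params = build_query("people", record, blocking_sets)
rows = conn.execute(sql, params).fetchall()  # step d
print(blocking_sets)
print(sorted(row[0] for row in rows))
```

With this sample data, `given_name` and `surname` each match four rows on their own and are rejected, while `zip` alone and the pair (`given_name`, `surname`) each match two rows and are kept. Note that the limit applies to each field set separately, so the combined query can return more rows than any single set.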
7. A method for identifying records in a database which are likely matches to a set of at least one field-value pairs, comprising the following steps:
a. constructing a query from said set of field-value pairs such that said query will return the maximum number of records from said database while satisfying a user-defined speed constraint, wherein said query is constructed by evaluating the expected record count associated with said query against said predetermined speed constraint;
b. executing said query against said database to retrieve a set of records that satisfy said query; and
c. passing said retrieved set of records to a matching algorithm which determines, for each record in said retrieved set of records, whether said retrieved record matches said set of at least one field-value pairs, whether said retrieved record does not match said set of at least one field-value pairs, or whether said matching algorithm can not determine whether said retrieved record matches said set of at least one field-value pairs.
Dependent claims: 8, 9, 10, 11, 12, 13
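The three-way outcome in step c of claim 7 (match, no match, or undetermined) can be sketched as follows. The claim does not specify the matching algorithm, so the simple field-agreement score, the `lower` and `upper` thresholds, and the names `Decision`, `agreement_score`, and `classify` are assumptions made purely for illustration. A real second stage, such as the machine-learning matcher mentioned in the abstract, would replace the scoring function.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    NON_MATCH = "non-match"
    UNDETERMINED = "undetermined"


def agreement_score(field_values, candidate):
    """Fraction of the query's field-value pairs the candidate agrees with.

    Assumption: a stand-in for whatever scoring the real matcher uses.
    """
    agree = sum(1 for f, v in field_values.items() if candidate.get(f) == v)
    return agree / len(field_values)


def classify(field_values, candidate, lower=0.4, upper=0.8):
    """Map a score to one of the three outcomes named in step c."""
    score = agreement_score(field_values, candidate)
    if score >= upper:
        return Decision.MATCH
    if score <= lower:
        return Decision.NON_MATCH
    return Decision.UNDETERMINED  # left for review or a stronger matcher


# Hypothetical query pairs and retrieved candidates.
field_values = {"given_name": "Ann", "surname": "Lee", "zip": "02139"}
retrieved = [
    {"given_name": "Ann", "surname": "Lee", "zip": "02139"},
    {"given_name": "Ann", "surname": "Lee", "zip": "10001"},
    {"given_name": "Ann", "surname": "Kim", "zip": "94105"},
]
decisions = [classify(field_values, candidate) for candidate in retrieved]
print([d.value for d in decisions])
```

Keeping an explicit undetermined outcome, rather than forcing a binary answer, lets borderline candidates be routed to manual review instead of being silently accepted or dropped.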
Specification