Filter for checking for duplicate entries in database
Abstract
A system for determining whether a record-to-be-added to a database is a duplicate of an existing record. The database is first processed, to generate a library of signatures, one for each record. For example, assume each record contains a phrase. The signature may be a concatenation of the first letters of each word in the phrase. Thus, the signature for “Cats like milk” would be CLM. After generation of the library, when a new record is to be added to the database, a signature is generated for the new record. That signature is compared with the library. In this example, if the new record is “Cats like milk,” and if “CLM” is not found in the library, then it is conclusively known that “Cats like milk” is not present in the database. The new record can be added, without fear of duplication. However, if “CLM” is found in the library, that fact is not dispositive. “CLM” could be present because of the different phrase “Cats like mice” in a record. If such a matching signature is found, then human intervention is called for, to determine whether the new record duplicates an existing record.
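The first-letter example in the abstract can be sketched as follows. This is a minimal illustration only: the function names `first_letter_signature`, `build_library`, and `might_be_duplicate` are hypothetical, and upper-casing the letters is an assumption made so that the output matches the "CLM" form used in the abstract.

```python
def first_letter_signature(phrase: str) -> str:
    """Concatenate the upper-cased first letter of each word in the phrase."""
    return "".join(word[0].upper() for word in phrase.split())


def build_library(records):
    """Generate one signature per existing record and collect them in a set."""
    return {first_letter_signature(record) for record in records}


def might_be_duplicate(new_record: str, library) -> bool:
    """False means the record is definitely new; True is not dispositive."""
    return first_letter_signature(new_record) in library


# Hypothetical database contents, chosen to reproduce the abstract's collision.
existing_records = ["Cats like mice", "Dogs chase cars"]
library = build_library(existing_records)

print(first_letter_signature("Cats like milk"))
print(might_be_duplicate("Cats like milk", library))
print(might_be_duplicate("Birds eat seeds", library))
```

Here "Cats like milk" is flagged as a possible duplicate only because "Cats like mice" shares its signature, which is why a matching signature by itself does not settle the question.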
12 Claims
1. A method of adding records to a database, comprising:
  a) maintaining several databases, each containing records;
  b) for each database, using an algorithm to generate a plurality of signatures, one for each record in the database, wherein
    i) two identical records always cause the algorithm to produce two identical signatures; and
    ii) two different records sometimes cause the algorithm to produce two identical signatures;
  c) storing the signatures;
  d) when a new record is to be added to a target database,
    i) using the algorithm to generate a new signature for the new record;
    ii) comparing the new signature with the stored signatures of the target database;
    iii) if the new signature does not match a stored signature of the target database, then adding the new record to the database;
    iv) if the new signature does match a stored signature of the target database, then searching records in the target database, and
      A) if the searched records do not match the new record, then adding the new record to the database; and
      B) if the new record matches one of the searched records, calling for human intervention.
i) the new record has a group of one or more new key values, and
ii) the records searched in paragraph (d)(iv) are limited to records in the target database having key values which are common to the new key values.
8. Method according to claim 1, wherein the signature-comparison processes of paragraph (d) rely on one or more Bloom Filters to determine whether the new signature matches a signature of a record in the target database.
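The Bloom filter comparison named in claim 8 could look roughly like the sketch below. The claims do not specify any filter parameters, so the class name `BloomFilter`, the sizes `num_bits` and `num_hashes`, and the use of seeded SHA-256 digests as hash functions are all assumptions made for illustration. The sketch shows the property the claims rely on: a signature that was stored is always reported as present, while a signature that was never stored is only sometimes reported as present.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hash scheme are assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Store a signature by setting all of its bit positions."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False: definitely never stored. True: possibly stored."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


stored = BloomFilter()
for sig in ["CLM", "DCC"]:
    stored.add(sig)

# A stored signature is always found; a miss proves the signature is new.
print(stored.might_contain("CLM"))
```

A miss from `might_contain` corresponds to step (d)(iii), where the record can be added at once. A hit corresponds to step (d)(iv), where the records themselves still have to be searched.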
9. A method of adding records to a database, comprising:
  a) maintaining several databases,
    i) each database corresponding to a respective bank, or bank branch, and
    ii) each database containing financial records of bank customers;
  b) for each database, using an algorithm to generate a plurality of signatures, one for each record in the database, wherein
    i) two identical records always cause the algorithm to produce two identical signatures; and
    ii) two different records sometimes cause the algorithm to produce two identical signatures;
  c) storing the signatures;
  d) when a new record is to be added to a target database,
    i) using the algorithm to generate a new signature for the new record;
    ii) comparing the new signature with the stored signatures of the target database;
    iii) if the new signature does not match a stored signature of the target database, then adding the new record to the database;
    iv) if the new signature does match a stored signature of the target database, then searching records in the target database, and
      A) if the searched records do not match the new record, then adding the new record to the database; and
      B) if the new record matches one of the searched records, calling for human intervention.
i) the new record has a group of one or more new key values, and
ii) the records searched in paragraph (d)(iv) are limited to records in the target database having key values which are common to the new key values.
12. Method according to claim 9, wherein the signature-comparison processes of paragraph (d) rely on one or more Bloom Filters to determine whether the new signature matches a signature of a record in the target database.
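Steps (a) through (d) of claim 9, together with the key-limited search, can be sketched as below. Everything concrete here is hypothetical: the `Record` fields, the `BranchDatabase` container, the branch names, the return strings, and the deliberately coarse signature (customer id plus the first character of each word). The claims only require that identical records always collide and that different records sometimes do. A plain set stands in for the stored signatures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    customer_id: str  # hypothetical key value
    details: str      # hypothetical financial record text


def signature(record: Record) -> str:
    """Coarse signature: identical records always match, others sometimes do."""
    initials = "".join(word[0].upper() for word in record.details.split())
    return f"{record.customer_id}:{initials}"


@dataclass
class BranchDatabase:
    records: list = field(default_factory=list)
    signatures: set = field(default_factory=set)


def add_record(db: BranchDatabase, new: Record) -> str:
    sig = signature(new)                      # step (d)(i)
    if sig not in db.signatures:              # steps (d)(ii) and (d)(iii)
        db.records.append(new)
        db.signatures.add(sig)
        return "added"
    # Step (d)(iv): search only records sharing the new record's key value.
    candidates = [r for r in db.records if r.customer_id == new.customer_id]
    if new in candidates:
        return "needs_human_review"           # step (d)(iv)(B)
    db.records.append(new)                    # step (d)(iv)(A)
    return "added"


# Step (a): one database per (hypothetical) branch.
databases = {"branch_a": BranchDatabase(), "branch_b": BranchDatabase()}
target = databases["branch_a"]

print(add_record(target, Record("C001", "deposit 100 USD")))
print(add_record(target, Record("C001", "deposit 150 USD")))
print(add_record(target, Record("C001", "deposit 100 USD")))
```

The second call shares the signature "C001:D1U" with the first but differs in content, so it is added after the search. The third call is an exact duplicate and is routed to human review instead of being stored.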
Specification