Generating rules for matching new customer records to existing customer records in a large database
First Claim
1. A method for generating rules for matching data in a database containing a plurality of records each having a collection of fields, comprising the steps of:
obtaining a sample of training data from the database;
identifying pairs of records from the sample of training data that are similar;
applying field matching functions to each of the corresponding fields in the similar pairs of records, each field matching function generating a score indicating a strength of a match between items in the field;
generating an intermediate file of vectors containing matching scores for all of the fields from each of the similar pair of records; and
converting the intermediate file of vectors into a plurality of matching rules for matching data in the database.
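The claimed steps can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the field names, the `id` column used as a hand-applied match label, the use of `difflib.SequenceMatcher` as the field matching function, the name-based similarity test, and the per-field threshold learner that stands in for whatever rule-induction technique the specification actually describes.

```python
from difflib import SequenceMatcher
from itertools import combinations

FIELDS = ("name", "address", "phone")  # hypothetical field names


def field_score(a, b):
    """Field matching function: score in [0, 1] for the strength of a match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(sample, min_name_score=0.5):
    """Keep pairs whose names are roughly alike (an assumed similarity test)."""
    return [(r, s) for r, s in combinations(sample, 2)
            if field_score(r["name"], s["name"]) >= min_name_score]


def score_vector(r, s):
    """One vector of matching scores, one entry per field."""
    return tuple(field_score(r[f], s[f]) for f in FIELDS)


def learn_threshold(vectors, labels, i):
    """Stand-in rule learner: pick the cutoff on field i that best
    separates labeled matches from non-matches."""
    best_t, best_correct = 0.0, -1
    for t in sorted({v[i] for v in vectors}):
        correct = sum((v[i] >= t) == y for v, y in zip(vectors, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct / len(vectors)


# Sample of training data; "id" is a hypothetical hand-applied label.
sample = [
    {"id": 1, "name": "John Smith", "address": "12 Oak St", "phone": "555-0101"},
    {"id": 1, "name": "Jon Smith", "address": "12 Oak Street", "phone": "555-0101"},
    {"id": 2, "name": "John Smyth", "address": "98 Elm Ave", "phone": "555-0199"},
    {"id": 3, "name": "Mary Jones", "address": "4 Pine Rd", "phone": "555-0142"},
]

pairs = similar_pairs(sample)
vectors = [score_vector(r, s) for r, s in pairs]   # the "intermediate file"
labels = [r["id"] == s["id"] for r, s in pairs]

# One rule per field: "declare a match when score >= threshold".
rules = {f: learn_threshold(vectors, labels, i) for i, f in enumerate(FIELDS)}
for field, (threshold, accuracy) in rules.items():
    print(f"{field}: score >= {threshold:.2f} (training accuracy {accuracy:.2f})")
```

Each rule here is a single-field cutoff; a real rule set would more plausibly combine several fields, which this sketch does not attempt.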
Abstract
A method and system for matching new customer records to existing customer records in a database. The new customer records are validated for quality and normalized into a standard form. A hash key is selected to generate a candidate set of records from the existing records in the database that likely matches the new customer records. The new customer records are then matched to each of the records in the candidate set. Once the matching has been performed, a decision is made on whether to create a new customer record, update an existing record, or save the new record in a pending file for resolution at a later time. In another embodiment, there is a methodology for learning matching rules for matching records in a database. The matching rules are then used for matching a new customer record to existing records in a database.
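The matching flow in the abstract (validate, normalize, select a hash key, build a candidate set, match, then decide) might look roughly like the following Python sketch. The field names, the hash key (ZIP code plus first letter of the name), the similarity measure, and the `accept` and `reject` cutoffs are all hypothetical choices made for illustration; the abstract does not specify any of them, nor how invalid records are handled.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def is_valid(record):
    """Quality check (assumed): the fields used for hashing must be present."""
    return bool(record.get("name")) and bool(record.get("zip"))


def normalize(record):
    """Standard form (assumed): upper case with whitespace collapsed."""
    return {k: " ".join(str(v).upper().split()) for k, v in record.items()}


def hash_key(record):
    """Hypothetical hash key: ZIP code plus first letter of the name."""
    return record["zip"] + record["name"][:1]


def build_index(existing):
    """Group normalized existing records by hash key."""
    index = defaultdict(list)
    for rec in existing:
        norm = normalize(rec)
        index[hash_key(norm)].append(norm)
    return index


def match_score(a, b):
    """Weakest field similarity across name and address (an assumed measure)."""
    return min(SequenceMatcher(None, a[f], b[f]).ratio()
               for f in ("name", "address"))


def decide(new_record, index, accept=0.9, reject=0.6):
    """Return ('update' | 'create' | 'pending', best candidate or None)."""
    if not is_valid(new_record):
        # Assumption: the abstract does not say what happens to invalid input.
        raise ValueError("record failed validation")
    norm = normalize(new_record)
    candidates = index.get(hash_key(norm), [])
    if not candidates:
        return ("create", None)
    best = max(candidates, key=lambda c: match_score(norm, c))
    score = match_score(norm, best)
    if score >= accept:
        return ("update", best)
    if score <= reject:
        return ("create", None)
    return ("pending", best)  # held for later resolution


existing = [{"name": "John Smith", "address": "12 Oak St", "zip": "12345"}]
index = build_index(existing)

same = decide({"name": "john  smith", "address": "12 Oak St", "zip": "12345"}, index)
new = decide({"name": "Mary Jones", "address": "4 Pine Rd", "zip": "99999"}, index)
print(same[0], new[0])
```

The hash key keeps the expensive field-by-field comparison limited to a small candidate set instead of the whole database, which is the point of that step in the abstract.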
176 Citations
10 Claims
1. A method for generating rules for matching data in a database containing a plurality of records each having a collection of fields, comprising the steps of:
obtaining a sample of training data from the database;
identifying pairs of records from the sample of training data that are similar;
applying field matching functions to each of the corresponding fields in the similar pairs of records, each field matching function generating a score indicating a strength of a match between items in the field;
generating an intermediate file of vectors containing matching scores for all of the fields from each of the similar pair of records; and
converting the intermediate file of vectors into a plurality of matching rules for matching data in the database.
Dependent claims: 2, 3, 4, 5
6. A system for generating rules for matching data in a database containing a plurality of records each having a collection of fields, comprising:
means for obtaining a sample of training data from the database;
means for identifying pairs of records from the sample of training data that are similar;
means for applying field matching functions to each of the corresponding fields in the similar pairs of records, each field matching function generating a score indicating a strength of a match between items in the field;
means for generating an intermediate file of vectors containing matching scores for all of the fields from each of the similar pair of records; and
means for converting the intermediate file of vectors into a plurality of matching rules for matching data in the database.
Dependent claims: 7, 8, 9, 10
Specification