Method and system for minimizing attribute naming errors in set oriented duplicate detection
First Claim
1. A method of detecting duplicate entries in an address file, comprising the steps of:
(a) entering an address list to an addressing system, wherein said address list is comprised of one or more address records and said address records are comprised of one or more address fields;
(b) applying a nickname lookup table to said address records, wherein said nickname lookup table comprises one or more nicknames corresponding to a common first name, said one or more nicknames located in one of said address fields; and
further comprising the step of selecting the degree of precision to which a match sequence can be subjected;
(c) performing said match sequence by matching a first record from said address list with a second record and subsequent records, if any, from said address list by comparing said one or more address fields of said first record with said one or more address fields of said second or subsequent records;
(d) repeating said match sequence for each of said subsequent records;
(e) determining a duplicate set, wherein said duplicate set is comprised of all address records with address fields that match as determined by a set of pre-selected criteria;
(f) listing said duplicate set so that each address record follows sequentially;
(g) determining an address record to be retained within said address list; and
(h) retaining said address record within said address list; and
placing said duplicate set on a second list.
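The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the nickname table contents, the field names (`first_name`, `last_name`, `street`, `zip`), the exact-match criterion after normalization, and the choice to retain the first record of each duplicate set are all assumptions made for illustration. The claim itself leaves the matching criteria and the retention rule open.

```python
# Hypothetical nickname lookup table: nickname -> common first name.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert"}

def common_first_name(name):
    """Step (b): map a nickname to its common first name, if one is listed."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def match_key(record, fields):
    """Normalize the compared fields of one record (assumed criterion)."""
    parts = []
    for field in fields:
        value = str(record.get(field, "")).strip().lower()
        if field == "first_name":
            value = common_first_name(value)
        parts.append(value)
    return tuple(parts)

def detect_duplicates(address_list,
                      fields=("first_name", "last_name", "street", "zip")):
    """Steps (c)-(h): compare each record with all subsequent records.

    Returns (retained, second_list). `retained` holds one record per group;
    `second_list` holds each duplicate set with its records in sequence.
    """
    retained, second_list = [], []
    assigned = [False] * len(address_list)
    for i, first in enumerate(address_list):
        if assigned[i]:
            continue
        assigned[i] = True
        duplicate_set = [first]
        # Match the first record against the second and subsequent records.
        for j in range(i + 1, len(address_list)):
            candidate = address_list[j]
            if not assigned[j] and match_key(first, fields) == match_key(candidate, fields):
                assigned[j] = True
                duplicate_set.append(candidate)
        retained.append(duplicate_set[0])  # assumption: keep the first record
        if len(duplicate_set) > 1:
            second_list.append(duplicate_set)
    return retained, second_list

records = [
    {"first_name": "Bill", "last_name": "Smith", "street": "1 Main St", "zip": "06484"},
    {"first_name": "William", "last_name": "Smith", "street": "1 Main St", "zip": "06484"},
    {"first_name": "Bob", "last_name": "Jones", "street": "2 Oak Ave", "zip": "06484"},
]
retained, second_list = detect_duplicates(records)
```

In this example "Bill" and "William" normalize to the same common first name, so the first two records form one duplicate set, while the third record stays on the address list alone.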
Abstract
The invention is a method for detecting duplicate records on a list or in a file and comprises a number of steps. The steps include entering a list, comprised of one or more records, to a data processing system; then, applying a nickname lookup table to the records to determine a common first name. Once a common name has been determined, the method matches a first record from the list with a second record from the list by comparing the fields of the first record with the fields of at least one other record; the comparison is based on a set of pre-determined criteria. The matching sequence determines a duplicate set, wherein the duplicate set is comprised of at least two records with fields that match. The method then lists matching records sequentially so that the system can create a new record by filling each empty field with a next available corresponding field from a subsequent record within the duplicate set. The newly created record is then retained on the original list; and the duplicate records are placed on a second list. Pre-sorting of the list can occur just prior to the matching sequence as well as just prior to outputting the final list. Additionally, the system operator can be given a number of options to provide flexibility. These options can include: manually correcting a record on the duplicate records list; deleting an address record from the list of duplicates; or, outputting the record.
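The consolidation step described in the abstract, where each empty field of the new record is filled with the next available corresponding field from a subsequent record in the duplicate set, might look like the following sketch. The function names, the field names, and the rule that `None` or whitespace-only values count as empty are assumptions for illustration only.

```python
def is_empty(value):
    """Assumed rule: None or a whitespace-only value counts as empty."""
    return value is None or str(value).strip() == ""

def consolidate(duplicate_set, fields):
    """Build one record from a sequentially listed duplicate set.

    Starts from the first record and fills each empty field with the first
    non-empty value found in the subsequent records, in list order.
    """
    merged = dict(duplicate_set[0])
    for field in fields:
        if is_empty(merged.get(field)):
            for later in duplicate_set[1:]:
                if not is_empty(later.get(field)):
                    merged[field] = later[field]
                    break
    return merged

duplicate_set = [
    {"name": "William Smith", "phone": "", "zip": None},
    {"name": "Bill Smith", "phone": "", "zip": "06484"},
    {"name": "W. Smith", "phone": "555-0100", "zip": "06484"},
]
merged = consolidate(duplicate_set, ("name", "phone", "zip"))
```

Fields that are already populated in the first record are left untouched, so the order in which the duplicate set is listed determines which values survive.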
16 Claims
16. An addressing system for detecting duplicate entries in an address file, comprising:
a. means for entering an address list, wherein said address list is comprised of one or more address records and said address records are comprised of one or more address fields;
b. means for applying a nickname lookup table to said address records, wherein said nickname lookup table comprises one or more nicknames corresponding to a common first name, said one or more nicknames located in one of said address fields;
said applying means further comprising means for selecting the degree of precision to which a match sequence can be subjected;
c. means for performing said match sequence for each of said address records by matching a first record from said address list with a second record and subsequent records, if any, from said address list by comparing said one or more address fields of said first record with said one or more address fields of said second or subsequent records, and repeating said match sequence for each of said subsequent records;
d. means for determining a duplicate set, wherein said duplicate set is comprised of all address records with address fields that match as determined by a set of pre-selected criteria;
e. means for listing said duplicate set so that each address record follows sequentially;
f. means for determining an address record to be retained within said address list; and
g. means for retaining said address record within said address list; and placing said duplicate set on a second list.
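The claims recite selecting a degree of precision for the match sequence without stating how precision is measured. One possible reading, sketched below, treats precision as a per-field similarity threshold. The level names (`tight`, `medium`, `loose`), the threshold values, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions, not details taken from the patent.

```python
from difflib import SequenceMatcher

# Hypothetical precision levels: minimum similarity every compared field must reach.
PRECISION_THRESHOLDS = {"tight": 1.0, "medium": 0.9, "loose": 0.75}

def field_similarity(a, b):
    """Similarity in [0, 1] between two field values, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def records_match(first, second, fields, precision="tight"):
    """Return True if every compared field meets the selected precision."""
    threshold = PRECISION_THRESHOLDS[precision]
    return all(
        field_similarity(str(first.get(f, "")), str(second.get(f, ""))) >= threshold
        for f in fields
    )

a = {"last_name": "Smith", "street": "123 Main Street"}
b = {"last_name": "smith", "street": "123 Main St"}
fields = ("last_name", "street")

strict_result = records_match(a, b, fields, precision="tight")   # street differs
relaxed_result = records_match(a, b, fields, precision="loose")  # street is close enough
```

Under the tight setting the abbreviated street prevents a match, while the loose setting accepts it, which shows how an operator-selected precision could widen or narrow the duplicate sets.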
Specification