System and method for cleansing, linking and appending data records of a database
First Claim
1. A method for creating a cleansed output file containing a plurality of business data records from a single pass through an input file, comprising the steps of:
(a) selecting an input file containing a plurality of data records;
(b) selecting a reference file, said reference file containing a plurality of data records;
(c) computing a search key; and
(d) for each said data record in said input file:
(i) retrieving said data record from said input file on remote storage;
(ii) searching said reference file with a matcher process for all said data records in said reference file that match said search key and reading each said data record from said reference file that matches said search key, thereby generating a candidate data record list;
(iii) searching said candidate data record list and determining a matching data record, wherein said matching data record matches said data record in said input file;
(iv) creating a new cleansed data record;
(v) cleansing said data record of said input file according to said matching data record, thereby generating verified information;
(vi) writing said verified information into said new cleansed data record; and
(vii) writing said new cleansed data record to a cleansed output file;
wherein said steps (d)(i) through (d)(vii) are performed in a single pass through said data records of said input file and in a single pass through said reference file, such that each data record of said input file is read from a remote storage location only once, each said matching data record of said reference file is read from a remote storage location only once, and each said new data record to said cleansed output file is written to a remote storage location only once.
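The flow of steps (a) through (d)(vii) can be sketched in code. What follows is a minimal illustration, not the patented implementation. It assumes records are dictionaries with hypothetical `name`, `address`, and `zip` fields, a hypothetical search key built from a name prefix plus ZIP code, an exact-name matcher, and a reference file small enough to index in memory after being read once. None of these choices come from the claim itself.

```python
import csv
import io
from collections import defaultdict


def search_key(record):
    # Hypothetical key (step c): first three letters of the name plus ZIP code.
    return (record["name"].strip().upper()[:3], record["zip"].strip())


def build_index(reference_rows):
    # Reads each reference record exactly once and groups records by search key.
    index = defaultdict(list)
    for row in reference_rows:
        index[search_key(row)].append(row)
    return index


def best_match(record, candidates):
    # Hypothetical matcher (step d-iii): exact match on the normalized name.
    target = record["name"].strip().upper()
    for candidate in candidates:
        if candidate["name"].strip().upper() == target:
            return candidate
    return None


def cleanse(input_rows, reference_rows, writer):
    index = build_index(reference_rows)
    for record in input_rows:                            # (d)(i): read once
        candidates = index.get(search_key(record), [])   # (d)(ii): candidate list
        match = best_match(record, candidates)           # (d)(iii): pick a match
        cleansed = dict(record)                          # (d)(iv): new record
        if match is not None:
            # (d)(v)-(vi): copy verified values from the matching record.
            for field in ("name", "address", "zip"):
                cleansed[field] = match[field]
        writer.writerow(cleansed)                        # (d)(vii): write once


# Usage with in-memory data standing in for files on remote storage.
reference = [{"name": "Acme Corp", "address": "1 Main Street", "zip": "10001"}]
incoming = [
    {"name": "acme corp ", "address": "1 main st", "zip": "10001"},
    {"name": "Unknown LLC", "address": "9 Elm Rd", "zip": "20002"},
]
output = io.StringIO()
out_writer = csv.DictWriter(output, fieldnames=["name", "address", "zip"])
out_writer.writeheader()
cleanse(incoming, reference, out_writer)
```

In this sketch an input record with no match is written through unchanged; the claim does not say how unmatched records are handled, so that behavior is an assumption.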
Abstract
A system and method for reading a data record from an input file only once, processing that data record according to one or more reference files, and then writing out the cleansed and updated data record to a target file such that the data record is read from and written to remote storage only once, thereby making a single pass through a given database of data records. Each data record (comprising multiple data elements) of the input file is reviewed, verified, and corrected against one or more reference databases containing similar information, assigned a unique identifying key, and, optionally, appended with additional data elements of a matching data record from a new-data database.
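The linking and appending steps mentioned in the abstract can be illustrated with a short sketch. It assumes, purely for illustration, that the unique identifying key is taken from a hypothetical `ref_id` field on the matched reference record and that the new-data database is a dictionary keyed by that same identifier. The abstract does not specify either detail.

```python
def link_and_append(cleansed, match, new_data):
    # Work on a copy so the cleansed record passed in is left untouched.
    linked = dict(cleansed)
    # Linking: attach the unique key (hypothetical field names).
    linked["link_key"] = match["ref_id"]
    # Appending: add only fields the record lacks; verified values are kept.
    extra = new_data.get(match["ref_id"], {})
    for field, value in extra.items():
        linked.setdefault(field, value)
    return linked


# Usage with hypothetical data.
cleansed_record = {"name": "Acme Corp", "zip": "10001"}
matched_reference = {"ref_id": "R-42", "name": "Acme Corp", "zip": "10001"}
new_data_db = {"R-42": {"employees": "250", "name": "ACME"}}
linked_record = link_and_append(cleansed_record, matched_reference, new_data_db)
```

Using `setdefault` here is a design choice of the sketch: appended data never overwrites a field that was already verified against the reference file.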
19 Claims
Claim 1 (independent) is stated in full above under First Claim. Dependent claims: 2 through 19.
Specification