System and methods for searching and matching databases
Abstract
In a database data processing system, input search data is matched against an index of a database to determine database records which either closely or exactly match the input search data. The input search data is broken down into elements, and elements are converted to terms having a finite set of possible values. The Soundex function may be used to convert elements to terms. The terms are compared against an index of terms to determine which database records relate to the input search data. Through statistical analysis, match records are given a record weight which may be used to calculate how closely the input data actually matches each match record. The invention provides a fast and efficient way of accurately searching for data in extremely large databases, while not requiring precise input search data entry. The invention may also be used to compare or supplement one database against another.
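The element-to-term conversion and index lookup summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the common American Soundex rules (first letter plus three digits, with H and W not separating repeated codes); the helper names `soundex`, `build_index` and `matching_term_sets`, the whitespace tokenization, and the sample records are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Consonant groups of American Soundex; vowels, H, W and Y carry no digit.
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word):
    """Convert an element (word) to a 4-character Soundex term, or '' if it has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:      # adjacent letters with the same code count once
                code += digit
            prev = digit
        elif ch not in "HW":       # vowels separate repeated codes; H and W do not
            prev = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def build_index(records):
    """Map each term to the set of record ids containing an element that converts to it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for element in text.split():
            term = soundex(element)
            if term:
                index[term].add(rec_id)
    return index


def matching_term_sets(query, index):
    """Return one term set per input term (empty if the term is not indexed)."""
    terms = [soundex(element) for element in query.split()]
    return {term: set(index.get(term, set())) for term in terms if term}


records = {1: "John Smith", 2: "Jon Smyth", 3: "Mary Jones"}
index = build_index(records)
result = matching_term_sets("Jon Smith", index)
```

Because several spellings collapse to one term, the misspelled query still reaches both records 1 and 2, which is the "multiple elements may convert to each term" property the claims rely on.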
48 Claims
1. A method of correlating input data to stored data, comprising the steps of:
receiving input data as a plurality of elements;
converting selected elements to a finite family of terms by a first function, such that multiple elements may convert to each term;
matching each term against at least one index of the stored data to obtain a matching term set for each term, wherein each term set is a set of stored match records matching a respective term;
computing record weights for match records existing in the matching term sets; and
applying a second function to match records based on record weights of match records, to determine a match condition of the match records.
Dependent claims: 2-31.
32. A system correlating input data to stored data, comprising:
means for receiving input data as a plurality of elements;
means for converting selected elements to a finite family of terms by a first function, such that multiple elements may convert to each term, wherein the first function is an approximate string matching function which encodes word elements into terms;
means for matching each term against at least one index of the stored data to obtain a matching term set for each term, wherein each term set is a set of stored match records matching a respective term;
means for computing record weights for match records existing in the matching term sets; and
means for applying a second function comprising a plurality of record match tests to certain match records, based on record weights of the certain match records, to determine a match condition of the certain match records.
Dependent claims: 33-40.
41. A database data processing and searching system comprising:
a computer system including an input device, a processor, an output device and a storage device;
a database of records stored on the storage device and indexed by at least one index having index entries, wherein each index entry contains a Soundex term and a term set referencing records in the database which contain an element of data which converts to the Soundex term of that index entry via a Soundex function;
a field mapper executing on the processor for accepting input search data from the input device, and mapping the input search data to one or more record fields compatible with a record format of records in the database;
a match engine executing on the processor for converting the input search data in the record fields to Soundex input terms via the Soundex function and for matching each Soundex input term with one of the Soundex terms of an index entry, thus indicating a term set of match records for that Soundex input term;
a record weigher executing on the processor for computing record weights for each different match record in the term sets matching the Soundex input terms;
a record tester for applying a plurality of record match functions to determine match conditions of certain match records based upon record weights;
a second tester for applying an approximate string matching function to the input search data and the data of certain match records from the database in an event that the plurality of record match functions are unable to determine a match condition by a threshold amount.
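The two-stage decision described by the record tester and second tester of claim 41 might be sketched as follows. This is only an illustration under stated assumptions: the function name `decide_match`, the `margin` and `min_ratio` parameters, and the use of `difflib.SequenceMatcher` as the approximate string matching function are all hypothetical choices, since the claim does not name a specific test or similarity measure.

```python
from difflib import SequenceMatcher


def decide_match(query, candidates, weights, margin=0.5, min_ratio=0.8):
    """Pick a match record id, or None if no candidate is convincing.

    candidates: record id -> stored text (hypothetical record layout)
    weights:    record id -> record weight from the weighing step
    margin:     assumed threshold by which the best weight must lead
    min_ratio:  assumed minimum similarity for the fallback test
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    if not ranked:
        return None

    # First tester: accept the top record if its weight leads by the margin.
    runner_up = weights[ranked[1]] if len(ranked) > 1 else 0.0
    if weights[ranked[0]] - runner_up >= margin:
        return ranked[0]

    # Second tester: the weights are inconclusive, so compare the raw strings.
    def similarity(rec_id):
        return SequenceMatcher(None, query.lower(), candidates[rec_id].lower()).ratio()

    best = max(ranked, key=similarity)
    return best if similarity(best) >= min_ratio else None


candidates = {1: "John Smith", 2: "Jon Smyth"}
clear_winner = decide_match("Jon Smyth", candidates, {1: 2.0, 2: 0.5})
tie_broken = decide_match("Jon Smyth", candidates, {1: 1.0, 2: 1.0})
no_match = decide_match("zzzz", candidates, {1: 1.0, 2: 1.0})
```

Running the string comparison only when the weights are inconclusive keeps the more expensive test off the common path, which is one plausible reading of why the claim places it second.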
42. A method of matching an input database against a reference database comprising:
providing a geographically sorted reference database of records comprised of elements;
providing a plurality of inverted indexes, each comprising index entries including a phonetically encoded term and a term set indexed by the phonetically encoded term, the term sets containing references to reference database records that contain elements which phonetically encode to the phonetically encoded term of that index entry, and each term set having a term set weight, wherein the term set weight is higher for term sets with a lesser number of references to reference database records;
phonetically encoding elements of records of the input database into input terms;
selecting a portion of records of the input database, based upon a limiting field of the input data, wherein index entries of an inverted index have term sets referencing a limited number of reference database records, based upon the limiting field;
matching the input terms to the phonetically encoded terms of the inverted index entries to determine matching term sets of reference database match records;
computing record weights for each unique match record by summing the term set weights of matching term sets having a reference to the match record;
applying a plurality of second function record match tests, utilizing the record weights of certain match records, to determine a match condition of certain of the match records, said match condition indicating how closely records of the input data base match to records of the reference database.
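The weighting steps of claim 42 (rarer term sets weigh more, and a record weight is the sum of the weights of the matching term sets that reference the record) can be sketched in Python. The claim does not give a formula, so the reciprocal of the term set size used here is purely an assumption chosen to satisfy "higher for term sets with a lesser number of references"; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def term_set_weight(term_set):
    """Assumed weighting: the reciprocal of the number of referenced records."""
    return 1.0 / len(term_set) if term_set else 0.0


def record_weights(matching_term_sets):
    """Sum, for each record, the weights of the matching term sets that reference it."""
    weights = defaultdict(float)
    for term_set in matching_term_sets.values():
        weight = term_set_weight(term_set)
        for rec_id in term_set:
            weights[rec_id] += weight
    return dict(weights)


# Hypothetical matching term sets keyed by phonetically encoded term.
matching = {"S530": {1, 2}, "J500": {1, 2, 3, 4}, "M600": {2}}
weights = record_weights(matching)
```

In this example record 2 collects weight from all three term sets, including the rare one, so it ends up with the highest record weight.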
43. A method of correlating input data to stored data, comprising the steps of:
receiving input data as a plurality of elements;
converting selected elements to a finite family of terms by a first function, such that multiple elements may convert to each term, wherein the first function is an approximate string matching function which encodes word elements into terms;
matching each term against at least one index of the stored data to obtain a matching term set for each term, wherein each term set is a set of stored match records matching a respective term;
applying a second function to match records based on record weights of the match records, to determine a match condition of the match records.
Dependent claims: 44.
45. A computer readable medium encoded with processing logic comprising:
means for receiving input data as a plurality of elements;
means for converting selected elements to a finite family of terms by a first function, such that multiple elements may convert to each term, wherein the first function is an approximate string matching function which encodes word elements into terms;
means for matching each term against at least one index of the stored data to obtain a matching term set for each term, wherein each term set is a set of stored match records matching a respective term;
means for computing record weights for match records existing in the matching term sets; and
means for applying a second function to match records based on record weights of the match records, to determine a match condition of the match records.
46. A method of correlating input data to stored data, comprising the steps of:
receiving input data as a plurality of elements;
converting selected elements to a finite family of terms by a first function, such that multiple elements may convert to each term;
matching each term against at least one index of the stored data to obtain a matching term set for each term, wherein each term set is a set of stored match records matching a respective term, the matching step including a limiting step that selects a limited index range, wherein the limited index range designates a portion of records within the stored data used to obtain the matching term set for each term;
assigning a term set weight to each matching term set based on records in the matching term set; and
computing record weights for match records existing in the matching term sets from the assigned term set weights.
Dependent claims: 47, 48.
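One way to picture the limiting step of claim 46 is to assume that record ids are assigned in sorted order (for example by a geographic field) and that each term set is kept as a sorted list of ids, so a limited index range becomes a contiguous slice. That storage layout, the function name `limit_term_set`, and the sample ids are assumptions for illustration, not details stated in the claim.

```python
from bisect import bisect_left, bisect_right


def limit_term_set(sorted_term_set, first_id, last_id):
    """Return the part of a sorted term set whose ids lie in [first_id, last_id]."""
    start = bisect_left(sorted_term_set, first_id)
    end = bisect_right(sorted_term_set, last_id)
    return sorted_term_set[start:end]


# Hypothetical term set for one term, with ids in sorted (e.g. geographic) order.
term_set = [2, 5, 9, 14, 20]
limited = limit_term_set(term_set, 5, 14)
```

Under this assumption the binary searches locate the slice without scanning the whole term set, and the term set weight can then be assigned from the limited set rather than the full one.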
Specification