System, method and computer program product for matching textual strings using language-biased normalisation, phonetic representation and correlation functions
First Claim
1. A method of comparing a first term to a second term, comprising the steps of:
(1) normalizing said first term and said second term; and
(2) comparing said first term with said second term to determine whether they match, comprising one or more of;
(a) comparing said first term with said second term using an exact match algorithm;
(b) comparing said first term with said second term using an any-order matching algorithm;
(c) comparing said first term with said second term using phonetic transformation and matching;
(d) comparing said first term with said second term using synonym substitution; and
(e) comparing said first term with said second term using probabilistic matching using novel string-correlation techniques.
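As an illustration only, steps (1), (2)(a), (2)(b) and (2)(d) of the claim might be sketched as follows. The function names, the normalization rules (accent stripping, lower-casing, punctuation removal) and the synonym table are assumptions made for this sketch; the claim itself does not specify them.

```python
import unicodedata

# Hypothetical synonym table; the source does not list any entries.
SYNONYMS = {"bill": "william", "mohd": "mohammed"}


def normalize(term):
    """Step (1): strip accents and punctuation, lower-case, split into tokens."""
    decomposed = unicodedata.normalize("NFKD", term)
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " "
        for ch in decomposed
        if not unicodedata.combining(ch)  # drop accent marks
    )
    return cleaned.lower().split()


def exact_match(a, b):
    """Step (2)(a): same tokens in the same order."""
    return normalize(a) == normalize(b)


def any_order_match(a, b):
    """Step (2)(b): same tokens, order ignored."""
    return sorted(normalize(a)) == sorted(normalize(b))


def synonym_match(a, b):
    """Step (2)(d): equal after replacing known synonyms."""
    def substitute(tokens):
        return [SYNONYMS.get(t, t) for t in tokens]
    return substitute(normalize(a)) == substitute(normalize(b))


def compare(a, b):
    """Return the name of the first comparison that matches, or None."""
    steps = [
        ("exact", exact_match),
        ("any-order", any_order_match),
        ("synonym", synonym_match),
    ]
    for name, fn in steps:
        if fn(a, b):
            return name
    return None
```

Calling `compare("Garcia, José", "Jose Garcia")` under these assumptions reports an any-order match, because the two terms normalize to the same tokens in a different order. Returning the name of the matching step mirrors the abstract's statement that results show the type of search that produced each match.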
Abstract
A method, system and computer program product provide transformation, normalization and correlation techniques for matching names of foreign origin that may be spelt in any number of ways. They address the problem of matching names that may belong to the same person but are spelt differently. The main technique is to convert both strings to be matched into a representation of their original language, i.e., to transform them into idealized (normalized) versions of themselves based on their true spelling in their original, native language. This idealization can be done either by consulting a dictionary of standard, idealized names, or in real time by following a finite-state algorithm that converts the strings into their true representation in their original language. Idealization can be viewed as a phonetic searching method: it resolves the problem of vowel representations or their incorrect use, and it handles consonants that do not exist in the English language. Further probabilistic and elastic matching techniques, using a correlation function, can be invoked manually or automatically to match names whose quality or completeness may be suspect. A new approach to “probabilistic” and “sliding-elastic” matching, which gives a level of confidence as a percentage for each match, can be used with or without the phonetic (idealized) searching function. The results of the search are displayed on the computer screen or printed, showing all successful matches together with the type of search used to obtain each match. Results can be filtered by comparing attributes of the persons associated with the Suspect and Data names (such as age, country of birth, etc.) to minimize reporting of irrelevant matches.
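The dictionary route to idealization, followed by a percentage-scored fallback, could look roughly like the sketch below. Everything in it is an assumption for illustration: the `IDEALIZED` entries, the 75% threshold, and in particular `sliding_score`, which is a simple aligned-character count standing in for the correlation function that the abstract mentions but does not define. The finite-state variant of idealization is not shown.

```python
# Hypothetical dictionary mapping spelling variants to one idealized form.
IDEALIZED = {
    "mohamed": "muhammad",
    "mohammed": "muhammad",
    "muhamad": "muhammad",
}


def idealize(name):
    """Look the name up in the dictionary; fall back to the cleaned input."""
    key = name.strip().lower()
    return IDEALIZED.get(key, key)


def sliding_score(a, b):
    """Slide the shorter string along the longer one and return the best
    count of aligned equal characters as a percentage of the longer length.
    This is a stand-in, not the correlation function of the source."""
    if not a or not b:
        return 0.0
    short, long_ = sorted((a, b), key=len)
    best = 0
    for offset in range(-(len(short) - 1), len(long_)):
        hits = sum(
            1
            for i, ch in enumerate(short)
            if 0 <= i + offset < len(long_) and long_[i + offset] == ch
        )
        best = max(best, hits)
    return 100.0 * best / len(long_)


def match(a, b, threshold=75.0):
    """Return (search_type, confidence_percent), or None if below threshold."""
    ia, ib = idealize(a), idealize(b)
    if ia == ib:
        return ("idealized", 100.0)
    score = sliding_score(ia, ib)
    if score >= threshold:
        return ("correlation", score)
    return None
```

With these assumed entries, `match("Mohamed", "Mohammed")` succeeds at the idealization stage, while a pair such as "Hussein" and "Hussain", which is absent from the dictionary, falls through to the scored comparison. Trying the cheap dictionary lookup first and scoring only the remainder is a design choice of this sketch, not something the abstract prescribes.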
Specification