METHOD AND SYSTEM FOR MANAGING DATA QUALITY FOR SPANISH NAMES AND ADDRESSES IN A DATABASE
Abstract
A method and system for identifying similar names and addresses in a given data set comprising a plurality of names and addresses. The invention specifically addresses the challenges of Spanish data quality assurance. The name and address data are passed through a parsing engine, which parses the plurality of Spanish names and addresses. The parsed Spanish names and addresses are sent to a probable identification engine to identify the probable matches. The name and address matching processes can be combined to assure data quality for Spanish names and addresses. The Spanish name matching process consists of identifying probable matches and finding similarity percentages between those probable matches. Similarly, the Spanish address matching process consists of identifying probable matches (using criteria such as same city) and finding similarity percentages between those probable matches. The system includes a parsing engine, a probable identification engine and a match percentage calculation engine.
15 Citations
13 Claims
1. A method for identification and matching of a plurality of similar Spanish names in a given set of data, the method comprising a processor implemented steps of:
providing a plurality of Spanish names to a name parsing engine (302);
generating a plurality of parsed Spanish names by the name parsing engine (302);
providing the plurality of parsed Spanish names to a probable name identification engine (304);
generating a plurality of Spanish name probable matches by the probable name identification engine (304);
providing the plurality of Spanish name probable matches to a name match percentage calculation engine (306);
calculating a matching percentage between the plurality of Spanish name probable matches by the name match percentage calculation engine (306); and
generating one or more probable matches by the name match percentage calculation engine (306).
Dependent claims: 2, 3, 4.
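The three stages recited in claim 1 (parsing, probable match identification, match percentage calculation) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the particle list, the "given names, then paternal and maternal surnames" layout, the first-letter blocking rule, the use of `difflib.SequenceMatcher` as the similarity measure, and the 85% threshold.

```python
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical particle list; the claims do not enumerate one.
PARTICLES = {"de", "del", "la", "las", "los", "y"}


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'José' and 'Jose' compare equal.

    Note: this also folds 'ñ' to 'n', which a real system may not want.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


@dataclass(frozen=True)
class ParsedName:
    original: str
    given: str
    paternal: str
    maternal: str


def parse_name(raw: str) -> ParsedName:
    """Stage 1 (302): assumed layout is given name(s), paternal, maternal."""
    tokens = [t for t in normalize(raw).split() if t not in PARTICLES]
    if len(tokens) >= 3:
        return ParsedName(raw, " ".join(tokens[:-2]), tokens[-2], tokens[-1])
    if len(tokens) == 2:
        return ParsedName(raw, tokens[0], tokens[1], "")
    return ParsedName(raw, " ".join(tokens), "", "")


def probable_matches(names):
    """Stage 2 (304): pair names whose paternal surnames share a first letter."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if a.paternal and b.paternal and a.paternal[0] == b.paternal[0]
    ]


def match_percentage(a: ParsedName, b: ParsedName) -> float:
    """Stage 3 (306): similarity of the normalized full names, as a percentage."""
    key_a = f"{a.given} {a.paternal} {a.maternal}".strip()
    key_b = f"{b.given} {b.paternal} {b.maternal}".strip()
    return round(100 * SequenceMatcher(None, key_a, key_b).ratio(), 1)


def find_name_matches(raw_names, threshold=85.0):
    """Run all three stages and keep pairs at or above the threshold."""
    parsed = [parse_name(n) for n in raw_names]
    scored = [
        (a.original, b.original, match_percentage(a, b))
        for a, b in probable_matches(parsed)
    ]
    return [m for m in scored if m[2] >= threshold]
```

Under these assumptions, `find_name_matches(["María García López", "Maria Garcia Lopez", "Juan Pérez Gómez"])` pairs the first two records at 100.0 and never compares either with the third, because the blocking step filters that pair out before any percentage is computed.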
5. A computer implemented method for identification and matching a plurality of similar Spanish addresses for a given set of data, the method comprising:
providing the plurality of Spanish addresses to an address parsing engine (502);
generating a plurality of parsed Spanish addresses by the address parsing engine (502);
providing the plurality of parsed Spanish addresses to a Probable identification engine (504);
generating a plurality of Spanish addresses probable matches by the Probable address identification engine (504);
providing the plurality of Spanish addresses probable matches to a match percentage calculation engine (506) wherein match percentage calculation engine (506) calculates matching percentage between two probable matches using predefined method; and
generating one or more probable matches by the match percentage calculation engine (506).
Dependent claims: 6, 7, 8.
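The address method of claim 5 follows the same three-stage shape, and the abstract mentions "same city" as one example criterion for probable matches. The sketch below is one possible reading under stated assumptions: the abbreviation table, the "street part, postal code city" layout split at the last comma, and token-set (Jaccard) overlap standing in for the unspecified "predefined method" are all hypothetical choices, not details from the claims.

```python
import re
import unicodedata
from dataclasses import dataclass
from itertools import combinations

# Hypothetical abbreviation table for Spanish street types.
STREET_TYPES = {
    "c/": "calle", "cl": "calle", "calle": "calle",
    "av": "avenida", "avda": "avenida", "avenida": "avenida",
    "pza": "plaza", "plaza": "plaza",
}


def fold(text: str) -> str:
    """Lowercase and strip accents before tokenizing."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


@dataclass(frozen=True)
class ParsedAddress:
    original: str
    tokens: frozenset
    city: str


def parse_address(raw: str) -> ParsedAddress:
    """Stage 1 (502): assumed layout '<street part>, <postal code> <city>'."""
    street_part, _, locality = fold(raw).rpartition(",")
    words = [STREET_TYPES.get(w, w) for w in re.findall(r"[\w/]+", street_part)]
    city = " ".join(w for w in locality.split() if not w.isdigit())
    return ParsedAddress(raw, frozenset(words), city)


def probable_address_matches(addresses):
    """Stage 2 (504): only addresses in the same city are compared."""
    return [
        (a, b)
        for a, b in combinations(addresses, 2)
        if a.city and a.city == b.city
    ]


def address_match_percentage(a: ParsedAddress, b: ParsedAddress) -> float:
    """Stage 3 (506): Jaccard overlap of street tokens, as a percentage."""
    union = a.tokens | b.tokens
    if not union:
        return 0.0
    return round(100 * len(a.tokens & b.tokens) / len(union), 1)


def find_address_matches(raw_addresses, threshold=80.0):
    """Run all three stages and keep pairs at or above the threshold."""
    parsed = [parse_address(r) for r in raw_addresses]
    scored = [
        (a.original, b.original, address_match_percentage(a, b))
        for a, b in probable_address_matches(parsed)
    ]
    return [m for m in scored if m[2] >= threshold]
```

With this sketch, "C/ Mayor 12, 28013 Madrid" and "Calle Mayor, 12, 28013 Madrid" reduce to the same street tokens and score 100.0, while an otherwise identical street in Barcelona is never compared with either because the city criterion excludes it.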
9. A system for identifying and matching a plurality of similar Spanish names and addresses for a given set of data, the system comprising:
a parsing engine (112) to receive the plurality of Spanish names and addresses, wherein the parsing engine (112) generates a set of parsed Spanish names and addresses;
a probable identification engine (114) receiving the set of parsed Spanish names and addresses as an input and generating probable matches of plurality of Spanish names and addresses;
a match percentage calculation engine (116) to calculate percentage match for the generated probable matches of plurality of Spanish names and addresses; and
a database for storing one or more matched Spanish names and addresses.
Dependent claims: 10, 11, 12.
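The system of claim 9 wires the three engines to a database. A rough sketch of that composition is shown below; the `MatchingSystem` class, the `matches` table schema, the use of an in-memory SQLite database, and the placeholder engines in the usage lines are all hypothetical and chosen only to keep the example self-contained.

```python
import sqlite3
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Iterable, List, Tuple


class MatchingSystem:
    """Composes a parsing engine (112), a probable identification engine (114),
    a match percentage calculation engine (116) and a database of matches."""

    def __init__(
        self,
        parse: Callable[[str], str],
        identify: Callable[[List[str]], Iterable[Tuple[str, str]]],
        score: Callable[[str, str], float],
        db_path: str = ":memory:",
    ) -> None:
        self.parse = parse
        self.identify = identify
        self.score = score
        self.db = sqlite3.connect(db_path)
        # Hypothetical schema: one row per stored match.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS matches "
            "(left_record TEXT, right_record TEXT, percentage REAL)"
        )

    def run(self, records: Iterable[str], threshold: float = 80.0) -> int:
        """Parse, identify candidates, score them, and store the matches."""
        parsed = [self.parse(r) for r in records]
        stored = 0
        for left, right in self.identify(parsed):
            percentage = self.score(left, right)
            if percentage >= threshold:
                self.db.execute(
                    "INSERT INTO matches VALUES (?, ?, ?)",
                    (left, right, percentage),
                )
                stored += 1
        self.db.commit()
        return stored


# Placeholder engines, standing in for the real name/address engines.
def simple_parse(record: str) -> str:
    return " ".join(record.lower().split())


def simple_identify(parsed: List[str]):
    return [(a, b) for a, b in combinations(parsed, 2) if a[:1] == b[:1]]


def simple_score(a: str, b: str) -> float:
    return round(100 * SequenceMatcher(None, a, b).ratio(), 1)


system = MatchingSystem(simple_parse, simple_identify, simple_score)
stored_count = system.run(["Ana Ruiz Gil", "ana  ruiz gil", "Luis Mora Vega"])
```

Passing the engines in as callables keeps the orchestration independent of whether the records are names or addresses, which mirrors how the claim describes a single parsing, identification and calculation engine handling both.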
13. A non-transitory computer-readable medium having embodied thereon a computer program for identification and matching of a plurality of similar Spanish names in a given set of data, the method comprising:
providing a plurality of Spanish names to a name parsing engine (302);
generating a plurality of parsed Spanish names by the name parsing engine (302);
providing the plurality of parsed Spanish names to a probable name identification engine (304);
generating a plurality of Spanish name probable matches by the probable name identification engine (304);
providing the plurality of Spanish name probable matches to a name match percentage calculation engine (306);
calculating a matching percentage between the plurality of Spanish name probable matches by the name match percentage calculation engine (306); and
generating one or more probable matches by the name match percentage calculation engine (306).
Specification