Method and system for managing data quality for Spanish names and addresses in a database
First Claim
1. A system for identifying and matching a plurality of similar Spanish names and addresses for a given set of data, the system comprising:
a hardware processor;
a memory for enabling the system for managing data quality of Spanish names and addresses in a database, wherein the memory comprises:
a parsing engine to receive the plurality of Spanish names and addresses from the database, wherein the parsing engine generates a set of parsed Spanish names and addresses;
a probable identification engine receiving the set of parsed Spanish names and addresses as an input and generating probable matches of plurality of Spanish names and addresses;
a match percentage calculation engine to calculate percentage match for the generated probable matches of plurality of Spanish names and addresses; and
the database for storing one or more matched Spanish names and addresses,
wherein the match percentage calculation engine further comprises a name match percentage calculation engine and an address match percentage calculation engine,
wherein the name match percentage calculation engine calculates matching percentage between two probable matches using a NameKdiff algorithm and the address match percentage calculation engine calculates matching percentage between two probable matches using an AddressKdiff algorithm,
wherein the memory, the database, the hardware processor, and the parsing engine, the probable identification engine and the match percentage calculation engine in the memory are coupled to a system bus.
Abstract
A method and system to identify similar names and addresses from a given data set comprising a plurality of names and addresses. The invention more specifically addresses the challenges faced in Spanish data quality assurance. The name and address data is passed through a parsing engine, which parses the plurality of Spanish names and addresses. The parsed Spanish names and addresses are sent to a probable identification engine to identify the probable matches. The combination of the name and address matching processes can be used for assuring data quality for Spanish names and addresses. The Spanish name matching process consists of identifying probable matches and finding similarity percentages between those probable matches. Similarly, the Spanish address matching process consists of identifying probable matches (using criteria such as same city) and finding similarity percentages between those probable matches. The system includes a parsing engine, a probable identification engine and a match percentage calculation engine.
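The three-stage flow described in the abstract (parse, identify probable matches, calculate match percentages) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the record fields (`name`, `address`, `city`), the function names, and the normalization rules are hypothetical, and the similarity score uses the standard library's `difflib.SequenceMatcher` purely as a stand-in. It is not the NameKdiff or AddressKdiff algorithm, which this section does not define.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
import unicodedata


def normalize(text):
    """Hypothetical parsing step: lowercase, strip accents, collapse whitespace.

    Note: stripping accents also maps "ñ" to "n", which is lossy for Spanish;
    a real parser would likely treat it as a distinct letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def parse_record(record):
    """Return a parsed copy of a record with every field normalized."""
    return {key: normalize(value) for key, value in record.items()}


def probable_matches(records):
    """Yield index pairs of records sharing a city (the 'same city' criterion)."""
    by_city = defaultdict(list)
    for index, record in enumerate(records):
        by_city[record["city"]].append(index)
    for indices in by_city.values():
        yield from combinations(indices, 2)


def match_percentage(a, b):
    """Stand-in similarity score in [0, 100]; NOT NameKdiff or AddressKdiff."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def score_pairs(raw_records):
    """Parse records, find probable matches, and score name and address."""
    parsed = [parse_record(r) for r in raw_records]
    results = []
    for i, j in probable_matches(parsed):
        results.append({
            "pair": (i, j),
            "name_pct": match_percentage(parsed[i]["name"], parsed[j]["name"]),
            "address_pct": match_percentage(parsed[i]["address"], parsed[j]["address"]),
        })
    return results


if __name__ == "__main__":
    records = [
        {"name": "José García López", "address": "Calle Mayor 5", "city": "Madrid"},
        {"name": "Jose Garcia Lopez", "address": "calle  mayor 5", "city": "MADRID"},
        {"name": "María Pérez", "address": "Avenida Diagonal 10", "city": "Barcelona"},
    ]
    for row in score_pairs(records):
        print(row)
```

Restricting comparisons to records that share a city keeps the number of scored pairs well below the all-pairs count, which is presumably the purpose of the probable identification stage; the actual criteria used by the system may differ.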
16 Citations
3 Claims
Claims 2 and 3 depend on claim 1.
Specification