METHOD AND SYSTEM FOR MANAGING DATA QUALITY FOR SPANISH NAMES AND ADDRESSES IN A DATABASE

US 20170262426A1
Filed: 09/20/2016
Published: 09/14/2017
Est. Priority Date: 02/15/2016
Status: Active Grant


1 Assignment

0 Petitions

Abstract

A method and system to identify similar names and addresses in a given data set comprising a plurality of names and addresses. More specifically, the invention addresses the challenges of data quality assurance for Spanish data. The names and addresses are passed through a parsing engine, which parses the plurality of Spanish names and addresses. The parsed Spanish names and addresses are sent to a probable identification engine to identify probable matches. The name matching and address matching processes can be combined to assure data quality for Spanish names and addresses. The Spanish name matching process consists of identifying probable matches and computing similarity percentages between those probable matches. Similarly, the Spanish address matching process consists of identifying probable matches (using criteria such as the same city) and computing similarity percentages between them. The system includes a parsing engine, a probable identification engine and a match percentage calculation engine.
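
The three engines named in the abstract can be read as a generic "parse, then group into probable candidates, then score" pipeline. The sketch below illustrates that reading under stated assumptions: the class `MatchPipeline`, its fields, the `threshold` parameter and the toy `jaccard_percentage` scorer are hypothetical names introduced here, not part of the patent, and the toy scorer only stands in for the match percentage calculation engine so the example runs.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Tuple

Fields = Tuple[str, ...]


@dataclass
class MatchPipeline:
    """Hypothetical three-stage pipeline mirroring the abstract."""
    parse: Callable[[str], Fields]                         # parsing engine
    probable_key: Callable[[Fields], Hashable]             # probable identification criterion
    match_percentage: Callable[[Fields, Fields], float]    # match percentage engine
    threshold: float = 80.0                                # hypothetical cut-off

    def run(self, records: List[str]) -> List[Tuple[str, str, float]]:
        # 1. Parsing engine: split every raw record into fields.
        parsed = [(raw, self.parse(raw)) for raw in records]
        # 2. Probable identification engine: bucket records by a cheap key so that
        #    only records in the same bucket are compared (avoids all-pairs scoring).
        buckets: Dict[Hashable, List[Tuple[str, Fields]]] = {}
        for raw, fields in parsed:
            buckets.setdefault(self.probable_key(fields), []).append((raw, fields))
        # 3. Match percentage calculation engine: score each probable pair.
        results = []
        for bucket in buckets.values():
            for (raw_a, a), (raw_b, b) in combinations(bucket, 2):
                pct = self.match_percentage(a, b)
                if pct >= self.threshold:
                    results.append((raw_a, raw_b, pct))
        return results


def jaccard_percentage(a: Fields, b: Fields) -> float:
    """Toy scorer for the demo: token-set overlap as a percentage."""
    sa, sb = set(a), set(b)
    return 100.0 * len(sa & sb) / len(sa | sb) if sa | sb else 100.0


if __name__ == "__main__":
    pipeline = MatchPipeline(
        parse=lambda s: tuple(s.lower().split()),
        probable_key=lambda f: f[-1],  # toy criterion: same last token
        match_percentage=jaccard_percentage,
        threshold=70.0,
    )
    print(pipeline.run(["Ana Garcia Lopez", "Ana María Garcia Lopez", "Luis Perez Diaz"]))
```

Bucketing before scoring is what gives the probable identification step its value: only records that share a key are compared pairwise, rather than every pair in the data set.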

15 Citations

13 Claims

1. A method for identification and matching of a plurality of similar Spanish names in a given set of data, the method comprising processor-implemented steps of:
- providing a plurality of Spanish names to a name parsing engine (302);
- generating a plurality of parsed Spanish names by the name parsing engine (302);
- providing the plurality of parsed Spanish names to a probable name identification engine (304);
- generating a plurality of Spanish name probable matches by the probable name identification engine (304);
- providing the plurality of Spanish name probable matches to a name match percentage calculation engine (306);
- calculating a matching percentage between the plurality of Spanish name probable matches by the name match percentage calculation engine (306); and
- generating one or more probable matches by the name match percentage calculation engine (306).
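
The record gives no algorithm for the name parsing engine (302). Assuming names arrive in the order first name, first surname, second surname (the structure described in claim 2), a minimal sketch can peel the two surnames off the right-hand end and keep leading particles such as "de la" attached to their surname. `parse_spanish_name`, `ParsedName` and the `PARTICLES` set are hypothetical; real data also contains names with a single surname, reversed order, and particles this set does not cover.

```python
from typing import List, NamedTuple

# Hypothetical particle list; real Spanish data also has "San", "Santa", etc.
PARTICLES = {"de", "del", "la", "las", "los"}


class ParsedName(NamedTuple):
    first_name: str  # simple ("Ana") or composite ("Ana María")
    surname1: str    # first surname (father's first surname)
    surname2: str    # second surname (mother's surname)


def _pop_surname(tokens: List[str]) -> str:
    """Remove one surname from the end of `tokens`, plus any particles before it."""
    if not tokens:
        raise ValueError("name has too few parts")
    parts = [tokens.pop()]
    while tokens and tokens[-1].lower() in PARTICLES:
        parts.insert(0, tokens.pop())
    return " ".join(parts)


def parse_spanish_name(full_name: str) -> ParsedName:
    """Split 'First [Second] Surname1 Surname2' into its three components."""
    tokens = full_name.split()
    surname2 = _pop_surname(tokens)
    surname1 = _pop_surname(tokens)
    if not tokens:
        raise ValueError(f"no first name left after removing surnames: {full_name!r}")
    return ParsedName(" ".join(tokens), surname1, surname2)


if __name__ == "__main__":
    print(parse_spanish_name("Ana María García López"))
    print(parse_spanish_name("José de la Cruz Martínez"))
```

Parsing from the right is a design choice: composite first names ("Ana María", "María de los Ángeles") vary in length, whereas the two surnames are expected at the end.
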

2. The method of claim 1, wherein the Spanish names include a first name and a surname, wherein the first name comprises at least one of a simple name or a composite name and the surname comprises at least one of a first surname and a second surname, and wherein the first surname comprises the father's first surname and the second surname comprises the mother's surname.

3. The method of claim 1, wherein the probable name identification engine (304) is further configured to generate Soundex codes for the parsed Spanish names.

4. The method of claim 1, wherein the name match percentage calculation engine (306) calculates a percentage match between the parsed Spanish names identified by the probable name identification engine (304), wherein the percentage matches are measured by a nameKdiff algorithm, wherein the nameKdiff algorithm receives two parsed Spanish names as input and generates a percentage match as output, and wherein said nameKdiff algorithm identifies the best percentage match for the plurality of parsed Spanish names.
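
Claims 3 and 11 state only that Soundex codes are generated for the name components and used to find probable matches. The sketch below assumes the standard American Soundex rules, folds accented letters such as "í" and "ñ" to their ASCII base letters first, and treats two names as probable matches when all three component codes agree. That grouping criterion and the names `soundex` and `probable_name_groups` are assumptions for illustration, not the patent's specified procedure.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Standard American Soundex digit classes.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def _ascii_letters(text: str) -> str:
    """Lower-case, strip accents (García -> garcia, Muñoz -> munoz), keep a-z only."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def soundex(text: str) -> str:
    letters = _ascii_letters(text)
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:
                code.append(digit)
            prev = digit
        elif ch not in "hw":
            prev = ""  # vowels (and y) separate equal digits; h and w do not
        if len(code) == 4:
            break
    return "".join(code).ljust(4, "0")


def probable_name_groups(
    names: Sequence[Tuple[str, str, str]]
) -> List[List[Tuple[str, str, str]]]:
    """Group (first_name, surname1, surname2) tuples whose three Soundex codes all agree."""
    groups: Dict[Tuple[str, ...], List[Tuple[str, str, str]]] = defaultdict(list)
    for name in names:
        groups[tuple(soundex(part) for part in name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

For example, "García" and "Garcia" both encode to G620, and "López" and "Lopes" both encode to L120, so accent and spelling variants land in the same group before any percentage is computed.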

5. A computer-implemented method for identifying and matching a plurality of similar Spanish addresses in a given set of data, the method comprising:
- providing the plurality of Spanish addresses to an address parsing engine (502);
- generating a plurality of parsed Spanish addresses by the address parsing engine (502);
- providing the plurality of parsed Spanish addresses to a probable address identification engine (504);
- generating a plurality of Spanish address probable matches by the probable address identification engine (504);
- providing the plurality of Spanish address probable matches to a match percentage calculation engine (506), wherein the match percentage calculation engine (506) calculates a matching percentage between two probable matches using a predefined method; and
- generating one or more probable matches by the match percentage calculation engine (506).

6. The method of claim 5, wherein the address parsing engine (502) is further configured to identify one or more numeric parts of the addresses and one or more string parts of the addresses, and wherein said address parsing engine (502) separates said numeric parts of the addresses from said string parts of the addresses to be parsed by the address parsing engine (502).

7. The method of claim 5, wherein the probable address identification engine (504) receives the string parts of the addresses and identifies the probable matches for the plurality of string matches in the addresses.

8. The method of claim 5, wherein the probable address identification engine (504) receives the numeric parts of the addresses and identifies the probable matches for the plurality of numeric matches in said addresses.
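
For the address side (claims 6 to 8, and the "same city" criterion mentioned in the abstract), one possible sketch splits each address into its digit runs (the numeric part) and the remaining words (the string part), then proposes a pair as a probable match only when both records share a city and at least one number. The separate city field, the shared-number rule, and the names `parse_address`, `ParsedAddress` and `probable_address_pairs` are hypothetical; the record does not state the exact criteria. The string part is kept for the later percentage stage.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple

_DIGITS = re.compile(r"\d+")


class ParsedAddress(NamedTuple):
    numbers: Tuple[str, ...]  # numeric part: street number, floor, postcode, ...
    text: str                 # string part: lower-cased words without digits or commas


def parse_address(raw: str) -> ParsedAddress:
    """Separate the numeric part of an address from its string part."""
    numbers = tuple(_DIGITS.findall(raw))
    words = _DIGITS.sub(" ", raw).replace(",", " ").lower().split()
    return ParsedAddress(numbers, " ".join(words))


def probable_address_pairs(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """records are (city, raw_address) pairs.

    Hypothetical criteria: same city (case-insensitive) and at least one shared number.
    """
    by_city: Dict[str, List[Tuple[str, ParsedAddress]]] = defaultdict(list)
    for city, raw in records:
        by_city[city.strip().lower()].append((raw, parse_address(raw)))
    pairs = []
    for entries in by_city.values():
        for (raw_a, a), (raw_b, b) in combinations(entries, 2):
            if set(a.numbers) & set(b.numbers):  # numeric criterion
                pairs.append((raw_a, raw_b))
    return pairs
```

Grouping by city first keeps the pairwise comparison inside small buckets, which matches the abstract's description of probable identification as a cheap filter applied before similarity percentages are computed.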

9. A system for identifying and matching a plurality of similar Spanish names and addresses for a given set of data, the system comprising:
- a parsing engine (112) to receive the plurality of Spanish names and addresses, wherein the parsing engine (112) generates a set of parsed Spanish names and addresses;
- a probable identification engine (114) receiving the set of parsed Spanish names and addresses as an input and generating probable matches for the plurality of Spanish names and addresses;
- a match percentage calculation engine (116) to calculate a percentage match for the generated probable matches of the plurality of Spanish names and addresses; and
- a database for storing one or more matched Spanish names and addresses.
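
Claim 9 requires only "a database for storing one or more matched Spanish names and addresses". As one hedged illustration, the sketch below stores scored pairs in an SQLite table using Python's standard `sqlite3` module. The table name `matched_records`, its columns and the helper functions are assumptions, not a schema taken from the patent.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema for the claimed database of matched names and addresses.
_SCHEMA = """
CREATE TABLE IF NOT EXISTS matched_records (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('name', 'address')),
    record_a TEXT NOT NULL,
    record_b TEXT NOT NULL,
    match_percentage REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(_SCHEMA)
    return conn


def store_matches(conn: sqlite3.Connection, kind: str,
                  matches: Iterable[Tuple[str, str, float]]) -> None:
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO matched_records (kind, record_a, record_b, match_percentage) "
            "VALUES (?, ?, ?, ?)",
            [(kind, a, b, pct) for a, b, pct in matches],
        )


def matches_at_least(conn: sqlite3.Connection, kind: str,
                     threshold: float) -> List[Tuple[str, str, float]]:
    """Return stored pairs of one kind at or above a percentage, best first."""
    cursor = conn.execute(
        "SELECT record_a, record_b, match_percentage FROM matched_records "
        "WHERE kind = ? AND match_percentage >= ? ORDER BY match_percentage DESC",
        (kind, threshold),
    )
    return cursor.fetchall()
```

Storing the percentage alongside each pair lets a data steward re-apply a different acceptance threshold later without recomputing any matches.
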

10. The system of claim 9, wherein the parsing engine (112) further comprises a name parsing engine (302) and an address parsing engine (502), wherein the name parsing engine (302) parses First Name, Surname1 and Surname2, and the address parsing engine (502) parses the address to separate a numeric part and a string part of the address.

11. The system of claim 9, wherein the probable identification engine (114) further comprises a probable name identification engine (304) and a probable address identification engine (504), wherein the probable name identification engine (304) generates Soundex codes for First Name, Surname1 and Surname2 and identifies probable matches for the name based on the generated Soundex codes, and the probable address identification engine (504) identifies addresses satisfying one or more criteria for probable matches.

12. The system of claim 9, wherein the match percentage calculation engine (116) further comprises a name match percentage calculation engine (306) and an address match percentage calculation engine (506), wherein the name match percentage calculation engine (306) calculates a matching percentage between two probable matches using a NameKdiff algorithm, and the address match percentage calculation engine (506) calculates a matching percentage between two probable matches using an AddressKdiff algorithm.
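
The NameKdiff and AddressKdiff algorithms named in claims 4 and 12 are not described in this record, so the sketch below swaps in a different, openly named technique: `difflib.SequenceMatcher.ratio()` from the Python standard library, averaged over the name components. For addresses it assumes a hypothetical 50/50 weighting between an exact match on the numeric parts and text similarity on the string part. It is a stand-in for experimentation, not the patented method; `name_match_percentage`, `address_match_percentage` and `best_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Sequence, Tuple


def _similarity(a: str, b: str) -> float:
    """Stand-in for the undisclosed Kdiff comparison: difflib ratio in [0.0, 1.0]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_match_percentage(a: Sequence[str], b: Sequence[str]) -> float:
    """Mean similarity of (first_name, surname1, surname2), as a percentage."""
    if len(a) != len(b) or not a:
        raise ValueError("names must have the same, non-zero number of components")
    return 100.0 * sum(_similarity(x, y) for x, y in zip(a, b)) / len(a)


def address_match_percentage(
    a: Tuple[Tuple[str, ...], str],
    b: Tuple[Tuple[str, ...], str],
    numeric_weight: float = 0.5,  # hypothetical weighting
) -> float:
    """Weighted mix of exact numeric-part agreement and string-part similarity."""
    (numbers_a, text_a), (numbers_b, text_b) = a, b
    numeric = 1.0 if numbers_a == numbers_b else 0.0
    text = _similarity(text_a, text_b)
    return 100.0 * (numeric_weight * numeric + (1.0 - numeric_weight) * text)


def best_match(query: Sequence[str],
               candidates: Sequence[Sequence[str]]) -> Tuple[Sequence[str], float]:
    """Return the candidate with the highest name match percentage (cf. claim 4)."""
    scored = ((c, name_match_percentage(query, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Scoring components separately, instead of comparing whole name strings, keeps a mismatch in one surname from being masked by a long, identical composite first name.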

13. A non-transitory computer-readable medium having embodied thereon a computer program for identification and matching of a plurality of similar Spanish names in a given set of data, the method comprising:
- providing a plurality of Spanish names to a name parsing engine (302);
- generating a plurality of parsed Spanish names by the name parsing engine (302);
- providing the plurality of parsed Spanish names to a probable name identification engine (304);
- generating a plurality of Spanish name probable matches by the probable name identification engine (304);
- providing the plurality of Spanish name probable matches to a name match percentage calculation engine (306);
- calculating a matching percentage between the plurality of Spanish name probable matches by the name match percentage calculation engine (306); and
- generating one or more probable matches by the name match percentage calculation engine (306).

Current Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors
DIWAN, Ashish, SOLANKI, Nandish Kirtikumar, PATTAR, Sridhar G., KUMAR, Sudhir

Granted Patent

US 10,275,450 B2
CPC Class Codes

G06F 16/334   Query execution G06F16/335 ...

G06F 16/43   Querying

G06F 40/205   Parsing

G06F 40/242   Dictionaries

G06F 40/268   Morphological analysis

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition
