Method and system for managing data quality for Spanish names and addresses in a database

US 10,275,450 B2
Filed: 09/20/2016
Issued: 04/30/2019
Est. Priority Date: 02/15/2016
Status: Active Grant

First Claim

1. A system for identifying and matching a plurality of similar Spanish names and addresses for a given set of data, the system comprising:

a hardware processor;

a memory for enabling the system for managing data quality of Spanish names and addresses in a database, wherein the memory comprises;

a parsing engine to receive the plurality of Spanish names and addresses from the database, wherein the parsing engine generates a set of parsed Spanish names and addresses;

a probable identification engine receiving the set of parsed Spanish names and addresses as an input and generating probable matches of plurality of Spanish names and addresses;

a match percentage calculation engine to calculate percentage match for the generated probable matches of plurality of Spanish names and addresses; and

the database for storing one or more matched Spanish names and addresses,

wherein the match percentage calculation engine further comprises a name match percentage calculation engine and an address match percentage calculation engine, wherein the name match percentage calculation engine calculates matching percentage between two probable matches using a NameKdiff algorithm and the address match percentage calculation engine calculates matching percentage between two probable matches using an AddressKdiff algorithm,

wherein the memory, the database, the hardware processor, and the parsing engine, the probable identification engine and the match percentage calculation engine in the memory are coupled to a system bus.

Abstract

A method and system for identifying similar names and addresses in a given data set comprising a plurality of names and addresses. The invention specifically addresses the challenge of Spanish data quality assurance. The name and address data are passed through a parsing engine, which parses the plurality of Spanish names and addresses. The parsed Spanish names and addresses are sent to a probable identification engine to identify the probable matches. The combination of the name and address matching processes can be used to assure data quality for Spanish names and addresses. The Spanish name matching process consists of identifying probable matches and finding similarity percentages between those probable matches. Similarly, the Spanish address matching process consists of identifying probable matches (using criteria such as same city) and finding similarity percentages between those probable matches. The system includes a parsing engine, a probable identification engine and a match percentage calculation engine.
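The address flow in the abstract (identify probable matches by a criterion such as same city, then compute a similarity percentage for each probable pair) can be sketched roughly as below. This is only an illustration under stated assumptions: the record layout (`street`, `city` keys) and all function names are hypothetical, and because this record does not define the AddressKdiff algorithm, the percentage is computed with Python's `difflib.SequenceMatcher` as a plainly different stand-in.

```python
from difflib import SequenceMatcher
from itertools import combinations


def match_percentage(a, b):
    """Similarity of two strings as a percentage.

    Stand-in metric (difflib ratio); NOT the patent's AddressKdiff,
    which is not described in this record.
    """
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio(), 1)


def probable_address_pairs(records):
    """Yield pairs of records that share a city (the 'same city' criterion)."""
    by_city = {}
    for rec in records:
        # Assumption: city names are compared after trimming and lowercasing.
        by_city.setdefault(rec["city"].strip().lower(), []).append(rec)
    for group in by_city.values():
        yield from combinations(group, 2)


def score_addresses(records):
    """Return (street_a, street_b, percentage) for every probable pair."""
    return [
        (a["street"], b["street"], match_percentage(a["street"], b["street"]))
        for a, b in probable_address_pairs(records)
    ]


# Hypothetical sample data, for illustration only.
records = [
    {"street": "Calle Mayor 12", "city": "Madrid"},
    {"street": "Calle Mayor 12", "city": "madrid "},
    {"street": "Avenida Diagonal 5", "city": "Barcelona"},
]
scores = score_addresses(records)
```

Restricting comparisons to records that already share a city keeps the number of scored pairs far below the all-pairs count, which appears to be the purpose of the probable identification step.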

3 Claims

2. The system of claim 1, wherein the parsing engine further comprises a name parsing engine and an address parsing engine, wherein the name parsing engine parses one or more First Names, one or more of Surname1 and Surname2 and the address parsing engine parses an address to separate a numeric and a string part.

3. The system of claim 1, wherein the probable identification engine further comprises a probable names identification engine and a probable address identification engine, wherein the names identification engine generates Soundex codes for one or more First Names, one or more of Surname1 and Surname2 and identifies probable matches for each of the First names based on generated Soundex codes, and the probable address identification engine identifies an address satisfying one or more criteria for probable matches.
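As a minimal sketch of the parsing and Soundex steps recited in claims 2 and 3, the example below splits a name into first names, Surname1 and Surname2, separates the numeric and string parts of an address, and groups names whose Soundex codes agree. Several points are assumptions rather than the patented method: names are split on whitespace with the last two tokens taken as the surnames (particles such as "de la" are not handled), accents are folded before coding, standard American Soundex is used, and the grouping key (first first name plus Surname1) and all function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

# Standard American Soundex letter-to-digit table.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def strip_accents(text):
    """Fold accented letters (e.g. 'é', 'ñ') to their base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def soundex(word):
    """Return the 4-character Soundex code, or '' for an empty word."""
    letters = [c for c in strip_accents(word).upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def parse_name(full_name):
    """Assumption: the last two tokens are Surname1 and Surname2."""
    tokens = full_name.split()
    if len(tokens) >= 3:
        return {"first_names": tokens[:-2],
                "surname1": tokens[-2], "surname2": tokens[-1]}
    return {"first_names": tokens[:1],
            "surname1": tokens[1] if len(tokens) > 1 else "", "surname2": ""}


def parse_address(address):
    """Separate the numeric part(s) from the string part of an address."""
    return {"numeric": re.findall(r"\d+", address),
            "text": " ".join(re.findall(r"[^\W\d_]+", address))}


def probable_name_groups(names):
    """Group names sharing Soundex codes for first name and Surname1."""
    groups = defaultdict(list)
    for name in names:
        p = parse_name(name)
        first = p["first_names"][0] if p["first_names"] else ""
        groups[(soundex(first), soundex(p["surname1"]))].append(name)
    return [g for g in groups.values() if len(g) > 1]
```

For instance, `probable_name_groups(["José González Pérez", "Jose Gonzales Perez", "María López García"])` places the first two names in one group, since both reduce to the codes `J200` and `G524`.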

Current Assignee: TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee: TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors: Diwan, Ashish; Solanki, Nandish Kirtikumar; Pattar, Sridhar G.; Kumar, Sudhir
Primary Examiner(s): Yen, Eric
Application Number: US15/271,139
Publication Number: US 20170262426A1
Time in Patent Office: 952 Days

CPC Class Codes

G06F 16/334   Query execution G06F16/335 ...
G06F 16/43   Querying
G06F 40/205   Parsing
G06F 40/242   Dictionaries
G06F 40/268   Morphological analysis
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/295   Named entity recognition
