Supplier Deduplication Engine
First Claim
1. A method of grouping similar supplier names from a plurality of supplier names together, comprising the steps of:
correcting syntactical errors in said supplier names;
grouping the supplier names after said step of correcting syntactical errors;
capturing abbreviations of the supplier names;
correcting ordering, pronunciation and stemming errors;
a matching algorithm that matches and compares two of said supplier names further comprising the steps of:
grouping supplier names based on the first set of characters in the supplier names;
calculating a matching score using Levenshtein distance between said two supplier names, along with the supplier names' sound codes obtained from a modified metaphone algorithm, length of each word, position of matching and mismatching characters, and stem of words in the two supplier names.
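The Levenshtein component of the matching score can be illustrated with a minimal sketch. The claim does not state how the edit distance is normalized or how it is weighted against the sound codes, word lengths, character positions and stems, so the `edit_similarity` helper below is a hypothetical normalization for illustration only, not the claimed scoring formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption, not from the claim):
    1.0 for identical names, falling toward 0.0 as more edits are needed."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A caller might compare names only after the earlier correction steps have run, for example `edit_similarity("acme corp", "acme corp.")`, so that the score reflects genuine differences rather than punctuation or casing noise.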
Abstract
Disclosed herein is a method of grouping similar supplier names together in a database. The syntactical errors in the supplier names are corrected. The supplier names are grouped after correcting the syntactical errors. The abbreviations in the supplier names are captured. The ordering, pronunciation and stemming errors in the supplier names are corrected. A matching algorithm that matches and compares two supplier names is then applied. It comprises the steps of grouping supplier names based on the first set of characters in the supplier names and calculating a matching score between the two supplier names using the Levenshtein distance between them, along with the supplier names' sound codes obtained from a modified metaphone algorithm, length of each word, position of matching and mismatching characters, and stem of words in the supplier names. The matching scores are compared with set thresholds in order to further group the supplier names into clusters.
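The final step, comparing matching scores with set thresholds to form clusters, can be sketched as follows. This is an illustration under stated assumptions: the abstract does not give the threshold values or the clustering procedure, so the `0.85` threshold and the union-find merging are hypothetical, and `difflib.SequenceMatcher.ratio()` from the Python standard library is swapped in as a placeholder for the matching score described above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def stand_in_score(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]; NOT the matching score of the abstract."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_by_threshold(names, threshold=0.85, score=stand_in_score):
    """Merge names whose pairwise score reaches the (hypothetical) threshold."""
    parent = {name: name for name in names}  # union-find forest, one root per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(parent, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)  # union the two clusters

    clusters = {}
    for name in parent:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())
```

Because merging is transitive, two names can land in the same cluster through an intermediate name even if their own score is below the threshold. Whether the disclosed method behaves this way is not stated in the abstract.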
20 Claims
20. A computer program product comprising computer executable instructions embodied in a computer-readable medium, said computer program product including:
a first computer parsable program code for correcting syntactical errors in supplier names;
a second computer parsable program code for grouping the supplier names after said step of correcting syntactical errors;
a third computer parsable program code for capturing abbreviations of the supplier names;
a fourth computer parsable program code for correcting ordering, pronunciation and stemming errors;
a fifth computer parsable program code of a matching algorithm that matches and compares two of said supplier names further comprising:
a sixth computer parsable program code for grouping supplier names based on the first set of characters in the supplier names to avoid unnecessary squared n matching;
a seventh computer parsable program code for calculating a matching score using Levenshtein distance between said two supplier names, along with the supplier names' sound codes obtained from a modified metaphone algorithm, length of each word, position of matching and mismatching characters, and stem of words in the two supplier names.
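Grouping by the first set of characters, which the claim uses to avoid comparing every name with every other name, can be sketched as a simple blocking step. The claim does not say how many leading characters are used or whether case is ignored, so `prefix_len=3` and the lowercasing are assumptions made for illustration, and `candidate_pairs` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(names, prefix_len=3):
    """Yield only pairs of names that share their first `prefix_len` characters.

    prefix_len and the case-insensitive key are assumptions, not from the claim.
    """
    blocks = defaultdict(list)
    for name in names:
        blocks[name[:prefix_len].lower()].append(name)
    for block in blocks.values():
        # Pairwise comparison happens inside each block only.
        yield from combinations(block, 2)
```

For the six names `["Acme Corp", "ACME Corporation", "Acme Inc", "Globex", "Globex LLC", "Initech"]`, this yields 4 candidate pairs, where an all-pairs comparison would need 15. The trade-off is that names whose leading characters differ, for instance because of a typo in the first letters, never get compared, which is presumably why the earlier correction steps run first.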
Specification