
Supplier deduplication engine

  • US 8,234,107 B2
  • Filed: 02/12/2008
  • Issued: 07/31/2012
  • Est. Priority Date: 05/03/2007
  • Status: Active Grant
First Claim

1. A method for deduplication and grouping similar supplier names from a plurality of supplier names, comprising:

    correcting syntactical errors in said supplier names;

    grouping the supplier names after said step of correcting syntactical errors;

    capturing abbreviations of the supplier names;

    correcting ordering, pronunciation and stemming errors in the supplier names;

    calculating a name matching score between two of said supplier names using a matching algorithm, comprising the steps of:

    grouping supplier names based on the first set of characters in the supplier names;

    calculating a word matching score between corresponding words in two of said supplier names, comprising:

    determining stems of said corresponding words;

    determining sound codes of said determined stems using a modified metaphone algorithm;

    determining a Levenshtein distance between said sound codes;

    calculating a prefix score using said stems and calculating a sound score using said sound codes;

    calculating a Levenshtein distance score using said determined Levenshtein distance and length of larger of said corresponding words;

    selecting one of said prefix score, said sound score and said Levenshtein distance score as said word matching score based on comparisons with a set threshold;

    calculating said name matching score between two of said supplier names based on said word matching score; and

    comparing said name matching score with a threshold value to determine a match, and grouping said supplier names based on said match.
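
The scoring steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not give formulas, so every formula here is an assumption. In particular, `stem` is a toy suffix stripper, `sound_code` is a crude consonant key standing in for the modified metaphone algorithm, the prefix, sound and Levenshtein distance scores are hypothetical normalizations, the selection rule (first score that reaches the threshold, otherwise the Levenshtein distance score) is one possible reading of "comparisons with a set threshold", and the name score is taken as a simple average over positionally corresponding words. All function names, `threshold=0.8` and `key_len=2` are illustrative.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Toy stemmer (assumption): lowercase and strip a few common suffixes."""
    w = word.lower()
    for suffix in ("ings", "ing", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def sound_code(s: str) -> str:
    """Crude phonetic key (assumption, NOT the modified metaphone of the claim):
    keep the first letter, drop later vowels and 'y', collapse repeated letters."""
    out = []
    for i, ch in enumerate(s):
        if not ch.isalpha() or (i > 0 and ch in "aeiouy"):
            continue
        if out and out[-1] == ch:
            continue
        out.append(ch)
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def word_score(w1: str, w2: str, threshold: float = 0.8) -> float:
    """Word matching score: stems, sound codes, distance, then score selection."""
    s1, s2 = stem(w1), stem(w2)
    c1, c2 = sound_code(s1), sound_code(s2)
    dist = levenshtein(c1, c2)  # distance between the sound codes

    common = 0
    for x, y in zip(s1, s2):
        if x != y:
            break
        common += 1
    prefix = common / max(len(s1), len(s2), 1)  # assumed prefix score
    sound = 1.0 if c1 == c2 else 0.0  # assumed sound score
    # Distance normalized by the length of the larger word, as in the claim.
    lev = max(0.0, 1.0 - dist / max(len(w1), len(w2), 1))

    # Assumed selection rule: first score reaching the threshold, else lev.
    for score in (prefix, sound):
        if score >= threshold:
            return score
    return lev


def name_score(n1: str, n2: str, threshold: float = 0.8) -> float:
    """Assumed aggregation: average word score over corresponding positions."""
    a, b = n1.split(), n2.split()
    if not a or not b:
        return 0.0
    total = sum(word_score(x, y, threshold) for x, y in zip(a, b))
    return total / max(len(a), len(b))


def group_names(names, threshold: float = 0.8, key_len: int = 2):
    """Block on the first key_len characters, then group names that match."""
    blocks = defaultdict(list)
    for n in names:
        blocks[n.lower()[:key_len]].append(n)
    groups = []
    for block in blocks.values():
        clusters = []
        for n in block:
            for c in clusters:
                # Compare against the first member of each existing cluster.
                if name_score(n, c[0], threshold) >= threshold:
                    c.append(n)
                    break
            else:
                clusters.append([n])
        groups.extend(clusters)
    return groups


print(group_names(["Acme Supplies", "Acme Supply", "Zenith Metals"]))
# [['Acme Supplies', 'Acme Supply'], ['Zenith Metals']]
```

Blocking on the leading characters means only names that share a prefix are ever compared, which keeps the number of pairwise comparisons small at the cost of missing duplicates whose first characters differ.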
