System and method for performing configurable matching of similar data in a data repository

  • US 20070276844A1
  • Filed: 05/01/2006
  • Published: 11/29/2007
  • Est. Priority Date: 05/01/2006
  • Status: Active Grant
First Claim

1. A computer program product for adaptive matching of similar data in a data repository comprising:

  • a computer usable memory medium having computer readable program code embodied therein wherein said computer readable program code comprises a matching executable unit configured to:

    present at least one field common to a first record and a second record wherein said at least one field is used to perform a match between said first record and said second record and wherein said at least one field is presented to a user;

    obtain a first selected field and a second selected field from said at least one field wherein said first selected field and said second selected field is obtained from said user;

    tokenize a first data entry in said first selected field for a first record to produce a first tokenized data entry;

    tokenize a second data entry in said second selected field for said second record to produce a second tokenized data entry;

    exclude at least one character from said first tokenized data entry for utilization in a match that involves said first field and said second field;

    exclude at least one different character with respect to said at least one character from said second tokenized data entry for utilization in a match that involves said first field and said second field;

    remove frequently used strings from said first tokenized data entry and from said second tokenized data entry;

    normalize data from said first field and from said second field to cleanse strings;

    accept a first list of tokens desired for a match to occur utilizing said first selected field;

    accept a second list of tokens desired for a match to occur utilizing said second selected field;

    assign weights to each token in said first list of tokens and each token in said second list of tokens;

    calculate a score for a match through summation of said weights for each token that occurs in said first tokenized data entry and second record and for each token that occurs in said second tokenized data entry and said second record;

    generate a group of similar records when said score is above a threshold; and

    display said group of similar records.
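
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`tokenize`, `match_score`, `group_if_similar`), the sample records, the excluded characters, the frequent strings, the weights, and the threshold are all hypothetical. Two readings of the claim are assumed: tokenization is taken to be a whitespace split with lowercasing as the normalization step, and a token contributes its weight only when it occurs in both tokenized entries.

```python
def tokenize(entry, exclude_chars="", frequent_strings=()):
    """Split an entry into tokens, drop excluded characters, normalize
    case, and remove frequently used strings (assumed ordering)."""
    tokens = []
    for raw in entry.split():
        # Exclude the characters configured for this field.
        token = "".join(ch for ch in raw if ch not in exclude_chars)
        # Normalize: lowercase so that "ACME" and "Acme" compare equal.
        token = token.lower()
        # Skip empty tokens and frequently used strings.
        if token and token not in frequent_strings:
            tokens.append(token)
    return tokens


def match_score(tokens_a, tokens_b, weights_a, weights_b):
    """Sum the weights of listed tokens that occur in both entries
    (one assumed reading of the claim's summation step)."""
    shared = set(tokens_a) & set(tokens_b)
    score = sum(w for t, w in weights_a.items() if t in shared)
    score += sum(w for t, w in weights_b.items() if t in shared)
    return score


def group_if_similar(rec_a, rec_b, field_a, field_b, cfg):
    """Return [rec_a, rec_b] when the score exceeds the threshold, else []."""
    tokens_a = tokenize(rec_a[field_a], cfg["exclude_a"], cfg["frequent"])
    tokens_b = tokenize(rec_b[field_b], cfg["exclude_b"], cfg["frequent"])
    score = match_score(tokens_a, tokens_b, cfg["weights_a"], cfg["weights_b"])
    return [rec_a, rec_b] if score > cfg["threshold"] else []


# Hypothetical records and user-supplied configuration.
record_1 = {"name": "Acme Corp., Inc."}
record_2 = {"company": "ACME Corporation Inc"}
config = {
    "exclude_a": ".,",           # characters excluded for the first field
    "exclude_b": "-",            # a different character for the second field
    "frequent": {"inc", "corp", "corporation"},
    "weights_a": {"acme": 2.0},  # first list of desired tokens with weights
    "weights_b": {"acme": 1.5},  # second list of desired tokens with weights
    "threshold": 3.0,
}

group = group_if_similar(record_1, record_2, "name", "company", config)
print(group)
```

In this sketch both entries reduce to the single token `acme`, so the two weights are summed and compared with the threshold. A production matcher would compare many records rather than one pair, but the per-pair logic would follow the same outline.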
