Parsing information in data records and in different languages

  • US 8,321,393 B2
  • Filed: 12/31/2007
  • Issued: 11/27/2012
  • Est. Priority Date: 03/29/2007
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for comparing a first data record and a second data record, wherein the first and second data records are located in one or more data sources, the first data record comprises a first attribute and the second data record comprises a second attribute, the method comprising:

  • parsing the first and second attributes to produce a set of tokens for each of those attributes, wherein the data sources employ at least two different languages and at least one of the first and second attributes is expressed in a language employing other than a Latin alphabet;

    calculating an average information score for the first attribute and the second attribute, wherein the average information score is calculated based upon a matching of tokens for each of the first and second attributes;

    generating a weight for the first attribute and the second attribute; and

    normalizing the weight based on the average information score;

    wherein generating the weight comprises comparing each of a set of tokens of the first attribute to each of a set of tokens of the second attribute such that pairs of tokens are compared, and comparing each pair of tokens comprises:

    determining a current match weight for a pair of tokens;

    determining a first previous match weight corresponding to the pair of tokens;

    determining a second previous match weight corresponding to the pair of tokens;

    setting the weight to the current match weight in response to the current match weight being greater than the first previous match weight or the second previous match weight; and

    setting the weight to the greater of the first previous match weight or the second previous match weight in response to either the first previous match weight or the second previous match weight being greater than the current match weight; and

    linking the first data record and the second data record based on the normalized weight between the two attributes.
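
The steps recited in claim 1 can be illustrated with a short Python sketch. It is only one possible reading of the claim language, not the patented implementation. The following are assumptions made for illustration and do not appear in the claim: the function names, the `info_table` of per-token information values (defaulting to 1.0), exact string equality as the token match test, the choice of table cells used for the "first previous" and "second previous" match weights, the definition of the average information score as the mean of the two attributes' total token information, and the `threshold` used for linking.

```python
import re
import unicodedata


def tokenize(attribute: str) -> list[str]:
    """Parse an attribute into tokens. In Python 3, \\w is Unicode-aware for str,
    so non-Latin scripts such as Cyrillic are tokenized as well."""
    normalized = unicodedata.normalize("NFKC", attribute).lower()
    return re.findall(r"\w+", normalized)


def token_info(token: str, info_table: dict[str, float]) -> float:
    """Hypothetical information value of a token (assumed default: 1.0)."""
    return info_table.get(token, 1.0)


def attribute_weight(tokens_a: list[str], tokens_b: list[str],
                     info_table: dict[str, float]) -> float:
    """Compare every pair of tokens and keep the best weight seen so far."""
    rows, cols = len(tokens_a), len(tokens_b)
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            matched = tokens_a[i - 1] == tokens_b[j - 1]
            gain = token_info(tokens_a[i - 1], info_table) if matched else 0.0
            current = table[i - 1][j - 1] + gain   # current match weight
            first_previous = table[i - 1][j]       # assumed: cell above
            second_previous = table[i][j - 1]      # assumed: cell to the left
            # Use the current weight if it is largest, else the greater previous one.
            table[i][j] = max(current, first_previous, second_previous)
    return table[rows][cols]


def average_information_score(tokens_a: list[str], tokens_b: list[str],
                              info_table: dict[str, float]) -> float:
    """Assumed definition: mean of the total token information of both attributes."""
    total_a = sum(token_info(t, info_table) for t in tokens_a)
    total_b = sum(token_info(t, info_table) for t in tokens_b)
    return (total_a + total_b) / 2.0


def normalized_weight(attr_a: str, attr_b: str,
                      info_table: dict[str, float]) -> float:
    """Generate the weight and normalize it by the average information score."""
    tokens_a, tokens_b = tokenize(attr_a), tokenize(attr_b)
    average = average_information_score(tokens_a, tokens_b, info_table)
    if average == 0.0:
        return 0.0
    return attribute_weight(tokens_a, tokens_b, info_table) / average


def should_link(attr_a: str, attr_b: str, info_table: dict[str, float],
                threshold: float = 0.8) -> bool:
    """Link two records when the normalized weight reaches a hypothetical threshold."""
    return normalized_weight(attr_a, attr_b, info_table) >= threshold


print(normalized_weight("John Smith", "john smith", {}))    # 1.0
print(normalized_weight("Иван Петров", "Петров Иван", {}))  # 0.5
```

Under these assumptions the table behaves like a weighted longest-common-subsequence computation, so tokens that match out of order contribute only partially. That is why the reordered Cyrillic name scores 0.5 rather than 1.0.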
