Managing an archive for approximate string matching

  • US 8,775,441 B2
  • Filed: 01/16/2008
  • Issued: 07/08/2014
  • Est. Priority Date: 01/16/2008
  • Status: Active Grant
First Claim

1. A method for managing an archive for determining approximate matches associated with strings occurring in records, the method including:

    determining a set of strings occurring in the records, the set of strings including a first string;

    generating, for each of the strings in the set, a plurality of deletion variants that are each generated by deleting one or more characters from the corresponding string;

    for the first string, identifying one or more potentially matching strings in the set of strings, each potentially matching string of the potentially matching strings identified in response to determining that any deletion variant of the first string matches any deletion variant of the potentially matching string;

    for each of the potentially matching strings, calculating a corresponding match score;

    for at least some of the potentially matching strings, storing a record in the archive identifying the first string, the potentially matching string, and the match score;

    determining a count of occurrences of the first string in the records;

    for each of the potentially matching strings, determining a count of occurrences of the respective potentially matching string in the records; and

    generating a significance value for the first string based on a sum of at least the count of occurrences of the string and the count of occurrences of each of the one or more potentially matching strings.
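
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `deletion_variants` and `build_archive`, the parameters `max_deletions` and `min_score`, the in-memory list used as the archive, and the use of `difflib.SequenceMatcher` as the match score are all assumptions made for illustration, since the claim does not specify how the score is calculated or how the archive is stored.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def deletion_variants(s, max_deletions=1):
    """Strings obtained by deleting 1..max_deletions characters from s.

    Assumption: deletions that would leave an empty string are skipped.
    """
    variants = set()
    for k in range(1, max_deletions + 1):
        if k >= len(s):
            break
        for positions in combinations(range(len(s)), k):
            drop = set(positions)
            variants.add("".join(c for i, c in enumerate(s) if i not in drop))
    return variants


def build_archive(records, max_deletions=1, min_score=0.0):
    """Return (archive, significance) for the strings in records.

    archive: list of (first_string, potentially_matching_string, match_score)
    significance: dict mapping each string to its own count plus the counts
    of all its potentially matching strings.
    """
    counts = Counter(records)      # occurrences of each distinct string
    index = defaultdict(set)       # deletion variant -> strings that produce it
    variants = {}
    for s in counts:
        variants[s] = deletion_variants(s, max_deletions)
        for v in variants[s]:
            index[v].add(s)

    archive = []
    significance = {}
    for first in counts:
        # Candidates share at least one deletion variant with `first`.
        candidates = set()
        for v in variants[first]:
            candidates |= index[v]
        candidates.discard(first)

        total = counts[first]
        for cand in sorted(candidates):
            # Assumed scorer: similarity ratio in [0, 1] from difflib.
            score = SequenceMatcher(None, first, cand).ratio()
            total += counts[cand]
            if score >= min_score:  # "at least some" candidates are stored
                archive.append((first, cand, score))
        significance[first] = total
    return archive, significance


records = ["smith", "smith", "smith", "smyth", "smoth", "jones"]
archive, significance = build_archive(records)
for first, cand, score in archive:
    print(first, cand, round(score, 2))
print(significance["smith"])  # 3 (smith) + 1 (smyth) + 1 (smoth) = 5
```

In this sketch "smith", "smyth" and "smoth" are linked because each yields the deletion variant "smth", while "jones" shares no variant with any other string and keeps a significance equal to its own count. Note that with single-character deletions only, two strings that differ by one inserted character (for example "smith" and "smiths") share no deletion variant; raising `max_deletions` to 2 would link them.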

  • 4 Assignments