Confidence links between name entities in disparate documents

  • US 8,527,522 B2
  • Filed: 12/29/2008
  • Issued: 09/03/2013
  • Est. Priority Date: 09/05/2008
  • Status: Expired due to Fees
First Claim

1. A system that detects similarities between name strings in a document set, comprising:

  • a processor and a memory, the memory comprising a preprocessing module, a matching module and a generation module;

    the preprocessing module configured to:

    extract a plurality of name strings from the document set by generating additional name strings based on an alternative spelling of one or more name strings in the document set, each name string comprising a similar entity with names that are misspelled, mistranslated, incorrectly transcribed, have multiple aliases, and/or have multiple equally valid spellings, the alternate spelling comprising determining typical misspellings, creating language specific lists of spelling corrections, and generating the alternative spelling based on the spelling corrections;

    the matching module configured to:

    detect possible matching pairs from the plurality of name strings, and detect a plurality of similarity scores to each of the possible matching pairs using a plurality of algorithms that execute in parallel; and

    the generation module configured to:

    generate a set of equivalent names by interrelating name strings from the possible matching pairs based on a comparison between the similarity scores and a threshold.
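
The three modules recited in the claim can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `SPELLING_CORRECTIONS` table, the two similarity functions, the rule of comparing the highest score against the threshold, and the 0.8 default are all hypothetical choices made for the example. A thread pool stands in for "algorithms that execute in parallel"; a real system might use processes or separate machines instead.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical spelling-correction list (illustrative only, not from the patent).
SPELLING_CORRECTIONS = {"mohammed": "muhammad", "jon": "john"}


def preprocess(names):
    """Preprocessing module: normalize names and add alternative spellings."""
    strings = set()
    for name in names:
        normalized = name.lower().strip()
        strings.add(normalized)
        # Generate an additional name string from the corrections list.
        strings.add(" ".join(SPELLING_CORRECTIONS.get(tok, tok)
                             for tok in normalized.split()))
    return sorted(strings)


def edit_ratio(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Word-level Jaccard overlap in [0, 1]."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Assumed set of scoring algorithms; the claim does not name any.
ALGORITHMS = (edit_ratio, token_jaccard)


def equivalent_names(names, threshold=0.8):
    """Matching and generation modules: score pairs, then group names."""
    strings = preprocess(names)
    parent = {s: s for s in strings}  # union-find forest over name strings

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    with ThreadPoolExecutor() as pool:
        for a, b in combinations(strings, 2):
            # Run every algorithm on the pair concurrently.
            futures = [pool.submit(alg, a, b) for alg in ALGORITHMS]
            scores = [f.result() for f in futures]
            # Assumption: a pair matches if its best score meets the threshold.
            if max(scores) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for s in strings:
        groups.setdefault(find(s), set()).add(s)
    return [group for group in groups.values() if len(group) > 1]
```

Calling `equivalent_names(["Jon Smith", "John Smith", "Alice Wong"])` groups the two Smith variants into one set and leaves "alice wong" ungrouped. Comparing every pair is quadratic in the number of name strings, so a larger document set would need some way to narrow the candidate pairs first.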
