Identifying synonyms of entities using a document collection

  • US 8,533,203 B2
  • Filed: 06/04/2009
  • Issued: 09/10/2013
  • Est. Priority Date: 06/04/2009
  • Status: Active Grant
First Claim

1. A method of efficiently selecting synonyms of an entity name, the method comprising:

  • selecting a hit sequence from a document that is stored on a computing device, the hit sequence includes a contiguous string of tokens from a plurality of entity names in an entity name list;

    arranging the tokens of the hit sequence into a suffix tree as linked groups of tokens of the hit sequence, wherein the suffix tree contains a suffix link identifier to (i) identify a discriminating token set (DTS) that is a sub-sequence of the hit sequence, (ii) manage and generate the suffix tree, and (iii) efficiently batch process the hit sequences;

    generating a combination token index from the entity name list identifying a position of each of the tokens and indexes for one or more combinations of each of the tokens;

    determining a discriminating token set map from the combination token index and the suffix tree, the DTS map including a matching of the entity name and the DTS;

    storing a portion of adjacent text surrounding the DTS from the document as a DTS phrase;

    identifying token pairs that are common between the entity name and the DTS phrase associated with the entity name, the token pairs being tokens that are a subset of both the entity name and the DTS phrase;

    generating a score for the DTS based on an occurrence of the token pairs in the DTS phrase, wherein the score is an aggregate score for the DTS across a document collection and the score is generated by counting unique instances of the token pairs and assigning a numerical value to the DTS based on a count of the unique instances of the identified tokens; and

    storing the DTS as a synonym of the entity name on the computing device when the generated score at least reaches the threshold value.
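
The last three steps of the claim (token-pair identification, aggregate scoring, and threshold-based storage) can be illustrated with a minimal Python sketch. This is one possible reading of the claim language, not the patented implementation: the whitespace tokenizer, the treatment of a "token pair" as an unordered pair of tokens shared by the entity name and the DTS phrase, and the names `common_token_pairs`, `score_dts`, and `select_synonyms` are all assumptions made for illustration. The suffix tree and combination token index steps are not modeled; the sketch assumes the DTS phrases have already been collected.

```python
from itertools import combinations


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def common_token_pairs(entity_name, dts_phrase):
    """Unordered pairs of tokens that occur in both the entity name
    and the DTS phrase (one interpretation of 'token pairs')."""
    shared = set(tokenize(entity_name)) & set(tokenize(dts_phrase))
    return {frozenset(pair) for pair in combinations(sorted(shared), 2)}


def score_dts(entity_name, dts_phrases):
    """Aggregate score for one DTS: the number of unique shared token
    pairs seen across all of its phrases in the collection."""
    seen = set()
    for phrase in dts_phrases:
        seen |= common_token_pairs(entity_name, phrase)
    return len(seen)


def select_synonyms(entity_name, dts_to_phrases, threshold):
    """Keep each DTS whose aggregate score reaches the threshold."""
    return sorted(
        dts
        for dts, phrases in dts_to_phrases.items()
        if score_dts(entity_name, phrases) >= threshold
    )


# Hypothetical data: an entity name and phrases surrounding two candidate DTSs.
entity = "Canon EOS 350D Digital Rebel XT"
candidates = {
    "rebel xt": [
        "the new canon digital rebel xt camera",
        "rebel xt by canon",
    ],
    "digital": ["digital photo frame"],
}

synonyms = select_synonyms(entity, candidates, threshold=3)
```

In this sketch, "rebel xt" is kept because its surrounding text repeatedly shares several tokens with the entity name, while "digital" is rejected because its phrase shares only a single token and therefore yields no pairs. Counting unique pairs rather than raw occurrences keeps one frequently repeated phrase from inflating the score.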
