Matching engine with signature generation

  • US 7,516,130 B2
  • Filed: 02/24/2006
  • Issued: 04/07/2009
  • Est. Priority Date: 05/09/2005
  • Status: Active Grant
First Claim

1. A method for generating a plurality of signatures associated with a document, the method comprising:

    receiving a document comprising text;

    parsing the document to generate a token set comprising a plurality of tokens, each token corresponding to the text in the document separated by a predefined character characteristic;

    calculating a score for each token in the token set based on a frequency and distribution of occurrences of the text in the document;

    ranking each token in the token set based on the calculated score;

    selecting, from the ranked tokens, a predetermined number of top ranked tokens to limit a number of signatures generated for said document; and

    generating a signature for each occurrence of the selected tokens,

    wherein said score calculation is such that a more even distribution of the occurrences of the text in the document results in a higher score, and

    wherein said score for each token is proportional to a first quantity divided by a second quantity, further wherein the first quantity comprises a position of a last occurrence of the text in the document minus a position of a first occurrence of the text in the document, and further wherein the second quantity comprises a square root of a sum of squares of differences in positions between adjacent occurrences of the text in the document.
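The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`tokenize`, `score_positions`, `generate_signatures`), the `top_n` and `window` parameters, the use of character offsets as positions, the score of zero for a token that occurs only once, and the SHA-1 hash of a short text window as the "signature" are all assumptions made for illustration. Only the ratio itself, (last position minus first position) divided by the square root of the sum of squared gaps between adjacent occurrences, comes from the claim.

```python
import hashlib
import math
import re
from collections import defaultdict


def tokenize(text):
    """Split text into (token, character offset) pairs.

    Assumption: any non-word character acts as the claim's
    "predefined character characteristic" separating tokens.
    """
    return [(m.group().lower(), m.start()) for m in re.finditer(r"\w+", text)]


def score_positions(positions):
    """Score one token from the sorted positions of its occurrences.

    score = (last - first) / sqrt(sum of squared adjacent gaps)
    Assumption: a token with a single occurrence scores 0.0, because
    the claimed ratio is undefined (0/0) in that case.
    """
    if len(positions) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return (positions[-1] - positions[0]) / math.sqrt(sum(g * g for g in gaps))


def generate_signatures(text, top_n=3, window=20):
    """Return (token, position, digest) for each occurrence of the top tokens.

    Assumption: a signature is the SHA-1 digest of `window` characters
    starting at the occurrence; the claim does not specify this.
    """
    occurrences = defaultdict(list)
    for token, pos in tokenize(text):
        occurrences[token].append(pos)

    # Rank tokens by descending score; ties are broken alphabetically.
    ranked = sorted(
        occurrences,
        key=lambda t: (-score_positions(occurrences[t]), t),
    )

    # Keeping only the top-ranked tokens limits the number of signatures.
    signatures = []
    for token in ranked[:top_n]:
        for pos in occurrences[token]:
            context = text[pos:pos + window]
            digest = hashlib.sha1(context.encode("utf-8")).hexdigest()
            signatures.append((token, pos, digest))
    return signatures
```

For n evenly spaced occurrences with gap d, the ratio reduces to sqrt(n - 1), so a token that is both frequent and evenly spread scores higher than one whose occurrences are clustered. In this sketch, positions `[0, 10, 20, 30]` score about 1.73, while `[0, 1, 2, 30]` score about 1.07.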

  • 5 Assignments