
Matching engine with signature generation

  • US 8,171,002 B2
  • Filed: 02/17/2009
  • Issued: 05/01/2012
  • Est. Priority Date: 05/09/2005
  • Status: Active Grant
First Claim

1. A computer-implemented method for generating a plurality of signatures associated with a document, the method comprising:

  • receiving a document comprising a plurality of characters;

    normalizing the document to remove non-informative characters from the plurality of characters;

    calculating a score for each informative character of the plurality of characters based on an occurrence frequency and distribution in the document;

    ranking each informative character of the plurality of characters based on the calculated score;

    selecting, from the ranked informative characters, character occurrences; and

    generating a signature for each selected character occurrence, wherein said score for each informative character is proportional to a first quantity divided by a second quantity, further wherein the first quantity comprises a position of a last occurrence of the informative character in the document minus a position of a first occurrence of the informative character in the document, and further wherein the second quantity comprises a square root of a sum of squares of differences in positions between adjacent occurrences of the informative character in the document.
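
The steps of claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The claim fixes only the score formula: for a character occurring at positions p_1 < ... < p_n, the score is proportional to (p_n - p_1) / sqrt(sum of (p_{i+1} - p_i)^2). Everything else here is hypothetical: the function names, the choice to treat non-alphanumeric characters as non-informative, the lowercasing, the skipping of characters that occur only once (their denominator would be zero), the parameters `top_chars`, `per_char`, and `window`, and the use of SHA-1 over a fixed-length window as the signature.

```python
import hashlib
import math
from collections import defaultdict


def normalize(text):
    # Assumption: "non-informative" means anything that is not a letter or digit.
    return "".join(ch.lower() for ch in text if ch.isalnum())


def character_scores(doc):
    # Record every position at which each character occurs.
    positions = defaultdict(list)
    for i, ch in enumerate(doc):
        positions[ch].append(i)

    scores = {}
    for ch, pos in positions.items():
        if len(pos) < 2:
            continue  # Assumption: one occurrence has no gaps, so it is skipped.
        # First quantity: last occurrence position minus first occurrence position.
        spread = pos[-1] - pos[0]
        # Second quantity: square root of the sum of squared adjacent gaps.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[ch] = spread / math.sqrt(sum(g * g for g in gaps))
    return scores, positions


def generate_signatures(text, top_chars=3, per_char=2, window=8):
    doc = normalize(text)
    scores, positions = character_scores(doc)
    # Rank by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    signatures = []
    for ch in ranked[:top_chars]:
        # Assumption: select the first few occurrences of each top-ranked character.
        for p in positions[ch][:per_char]:
            chunk = doc[p:p + window]
            # Assumption: the signature is a hash of the text starting at the occurrence.
            signatures.append(hashlib.sha1(chunk.encode("utf-8")).hexdigest())
    return signatures
```

With this formula, a character whose n occurrences are evenly spaced scores exactly sqrt(n - 1), and by the Cauchy-Schwarz inequality any uneven spacing scores lower. The score therefore rewards characters that are both frequent and uniformly distributed across the document, which matches the claim's "occurrence frequency and distribution" language.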
