Duplicate document detection

  • US 8,768,940 B2
  • Filed: 09/13/2012
  • Issued: 07/01/2014
  • Est. Priority Date: 02/11/2004
  • Status: Active Grant
First Claim

1. A method comprising:

    generating, using at least one processor, a primary lexicon of attributes and a secondary lexicon of attributes;

    determining unique attributes in a document;

    determining an intersection between the unique attributes in the document and the primary lexicon;

    determining the intersection is below a predetermined threshold;

    based on the intersection being below the predetermined threshold, applying the secondary lexicon to augment the intersection; and

    calculating a set of identifiers for the document based on the augmented intersection.
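
The steps of claim 1 can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the claim does not say how attributes are extracted, how the lexicons are generated, or how identifiers are calculated. The sketch assumes attributes are lowercase word tokens, that both lexicons are supplied as ready-made sets, that "below a predetermined threshold" means the intersection holds fewer attributes than an integer threshold, and that the set of identifiers is a single SHA-1 digest of the sorted retained attributes. The names `unique_attributes` and `document_identifiers` are hypothetical.

```python
import hashlib
import re


def unique_attributes(text):
    """Return the distinct lowercase word tokens in `text`.

    Assumption: an "attribute" is a word token; the claim does not define it.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def document_identifiers(text, primary, secondary, threshold):
    """Compute a set of identifiers for a document.

    `primary` and `secondary` are pre-built lexicons (sets of attributes);
    how they are generated is outside the scope of this sketch.
    """
    attrs = unique_attributes(text)

    # Intersection between the document's unique attributes and the primary lexicon.
    retained = attrs & primary

    # If the intersection is below the threshold, augment it with the
    # document attributes that appear in the secondary lexicon.
    if len(retained) < threshold:
        retained = retained | (attrs & secondary)

    # Assumption: one SHA-1 digest over the sorted attributes serves as the
    # identifier; the claim only requires "a set of identifiers".
    payload = " ".join(sorted(retained)).encode("utf-8")
    return {hashlib.sha1(payload).hexdigest()}


# Hand-written lexicons, for illustration only.
primary = {"merger", "acquisition", "shares", "board"}
secondary = {"company", "announced", "today"}

ids_a = document_identifiers("The company announced a merger today.", primary, secondary, 2)
ids_b = document_identifiers("Today, the company announced a MERGER!!", primary, secondary, 2)
```

Under these assumptions, two documents that differ only in punctuation, case, or word order yield the same identifier set, so `ids_a == ids_b` would flag them as duplicate candidates.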
