
Detecting duplicate and near-duplicate files

  • US 20080044016A1
  • Filed: 08/04/2006
  • Published: 02/21/2008
  • Est. Priority Date: 08/04/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying near-duplicate documents, the method comprising:

    a) accepting a set of documents;

    b) processing the set of documents to determine a first set of near-duplicate documents using a first document similarity technique; and

    c) processing the first set of near-duplicate documents to determine a second set of near-duplicate documents using a second document similarity technique.
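The claim does not say which two similarity techniques are used. It only requires that the second technique refines the output of the first. The sketch below is a minimal illustration under assumed choices, not the patented method. Stage one uses MinHash signatures over word 3-shingles, which give a cheap estimate of similarity. Stage two computes the exact Jaccard similarity of the shingle sets, but only for the pairs that survived stage one. The function names and the thresholds (`coarse`, `fine`) are hypothetical.

```python
import hashlib
from itertools import combinations


def shingles(text, k=3):
    """Set of word k-shingles (assumed document representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """One minimum hash value per seeded hash function (deterministic via blake2b)."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingle_set))
    return sig


def near_duplicates(docs, coarse=0.5, fine=0.8, k=3, num_hashes=64):
    """Two-stage near-duplicate detection (hypothetical helper).

    docs maps document id -> text. Returns (first_set, second_set) as
    lists of id pairs; second_set is always a subset of first_set.
    """
    sets = {d: shingles(t, k) for d, t in docs.items()}
    sigs = {d: minhash_signature(s, num_hashes) for d, s in sets.items()}

    # Stage 1 (assumed first technique): estimated similarity is the
    # fraction of signature positions on which two documents agree.
    first = []
    for a, b in combinations(sorted(docs), 2):
        est = sum(x == y for x, y in zip(sigs[a], sigs[b])) / num_hashes
        if est >= coarse:
            first.append((a, b))

    # Stage 2 (assumed second technique): exact Jaccard, computed only
    # for the candidate pairs found in stage 1.
    second = [(a, b) for a, b in first if jaccard(sets[a], sets[b]) >= fine]
    return first, second
```

For example, `near_duplicates({"a": text, "b": same_text, "c": other_text})` would place `("a", "b")` in both returned lists. This sketch still compares every pair in stage one. Locality-sensitive hashing (banding the signatures) is a common way to avoid that quadratic step, but it is not something the claim specifies.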

  • 3 Assignments