Detecting duplicate and near-duplicate files

  • US 6,658,423 B1
  • Filed: 01/24/2001
  • Issued: 12/02/2003
  • Est. Priority Date: 01/24/2001
  • Status: Expired due to Term
First Claim

1. A method for determining whether documents, in a large collection of documents, are near-duplicates, the method comprising:

    a) for each of at least some of the documents in the large collection of documents, generating at least two fingerprints;

    b) preprocessing the fingerprints to identify any fingerprints that are associated with only one document; and

    c) determining whether or not documents are near-duplicate documents based on fingerprints other than those identified as being associated with only one document.
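
Steps a) through c) can be illustrated with a minimal Python sketch. The claim as quoted here does not say how fingerprints are generated or how shared fingerprints decide near-duplication, so the following are assumptions made only for illustration: each document's words are distributed over `NUM_LISTS` lists by a hash of the word, one fingerprint is computed per non-empty list, and two documents are treated as near-duplicates if they share at least one fingerprint. The names `fingerprints`, `near_duplicate_pairs`, `_stable_hash`, and `NUM_LISTS` are hypothetical and do not come from the patent text.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_LISTS = 4  # assumption: number of word lists (and thus fingerprints) per document


def _stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def fingerprints(text: str, num_lists: int = NUM_LISTS) -> set:
    """Step a): generate several fingerprints for one document.

    Assumed scheme: each word goes to a list chosen by its hash, and each
    non-empty list is hashed into one (list index, hash) fingerprint.
    """
    buckets = [[] for _ in range(num_lists)]
    for word in text.lower().split():
        buckets[_stable_hash(word) % num_lists].append(word)
    return {(i, _stable_hash(" ".join(b))) for i, b in enumerate(buckets) if b}


def near_duplicate_pairs(docs: dict) -> set:
    """Return pairs of document ids that share at least one fingerprint."""
    # Step a): map every fingerprint to the documents that produced it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for fp in fingerprints(text):
            postings[fp].add(doc_id)

    # Step b): identify and discard fingerprints associated with only one document.
    shared = {fp: ids for fp, ids in postings.items() if len(ids) > 1}

    # Step c): decide near-duplication using only the remaining fingerprints.
    pairs = set()
    for ids in shared.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Discarding single-document fingerprints before the comparison step means step c) only ever examines fingerprints that could possibly produce a match, which is what keeps the comparison manageable for a large collection. Calling `near_duplicate_pairs` with a dictionary that maps document ids to text returns a set of id pairs; documents with no words in common never appear in it under the assumed scheme.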
