
Detecting duplicate and near-duplicate files

  • US 7,366,718 B1
  • Filed: 06/27/2003
  • Issued: 04/29/2008
  • Est. Priority Date: 01/24/2001
  • Status: Expired due to Term
First Claim

1. A method for filtering a plurality of candidate search results to remove near-duplicates, the method comprising:

  • a) for one of the plurality of candidate search results, determining whether the one candidate search result is a near-duplicate of another of the plurality of candidate search results by
      1) comparing a cluster identifier of the one candidate search result with a cluster identifier of the other candidate search result, and
      2) if the cluster identifiers of the one and the other candidate search results match, then concluding that the one candidate search is a near-duplicate of the other candidate search result; and

    b) in response to a determination that the one candidate search result is a near-duplicate of the other candidate search result, rejecting the one candidate search result thereby defining a filtered set of search results including only those of the plurality of candidate search results that have not been rejected.
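
The filtering steps of claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the `SearchResult` type, its `url` and `cluster_id` fields, and the `filter_near_duplicates` function are hypothetical names introduced here, and the sketch assumes each candidate already carries a precomputed cluster identifier. The claim does not say which member of a matching pair is rejected, so the sketch assumes the earlier-ranked result is kept.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical candidate result; cluster_id is assumed to be precomputed."""
    url: str
    cluster_id: int


def filter_near_duplicates(candidates: Iterable[SearchResult]) -> List[SearchResult]:
    """Return the candidates with near-duplicates rejected.

    Step a): a candidate is treated as a near-duplicate of an earlier one
    when their cluster identifiers match.
    Step b): such a candidate is rejected; the rest form the filtered set.
    Assumption: the first result seen for each cluster is the one kept.
    """
    seen_clusters: Set[int] = set()
    filtered: List[SearchResult] = []
    for result in candidates:
        if result.cluster_id in seen_clusters:
            # Matches the cluster identifier of an already accepted result.
            continue
        seen_clusters.add(result.cluster_id)
        filtered.append(result)
    return filtered


if __name__ == "__main__":
    results = [
        SearchResult("http://example.com/a", 17),
        SearchResult("http://example.com/a-mirror", 17),
        SearchResult("http://example.com/b", 42),
    ]
    for kept in filter_near_duplicates(results):
        print(kept.url)
```

Tracking the accepted cluster identifiers in a set is equivalent to comparing each candidate against every accepted result pairwise, but it takes a single pass over the candidates.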
