DETECTING DUPLICATE AND NEAR-DUPLICATE FILES

  • US 20120078871A1
  • Filed: 12/07/2011
  • Published: 03/29/2012
  • Est. Priority Date: 01/24/2001
  • Status: Abandoned Application
First Claim

1. A method for filtering a plurality of candidate search results to remove near-duplicates, the method comprising:

  • a) for one or more of the plurality of candidate search results, determining that one candidate search result of the one or more candidate search results is a near-duplicate of another of the plurality of candidate search results by

    1) determining that a cluster identifier of the one candidate search result matches a cluster identifier of the other candidate search result; and

    2) in response to determining that a cluster identifier of the one candidate search result matches a cluster identifier of the other candidate search result, concluding that the one candidate search is a near-duplicate of the other candidate search result; and

  • b) in response to the determination that the one candidate search result is a near-duplicate of the other candidate search result, rejecting the one candidate search result thereby defining a filtered set of search results including only those of the plurality of candidate search results that have not been rejected.
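
The filtering recited in steps a) and b) can be illustrated with a minimal sketch. This is not the applicant's implementation. It assumes each candidate search result already carries a precomputed cluster identifier (how that identifier is assigned is outside this claim), and it assumes the first result seen for a given cluster is the one retained, which the claim does not specify. The names `Candidate` and `filter_near_duplicates` are hypothetical.

```python
from typing import Hashable, Iterable, List, Tuple

# Hypothetical representation of a candidate search result:
# (result identifier, precomputed cluster identifier).
Candidate = Tuple[str, Hashable]


def filter_near_duplicates(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Reject any candidate whose cluster identifier matches an earlier one.

    Assumption: the first candidate seen in each cluster is kept and later
    matches are rejected; the claim only requires that a near-duplicate
    be rejected, not which member of the pair survives.
    """
    seen_clusters = set()
    filtered: List[Candidate] = []
    for result_id, cluster_id in candidates:
        # Steps a)1) and a)2): a matching cluster identifier means the
        # candidate is treated as a near-duplicate of an earlier result.
        if cluster_id in seen_clusters:
            # Step b): reject the near-duplicate.
            continue
        seen_clusters.add(cluster_id)
        filtered.append((result_id, cluster_id))
    return filtered


if __name__ == "__main__":
    results = [("doc-a", 17), ("doc-b", 42), ("doc-c", 17)]
    print(filter_near_duplicates(results))
```

In this sketch, `doc-c` shares cluster identifier 17 with `doc-a`, so it is rejected and the filtered set contains only `doc-a` and `doc-b`. Tracking seen identifiers in a set keeps the pass linear in the number of candidates.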
