Document near-duplicate detection

  • US 8,364,686 B1
  • Filed: 05/27/2011
  • Issued: 01/29/2013
  • Est. Priority Date: 03/25/2004
  • Status: Expired due to Fees
First Claim

1. A method performed by one or more computer devices, the method comprising:

  • generating, by at least one of the one or more computer devices, a fingerprint for an input document, where generating the fingerprint includes:

    sampling the input document to obtain a plurality of sampled blocks,

    generating a set of checksum values from the plurality of sampled blocks, where each checksum value, in the set of checksum values, corresponds to an address of a respective one of a plurality of bits of the fingerprint, and

    generating the fingerprint by flipping a particular bit, of the plurality of bits of the fingerprint, a quantity of times based on a quantity of checksum values, in the set of checksum values, that corresponds to the address of the particular bit; and

    storing, by at least one of the one or more computing devices, the fingerprint in a memory.
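
The steps recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation. The claim does not specify a sampling scheme, a checksum function, or a fingerprint width, so the overlapping 8-byte blocks, the CRC-32 checksum, the modulo mapping from checksum to bit address, the 64-bit width, and the names `sample_blocks` and `fingerprint` are all assumptions chosen for illustration.

```python
import zlib


def sample_blocks(document: str, block_size: int = 8, stride: int = 4) -> list[bytes]:
    """Sample overlapping fixed-size byte blocks (an assumed sampling scheme)."""
    data = document.encode("utf-8")
    if not data:
        return []
    if len(data) <= block_size:
        return [data]
    return [data[i:i + block_size]
            for i in range(0, len(data) - block_size + 1, stride)]


def fingerprint(document: str, num_bits: int = 64) -> int:
    """Return a num_bits-wide fingerprint packed into an integer."""
    bits = 0
    for block in sample_blocks(document):
        # Assumption: CRC-32 reduced modulo num_bits gives the address
        # of the fingerprint bit that this checksum corresponds to.
        address = zlib.crc32(block) % num_bits
        # Flip the addressed bit once for each checksum that maps to it.
        bits ^= 1 << address
    return bits


if __name__ == "__main__":
    fp = fingerprint("the quick brown fox jumps over the lazy dog")
    # "Storing the fingerprint in a memory" is modeled here as a dict entry.
    store = {"doc-1": fp}
    print(f"{store['doc-1']:064b}")
```

Because each bit is flipped once per checksum that addresses it, the final value of a bit is the parity of the number of checksums mapped to that address.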
