Document near-duplicate detection

  • US 7,962,491 B1
  • Filed: 12/03/2009
  • Issued: 06/14/2011
  • Est. Priority Date: 03/25/2004
  • Status: Active Grant
First Claim

1. A computer device comprising:

  • a memory to store instructions for implementing:

    a fingerprint creation component to generate a fingerprint of a particular length for an input document, where, to generate the fingerprint, the fingerprint creation component is to:

      • sample the input document to obtain a plurality of sampled blocks,
      • generate a set of checksum values from the plurality of sampled blocks, where each checksum value, in the set of checksum values, corresponds to an address of a respective one of a plurality of bits of the fingerprint, and
      • generate the fingerprint by flipping a particular bit, of the plurality of bits of the fingerprint, a quantity of times based on a quantity of checksum values, in the set of checksum values, that corresponds to the address of the particular bit; and

  • a processor to execute the instructions in the memory.
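
The three steps recited in the claim (sample, checksum, flip) can be illustrated with a minimal Python sketch. This is only one possible reading of the claim language, not the patented implementation. The sketch assumes several details the claim does not specify: overlapping fixed-size byte blocks as the sampling scheme, CRC-32 as the checksum, and the checksum modulo the fingerprint length as the bit address. The names `sample_blocks`, `fingerprint`, `block_size`, and `stride` are hypothetical.

```python
import zlib


def sample_blocks(text, block_size=8, stride=4):
    """Sample the document into overlapping byte blocks (assumed scheme)."""
    data = text.encode("utf-8")
    if len(data) <= block_size:
        return [data]
    return [data[i:i + block_size]
            for i in range(0, len(data) - block_size + 1, stride)]


def fingerprint(text, length=128, block_size=8, stride=4):
    """Build a fixed-length bit fingerprint by flipping addressed bits."""
    bits = [0] * length
    for block in sample_blocks(text, block_size, stride):
        # Assumption: the checksum is reduced modulo the fingerprint
        # length so that it addresses exactly one bit.
        address = zlib.crc32(block) % length
        # One flip per checksum that maps to this address, so the final
        # bit is the parity of the number of such checksums.
        bits[address] ^= 1
    return bits


if __name__ == "__main__":
    fp = fingerprint("The quick brown fox jumps over the lazy dog.")
    print("".join(str(b) for b in fp))
```

Because each bit is flipped once per checksum that addresses it, the final bit value depends only on whether that count is odd or even. How two fingerprints are then compared to decide that documents are near-duplicates is not stated in this claim and is left out of the sketch.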
