Document fingerprints using block encoding of text

  • US 8,838,657 B1
  • Filed: 09/07/2012
  • Issued: 09/16/2014
  • Est. Priority Date: 09/07/2012
  • Status: Active Grant
First Claim

1. A system comprising one or more computing devices, the system configured to:

    detect, within a digitized image object comprising a plurality of text image elements, a plurality of element groups, wherein each element group comprises one or more text image elements and is separated, within the digitized image object, from other element groups of the plurality of element groups by at least one delimiter;

    generate a first representation of the plurality of element groups, each element group being represented within the first representation as a respective two-dimensional block whose size is proportional to the combined size of the text image elements of the element group;

    generate a second representation based at least in part on the first representation, each two-dimensional block of the first representation being represented within the second representation by a respective numerical encoding, a particular numerical encoding used for a particular two-dimensional block being based at least in part on the size of the particular two-dimensional block; and

    store, as a fingerprint representing text contents of the digitized image object, at least a subset of the second representation, the subset of the second representation comprising a plurality of numerical encodings.
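
The four steps recited in claim 1 (detect element groups, build size-proportional blocks, encode each block numerically, store a subset as the fingerprint) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes each text image element has already been reduced to a hypothetical `(width, height)` bounding box in pixels, that a delimiter is marked by `None`, and that the numerical encoding is simply the block width quantized by a hypothetical `unit` parameter. All function and parameter names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumption: an element is a (width, height) box in pixels; None marks a delimiter.
Element = Optional[Tuple[int, int]]


@dataclass(frozen=True)
class Block:
    width: int   # combined width of the group's elements
    height: int  # height of the tallest element in the group


def detect_groups(elements: Sequence[Element]) -> List[List[Tuple[int, int]]]:
    """Split the element sequence into groups separated by one or more delimiters."""
    groups: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    for element in elements:
        if element is None:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(element)
    if current:
        groups.append(current)
    return groups


def to_blocks(groups: Sequence[Sequence[Tuple[int, int]]]) -> List[Block]:
    """First representation: one two-dimensional block per element group."""
    return [
        Block(width=sum(w for w, _ in group), height=max(h for _, h in group))
        for group in groups
    ]


def encode(blocks: Sequence[Block], unit: int = 8) -> List[int]:
    """Second representation: a numerical encoding derived from each block's size."""
    return [max(1, block.width // unit) for block in blocks]


def fingerprint(elements: Sequence[Element], unit: int = 8, limit: Optional[int] = None) -> List[int]:
    """Return at least a subset (the first `limit` encodings) of the second representation."""
    encodings = encode(to_blocks(detect_groups(elements)), unit)
    return encodings if limit is None else encodings[:limit]


# Three groups: two elements, one element, three elements.
sample = [(10, 12), (8, 12), None, (9, 12), None, None, (10, 14), (10, 12), (11, 12)]
print(fingerprint(sample))  # [2, 1, 3]
```

In this sketch the encoding depends only on block width. The claim requires only that the encoding be based at least in part on block size, so a scheme that also used height or area would fit the same outline.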
