Document fingerprints using block encoding of text
Abstract
Methods and apparatus for document encoding using block encoding of text are disclosed. A computing device is configured to detect, within a digitized image object, a plurality of element groups, where each group comprises one or more text image elements and is separated from other groups by at least one delimiter. The device generates a numerical representation of the groups, comprising a plurality of numerical values, where a particular value corresponding to a particular group is determined based at least in part on a combined size of text image elements of the particular group. The device stores at least a subset of the numerical representation as a fingerprint representing text contents of the digitized image object.
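The pipeline summarized in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: text image elements are modeled as glyph bounding boxes on a single line, a delimiter is taken to be a horizontal gap of at least `min_gap` pixels, and the "combined size" of a group is the sum of its glyph widths. The names `Glyph`, `group_glyphs`, and `fingerprint` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical text image element: a glyph's horizontal extent in pixels."""
    x: int      # left edge
    width: int  # horizontal size


def group_glyphs(glyphs, min_gap=4):
    """Split glyphs into element groups wherever the gap to the previous
    glyph is at least min_gap (assumed stand-in for a delimiter)."""
    groups, current = [], []
    for g in sorted(glyphs, key=lambda g: g.x):
        if current:
            prev = current[-1]
            if g.x - (prev.x + prev.width) >= min_gap:
                groups.append(current)
                current = []
        current.append(g)
    if current:
        groups.append(current)
    return groups


def fingerprint(glyphs, min_gap=4):
    """One numerical value per element group, here the combined glyph width
    (an assumed definition of 'combined size')."""
    return tuple(sum(g.width for g in grp) for grp in group_glyphs(glyphs, min_gap))


# Two "words": two glyphs, a wide gap, then three glyphs.
line = [Glyph(0, 5), Glyph(6, 5), Glyph(20, 5), Glyph(26, 5), Glyph(32, 5)]
print(fingerprint(line))
```

Because the values depend only on group sizes and not on recognized characters, a sketch like this needs no character recognition step; whether the described system works the same way is not stated in the abstract.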
30 Claims
1. A system comprising one or more computing devices, the system configured to:
detect, within a digitized image object comprising a plurality of text image elements, a plurality of element groups, wherein each element group comprises one or more text image elements and is separated, within the digitized image object, from other element groups of the plurality of element groups by at least one delimiter;
generate a first representation of the plurality of element groups, each element group being represented within the first representation as a respective two-dimensional block whose size is proportional to the combined size of the text image elements of the element group;
generate a second representation based at least in part on the first representation, each two-dimensional block of the first representation being represented within the second representation by a respective numerical encoding, a particular numerical encoding used for a particular two-dimensional block being based at least in part on the size of the particular two-dimensional block; and
store, as a fingerprint representing text contents of the digitized image object, at least a subset of the second representation, the subset of the second representation comprising a plurality of numerical encodings.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
detecting, within a digitized image object comprising a plurality of text image elements, a plurality of element groups, wherein each element group comprises one or more text image elements and is separated, within the digitized image object, from other element groups of the plurality of element groups by at least one delimiter;
programmatically generating a numerical representation of the plurality of element groups, comprising a respective plurality of numerical values, wherein a particular numerical value corresponding to a particular element group is selected based at least in part on a combined size of text image elements of the particular element group; and
storing, in an electronic memory, as a fingerprint representing text contents of the digitized image object, at least a subset of the numerical representation.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable storage medium storing program instructions that when executed on one or more processors:
detect, within a digitized image object comprising a plurality of text image elements, a plurality of element groups, wherein each element group comprises one or more text image elements and is separated, within the digitized image object, from other element groups of the plurality of element groups by at least one delimiter;
generate a numerical representation of the plurality of element groups, comprising a respective plurality of numerical values, wherein a particular numerical value corresponding to a particular element group is determined based at least in part on a combined size of text image elements of the particular element group; and
store, as a fingerprint representing text contents of the digitized image object, at least a subset of the numerical representation.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
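Claim 1 differs from claims 6 and 16 in that it recites two stages: a first representation made of two-dimensional blocks, then a second representation made of numerical encodings derived from block sizes. The sketch below illustrates one way such a two-stage mapping could look. Everything specific in it is an assumption for illustration: the `Block` type, the `scale` factor that makes block width proportional to group size, the fixed block `height`, and the bucketing rule (`width // bucket`) used as the encoding. The claim itself does not specify any of these.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical two-dimensional block standing in for one element group."""
    width: int
    height: int


def to_blocks(group_sizes, scale=2, height=8):
    """First representation: one block per element group, with width
    proportional (by an assumed scale factor) to the group's combined size."""
    return [Block(width=scale * size, height=height) for size in group_sizes]


def encode_blocks(blocks, bucket=10):
    """Second representation: one numerical encoding per block, here the
    block width quantized into buckets (an assumed encoding rule)."""
    return [b.width // bucket for b in blocks]


def make_fingerprint(group_sizes, keep=None):
    """Store all encodings, or only the first `keep` of them as a subset."""
    codes = encode_blocks(to_blocks(group_sizes))
    return tuple(codes if keep is None else codes[:keep])


# Combined sizes of three element groups, e.g. three words on a line.
print(make_fingerprint([10, 15, 42]))
print(make_fingerprint([10, 15, 42], keep=2))
```

Quantizing into buckets is one possible design choice: it would make the encoding tolerant of small size differences between scans of the same text, at the cost of distinguishing fewer group sizes.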
Specification