High resolution replication of document based on shape clustering
Abstract
Techniques for shape clustering and their applications in processing various documents, including the output of an optical character recognition (OCR) process.
29 Claims
1. A method, comprising:
- processing an image of a document to produce a collection of non-overlapping sub-regions of the image, each sub-region being at a first resolution;
- generating multiple clusters of visually similar clip sub-regions, each of the sub-regions in the collection being included in one of the clusters;
- generating a representative cluster image for each of the multiple clusters from the sub-regions in the respective cluster at a second resolution higher than the first resolution; and
- producing a replica image of the document by replacing sub-regions in the image with the representative cluster images for the clusters in which the respective sub-regions are included, wherein the method is performed by one or more computer processors.
Dependent claims: 2-15

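The four steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the fixed square tiling, the greedy threshold clustering, the mean-absolute-difference similarity measure, and the nearest-neighbour upsampling are all simplifying assumptions, and every function name (`split_tiles`, `cluster_tiles`, `representative`, `replicate`) is hypothetical. Images are plain nested lists of grayscale values in [0, 1].

```python
from typing import List

Tile = List[List[float]]  # grayscale pixels in [0, 1], row-major


def split_tiles(image: Tile, size: int) -> List[Tile]:
    """Cut the image into non-overlapping size x size sub-regions (row-major)."""
    h, w = len(image), len(image[0])
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, h - h % size, size)
        for c in range(0, w - w % size, size)
    ]


def distance(a: Tile, b: Tile) -> float:
    """Mean absolute pixel difference (an assumed similarity measure)."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def cluster_tiles(tiles: List[Tile], threshold: float) -> List[List[int]]:
    """Greedy clustering: a tile joins the first cluster whose seed is close enough."""
    clusters: List[List[int]] = []
    for i, tile in enumerate(tiles):
        for members in clusters:
            if distance(tile, tiles[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # every tile ends up in exactly one cluster
    return clusters


def upsample(tile: Tile, factor: int) -> Tile:
    """Nearest-neighbour upsampling (assumption; any interpolation could be used)."""
    return [[v for v in row for _ in range(factor)]
            for row in tile for _ in range(factor)]


def representative(tiles: List[Tile], members: List[int], factor: int) -> Tile:
    """Average the upsampled members to get one higher-resolution cluster image."""
    ups = [upsample(tiles[i], factor) for i in members]
    n = len(ups)
    return [[sum(u[r][c] for u in ups) / n for c in range(len(ups[0][0]))]
            for r in range(len(ups[0]))]


def replicate(image: Tile, size: int, factor: int, threshold: float) -> Tile:
    """Replace each sub-region with its cluster's representative image."""
    tiles = split_tiles(image, size)
    rows, cols = len(image) // size, len(image[0]) // size
    out = [[0.0] * (cols * size * factor) for _ in range(rows * size * factor)]
    for members in cluster_tiles(tiles, threshold):
        rep = representative(tiles, members, factor)
        for i in members:
            r0, c0 = (i // cols) * size * factor, (i % cols) * size * factor
            for dr, row in enumerate(rep):
                out[r0 + dr][c0:c0 + len(row)] = row
    return out
```

Because each replica sub-region is an average over all members of its cluster, noise that differs between instances of the same shape tends to cancel out. A real system would presumably align the sub-regions before averaging; that step is omitted here.
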
16. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- processing an image of a document to produce a collection of non-overlapping sub-regions of the image, each sub-region being at a first resolution;
- generating multiple clusters of visually similar clip sub-regions, each of the sub-regions in the collection being included in one of the clusters;
- generating a representative cluster image for each of the multiple clusters from the sub-regions in the respective cluster at a second resolution higher than the first resolution; and
- producing a replica image of the document by replacing sub-regions in the image with the representative cluster images for the clusters in which the respective sub-regions are included.

17. A method, comprising:
- applying an optical character recognition (OCR) process to an original image of a document to produce clip images at different locations of the original image, each clip image being at a first resolution;
- classifying the clip images into a plurality of clusters of clip images, each cluster including clip images that are assigned the same one or more character codes by the OCR process and are identical or similar in size;
- transforming each clip image in each cluster into a transformed clip image at a second resolution higher than the first resolution;
- averaging transformed clip images in each cluster to generate a cluster image; and
- using cluster images of the plurality of clusters to replace corresponding clip images initially produced by the OCR process to generate a replica of the original image of the document at the second resolution.
Dependent claims: 18-23

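The OCR-driven variant in claim 17 can be sketched in the same spirit. This is an illustration under stated assumptions, not the patented method: the `Clip` record, the function names, and the page-coordinate convention are hypothetical; clusters are keyed on the OCR code plus exactly identical clip size (the claim also allows "similar" sizes); and nearest-neighbour upsampling stands in for whatever transform a real system would use.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Pixels = List[List[float]]  # grayscale pixels in [0, 1], row-major


class Clip(NamedTuple):
    """Hypothetical OCR output record: character code, page position, pixels."""
    code: str
    x: int
    y: int
    pixels: Pixels


def upsample(pixels: Pixels, factor: int) -> Pixels:
    """Nearest-neighbour transform to the second, higher resolution (assumption)."""
    return [[v for v in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]


def mean_image(images: List[Pixels]) -> Pixels:
    """Pixel-wise average of equally sized images."""
    n = len(images)
    return [[sum(img[r][c] for img in images) / n
             for c in range(len(images[0][0]))]
            for r in range(len(images[0]))]


def cluster_by_code_and_size(
    clips: List[Clip],
) -> Dict[Tuple[str, int, int], List[Clip]]:
    """Group clips that share an OCR code and have the same height and width."""
    groups: Dict[Tuple[str, int, int], List[Clip]] = defaultdict(list)
    for clip in clips:
        key = (clip.code, len(clip.pixels), len(clip.pixels[0]))
        groups[key].append(clip)
    return groups


def replicate_from_ocr(clips: List[Clip], page_w: int, page_h: int,
                       factor: int, background: float = 1.0) -> Pixels:
    """Paint each cluster image at the scaled position of every clip it replaces.

    Assumes every clip lies fully inside the page_w x page_h page.
    """
    canvas = [[background] * (page_w * factor) for _ in range(page_h * factor)]
    for members in cluster_by_code_and_size(clips).values():
        cluster_image = mean_image([upsample(c.pixels, factor) for c in members])
        for c in members:
            x0, y0 = c.x * factor, c.y * factor
            for dr, row in enumerate(cluster_image):
                canvas[y0 + dr][x0:x0 + len(row)] = row
    return canvas
```

Keying clusters on the OCR code means the grouping relies on the OCR result rather than on a pixel-level comparison, so a misrecognized character would be averaged into the wrong cluster. The sketch does not attempt to detect or correct that.
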
24. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- applying an optical character recognition (OCR) process to an original image of a document to produce clip images at different locations of the original image, each clip image being at a first resolution;
- classifying the clip images into a plurality of clusters of clip images, each cluster including clip images that are assigned the same one or more character codes by the OCR process and are identical or similar in size;
- transforming each clip image in each cluster into a transformed clip image at a second resolution higher than the first resolution;
- averaging transformed clip images in each cluster to generate a cluster image; and
- using cluster images of the plurality of clusters to replace corresponding clip images initially produced by the OCR process to generate a replica of the original image of the document at the second resolution.

25. A system, comprising:
- an optical character recognition (OCR) engine operable to process an original image of a document to produce an OCR output which includes clip images at different locations of the original image, each clip image being at a first resolution; and
- a post-OCR engine in communication with the OCR engine to receive the OCR output, wherein the post-OCR engine is operable to:
  - classify the clip images into a plurality of clusters of clip images, each cluster including clip images that are assigned the same one or more character codes by the OCR engine and are identical or similar in size;
  - transform each clip image in each cluster into a transformed clip image at a second resolution higher than the first resolution;
  - average transformed clip images in each cluster to generate a cluster image; and
  - use cluster images of the plurality of clusters to replace corresponding clip images initially produced by the OCR engine to generate a replica of the original image of the document at the second resolution.
Dependent claims: 26-29

Specification