OCR-GUIDED TEXT TOKENIZATION OF DIGITAL IMAGES
Abstract
An image processing method in which OCR is used to guide text tokenization. More particularly, OCR is first performed on each symbol in the scanned image; a symbol may be a number, letter, or other character. During the tokenization process, the OCR results are used to select appropriate matching criteria for each symbol. Symbols that are recognized as different characters are not allowed to be clustered into the same group. Symbols with the same OCR results are clustered according to their recognition confidence levels.
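The clustering rule described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Symbol` and `Token` types, the pixel-mismatch measure, and the specific thresholds in `threshold_for` are all hypothetical choices made for illustration. In particular, the assumption that a high OCR confidence permits a looser bitmap match is only one possible reading of "clustered according to the recognition confidence levels".

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    bitmap: tuple      # rows of 0/1 pixels; all bitmaps share one size in this sketch
    label: str         # character reported by OCR
    confidence: float  # OCR confidence in [0, 1]


@dataclass
class Token:
    label: str
    exemplar: tuple
    members: list = field(default_factory=list)


def mismatch_ratio(a, b):
    """Fraction of pixels that differ between two equally sized bitmaps."""
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def threshold_for(confidence, strict=0.05, loose=0.25, cutoff=0.9):
    """Assumed rule: confident OCR results tolerate a looser pixel match."""
    return loose if confidence >= cutoff else strict


def tokenize(symbols):
    """Group symbol indices into tokens, guided by the OCR label and confidence."""
    tokens = []
    for index, sym in enumerate(symbols):
        limit = threshold_for(sym.confidence)
        for tok in tokens:
            # Symbols recognized as different characters never share a token.
            if tok.label != sym.label:
                continue
            if mismatch_ratio(tok.exemplar, sym.bitmap) <= limit:
                tok.members.append(index)
                break
        else:
            # No compatible token found: this symbol starts a new one.
            tokens.append(Token(sym.label, sym.bitmap, [index]))
    return tokens
```

With this rule, two slightly different bitmaps that OCR confidently reads as the same character fall into one token, while an identical bitmap read as a different character, or read with low confidence, is kept separate.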
18 Claims
1. An image processing method comprising:
scanning an input image via an image input device;
compressing the scanned image using an image compression tool by performing OCR (Optical Character Recognition) on each symbol in the scanned image to generate OCR results and then performing tokenization on the scanned image using the OCR results; and
storing the compressed image in a storage device or printing the compressed image via an image output device after it has been decoded.
Dependent claims: 2, 3, 4, 5, 6
7. An image processing system comprising:
an image input device for scanning an input image;
an image processing device connected to the image input device and operative to compress the scanned image using an image compression tool by performing OCR (Optical Character Recognition) on the scanned image to generate OCR results and then performing tokenization on the scanned image using the OCR results;
a storage device connected to the image processing device and operative to store the compressed image; and
an image output device connected to the image processing device and operative to print the compressed image after it has been decoded.
Dependent claims: 8, 9, 10, 11, 12
13. A computer program product comprising:
a computer-usable data carrier storing instructions that, when executed by a computer, cause the computer to perform a method comprising:
scanning an input image via an image input device;
compressing the scanned image using an image compression tool by performing OCR (Optical Character Recognition) on the scanned image to generate OCR results and then performing tokenization on the scanned image using the OCR results; and
storing the compressed image in a storage device or printing the compressed image via an image output device after it has been decoded.
Dependent claims: 14, 15, 16, 17, 18
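The compress, store, and decode flow recited in the independent claims can be pictured with a short Python sketch. It assumes that tokenization means storing one exemplar bitmap per symbol group plus a list of placements, in the manner of symbol-dictionary coders; the claims themselves do not spell out this layout. The function names `compress` and `decode`, the `(bitmap, x, y)` tuple format, and the precomputed `groups` argument are hypothetical.

```python
def compress(placed_symbols, groups):
    """Build a token dictionary and a placement list.

    placed_symbols: list of (bitmap, x, y) tuples, one per symbol on the page.
    groups: list of index lists, one per token (assumed to be computed already).
    """
    dictionary = []
    placements = [None] * len(placed_symbols)
    for token_id, members in enumerate(groups):
        # Use the first member of each group as the stored exemplar.
        dictionary.append(placed_symbols[members[0]][0])
        for i in members:
            _, x, y = placed_symbols[i]
            placements[i] = (token_id, x, y)
    return dictionary, placements


def decode(dictionary, placements, width, height):
    """Rebuild a binary page by pasting each token exemplar at its position."""
    page = [[0] * width for _ in range(height)]
    for token_id, x, y in placements:
        for dy, row in enumerate(dictionary[token_id]):
            for dx, pixel in enumerate(row):
                if pixel:
                    page[y + dy][x + dx] = 1
    return page
```

The saving comes from the dictionary: a page with many instances of the same character stores that bitmap once, and each further instance costs only a token identifier and a position.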
Specification