Method and apparatus for formatting OCR text
9 Assignments
0 Petitions
Abstract
Following scanning of a document image and optical character recognition (OCR) processing, the output OCR text is processed to determine a text format (typeface and font size) that matches the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for each typeface; a close cluster indicates that the typeface fits well at a constant scaling factor (font size).
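As a rough illustration of the per-word scaling factor described in the abstract, the following Python sketch assumes each candidate typeface is given as a hypothetical table of per-character advance widths at a nominal size of 1 unit, ignores kerning, and takes the word widths measured in the scanned image as input. The names `rendered_width` and `scaling_factors` are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List

# Hypothetical glyph metrics: advance width of each character at a nominal
# size of 1 unit. Real values would come from the font files themselves.
Typeface = Dict[str, float]


def rendered_width(word: str, typeface: Typeface) -> float:
    """Width of `word` rendered in `typeface` at nominal size (no kerning).
    Raises KeyError if the typeface table lacks a character of the word."""
    return sum(typeface[ch] for ch in word)


def scaling_factors(words: List[str], image_widths: List[float],
                    typefaces: Dict[str, Typeface]) -> Dict[str, List[float]]:
    """For each typeface and each word, the factor that stretches the
    nominal rendering of the word to its measured width in the image."""
    result: Dict[str, List[float]] = {}
    for name, face in typefaces.items():
        result[name] = [w_img / rendered_width(word, face)
                        for word, w_img in zip(words, image_widths)]
    return result


# Example: two words that were actually set in "prop" at size 10.
mono = {c: 0.625 for c in "ilmow"}
prop = {"i": 0.25, "l": 0.25, "m": 0.875, "o": 0.5, "w": 0.75}
factors = scaling_factors(["ill", "mow"], [7.5, 21.25],
                          {"mono": mono, "prop": prop})
# factors["prop"] == [10.0, 10.0]; factors["mono"] == [4.0, 11.333...]
```

With these made-up metrics, the correct typeface yields the same factor (10.0) for both words, while the wrong one yields widely differing factors; that constancy is what the subsequent cluster analysis looks for.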
30 Citations
19 Claims
1. A method of determining a typeface for a plurality of words derived from a scanned image of text, the method comprising:
a) providing a plurality of candidate typefaces;
b) calculating for each typeface and for each word a scaling factor to match a typeface rendering of the word to the size of the word in the scanned image; and
c) analyzing the calculated scaling factors to identify a typeface which matches a plurality of the words.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19)
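Step c) does not fix a particular analysis; the abstract describes it as a cluster analysis over each typeface's scaling factors. The Python sketch below is one simple, assumed way to do this: a gap-based one-dimensional clustering that splits the sorted factors wherever neighbours differ by more than a relative tolerance, then picks the typeface whose largest cluster covers the most words. The functions `largest_cluster` and `choose_typeface` and the default `rel_tol` of 3% are hypothetical choices, not the patent's specified method.

```python
from statistics import mean
from typing import Dict, List, Tuple


def largest_cluster(factors: List[float], rel_tol: float = 0.03) -> List[float]:
    """Gap-based 1-D clustering (assumed technique): walk the sorted factors
    and start a new group whenever the step from the previous value exceeds
    rel_tol times that value. Returns the largest group found."""
    if not factors:
        return []
    values = sorted(factors)
    best, current = [values[0]], [values[0]]
    for prev, x in zip(values, values[1:]):
        if x - prev <= rel_tol * prev:
            current.append(x)   # close enough: same cluster
        else:
            current = [x]       # gap too large: new cluster
        if len(current) > len(best):
            best = current
    return best


def choose_typeface(factors_by_face: Dict[str, List[float]],
                    rel_tol: float = 0.03) -> Tuple[str, float, int]:
    """Pick the typeface whose largest cluster contains the most words.
    Returns (typeface name, mean factor of that cluster, cluster size);
    ties go to the typeface encountered first."""
    best_name, best_cluster = None, []
    for name, factors in factors_by_face.items():
        cluster = largest_cluster(factors, rel_tol)
        if len(cluster) > len(best_cluster):
            best_name, best_cluster = name, cluster
    if best_name is None:
        raise ValueError("no scaling factors supplied")
    return best_name, mean(best_cluster), len(best_cluster)
```

The mean of the winning cluster serves as an estimate of the common scaling factor, i.e. the font size. A production system would likely also weigh how tight each cluster is, not just how many words it holds.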
12. A computer program which when executed on a computer implements a method of determining a typeface for a plurality of words derived from a scanned image of text, the method comprising:
a) providing a plurality of candidate typefaces;
b) calculating for each typeface and for each word a scaling factor to match a typeface rendering of the word to the size of the word in the scanned image; and
c) analyzing the calculated scaling factors to identify a typeface which matches a plurality of the words.
14. A system for determining a typeface for a plurality of words derived from a scanned image of text, the system comprising:
a) an image analyzer for identifying the original size of each word in the scanned image;
b) a typeface processing device for calculating for each of a plurality of candidate typefaces and for each word, a scaling factor to match a typeface rendering of the word to said original size of the word in the scanned image; and
c) a cluster processor device for identifying one or more clusters of scaling factors for a typeface indicative of a good fit of the typeface to a plurality of the words.
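The claim leaves open how the image analyzer identifies each word's original size. One common approach, assumed here purely for illustration, is a vertical projection profile over a binarized text line: columns containing no ink are blank, and a run of at least `min_gap` blank columns separates words. The function `word_extents` and its `min_gap` parameter are hypothetical; widths come out in pixels and would need the scan resolution to convert to points (points = pixels × 72 / dpi).

```python
from typing import List, Tuple


def word_extents(line_bitmap: List[List[int]],
                 min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one binarized text line (rows of 0/1, 1 = ink) into words.
    A run of at least `min_gap` blank columns ends a word; shorter blank
    runs are treated as spacing between letters of the same word.
    Returns (start_column, width_in_pixels) for each word."""
    if not line_bitmap:
        return []
    n_cols = len(line_bitmap[0])
    # Vertical projection: does any row have ink in this column?
    ink = [any(row[c] for row in line_bitmap) for c in range(n_cols)]
    words: List[Tuple[int, int]] = []
    start = None      # first inked column of the current word
    last_ink = None   # most recent inked column seen so far
    for c, has_ink in enumerate(ink):
        if not has_ink:
            continue
        if start is None:
            start = c
        elif c - last_ink - 1 >= min_gap:
            # Blank run was wide enough: close the previous word.
            words.append((start, last_ink - start + 1))
            start = c
        last_ink = c
    if start is not None:
        words.append((start, last_ink - start + 1))
    return words
```

The resulting pixel widths are the "original size" inputs that the typeface processing device would divide by each candidate typeface's nominal rendering width.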
Specification