SEGMENTATION OF A WORD BITMAP INTO INDIVIDUAL CHARACTERS OR GLYPHS DURING AN OCR PROCESS
First Claim
1. An apparatus that generates characters or glyphs from a bitmap of text, comprising:
an input component for receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
a character chopper component that includes a candidate chop line generator component for generating a plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line, wherein the candidate chop line generator component is configured to produce a candidate chop line through each pixel in at least one row extending along and within the textual line, said character chopper component further including a chop line selection component for selecting a subset of the candidate chop lines that correspond to the plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line; and
an output component that applies the chop line to the textual line to produce the characters or glyphs.
Abstract
An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines that divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking the place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is then filtered using various heuristics, in order to preserve those lines that actually separate a word's glyphs and to minimize the number of those that do not.
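The abstract does not say how a curved, vertically oriented chop-line is traced. As a minimal sketch only, the following assumes a grayscale bitmap stored as a list of rows (0 for black ink, 255 for white background) and a simple greedy rule: starting from one pixel, step row by row toward the top and the bottom, moving to the lightest of the three nearest pixels. The names `trace_chop_line` and `candidate_chop_lines`, the greedy rule, and the choice of start row are all hypothetical and are not taken from the patent.

```python
def trace_chop_line(bitmap, start_row, start_col):
    """Return one column index per row, forming a curved top-to-bottom path.

    Assumption: bitmap is a list of rows of grayscale values
    (0 = black ink, 255 = white background).
    """
    height, width = len(bitmap), len(bitmap[0])
    path = [0] * height
    path[start_row] = start_col

    def lightest_neighbor(row, col):
        # The straight-ahead pixel is listed first, so ties keep the line vertical.
        candidates = [c for c in (col, col - 1, col + 1) if 0 <= c < width]
        return max(candidates, key=lambda c: bitmap[row][c])

    # Walk upward from the start pixel, then downward.
    for row in range(start_row - 1, -1, -1):
        path[row] = lightest_neighbor(row, path[row + 1])
    for row in range(start_row + 1, height):
        path[row] = lightest_neighbor(row, path[row - 1])
    return path


def candidate_chop_lines(bitmap, start_row):
    """One candidate path through each pixel of start_row.

    Assumption: start_row lies between the mean-line and the base-line.
    """
    return [trace_chop_line(bitmap, start_row, col)
            for col in range(len(bitmap[0]))]
```

A greedy walk is used here only because it is short; it produces many overlapping candidates, which is why a later filtering step of the kind the abstract mentions would still be needed.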
20 Claims
1. An apparatus that generates characters or glyphs from a bitmap of text, comprising:
an input component for receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
a character chopper component that includes a candidate chop line generator component for generating a plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line, wherein the candidate chop line generator component is configured to produce a candidate chop line through each pixel in at least one row extending along and within the textual line, said character chopper component further including a chop line selection component for selecting a subset of the candidate chop lines that correspond to the plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line; and
an output component that applies the chop line to the textual line to produce the characters or glyphs.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for segmenting words in a bitmap of an image into characters or glyphs, comprising:
receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
generating a plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line by producing a candidate chop line through each pixel in at least one row extending along and within the textual line; and
selecting a subset of the candidate chop lines that correspond to the plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line.
Dependent claims: 14, 15, 16, 17
18. A medium comprising instructions executable by a computing system, wherein the instructions configure the computing system to perform a method for segmenting words in a bitmap of an image into characters or glyphs, comprising:
receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
generating a plurality of candidate chop lines that each maximize a fitness function that increases as a total path lightness of the respective candidate chop lines increase and decreases as an intersection number increases, wherein the intersection number denotes a number of white-to-black and black to white transitions that the respective candidate chop line crosses; and
selecting at least one chop line from among the candidate chop lines, wherein the chop line separates a pair of adjacent characters or glyphs in the textual line.
Dependent claims: 19, 20
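Claim 18 states only that the fitness function rises with total path lightness and falls with the intersection number; it gives no formula. The sketch below is one possible instance under stated assumptions: a grayscale bitmap as a list of rows (0 for black, 255 for white), a path given as one column index per row, a fixed ink threshold of 128, and a linear penalty per transition. The function names, the threshold, the `penalty` weight, and the linear form are all hypothetical.

```python
def path_lightness(bitmap, path):
    """Sum of the grayscale values of the pixels the path passes through."""
    return sum(bitmap[row][col] for row, col in enumerate(path))


def intersection_number(bitmap, path, threshold=128):
    """Count white-to-black and black-to-white transitions along the path.

    Assumption: a pixel darker than `threshold` counts as ink.
    """
    is_ink = [bitmap[row][col] < threshold for row, col in enumerate(path)]
    return sum(1 for a, b in zip(is_ink, is_ink[1:]) if a != b)


def fitness(bitmap, path, threshold=128, penalty=255.0):
    """Higher for lighter paths, lower for paths that cross more ink edges.

    Assumption: a linear combination; the claim does not fix the form.
    """
    crossings = intersection_number(bitmap, path, threshold)
    return path_lightness(bitmap, path) - penalty * crossings
```

With this form, a path that stays in the white gap between two glyphs scores higher than a path of the same length that cuts through a stroke, since the second path loses lightness and also pays for two transitions.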
Specification