Segmentation of a word bitmap into individual characters or glyphs during an OCR process
First Claim
1. An apparatus that generates characters or glyphs from a bitmap of text, comprising:
- a processor;
- memory, when operatively coupled to the processor, implementing:
- an input component for receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
- a character chopper component that includes a candidate chop line generator component for generating a plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line, wherein the candidate chop line generator component is configured to produce a candidate chop line through each pixel in at least one row extending along and within the textual line, said character chopper component further including a chop line selection component for selecting a subset of the candidate chop lines that correspond to the plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line; and
- an output component that applies the chop line to the textual line to produce the characters or glyphs, wherein the candidate chop line generator component generates candidate chop lines that each maximize a fitness function that increases as a total path lightness of the respective candidate chop lines increases and decreases as an intersection number increases, wherein the intersection number denotes a number of white-to-black and black-to-white transitions that the respective candidate chop line crosses.
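The claim constrains the fitness function only qualitatively: it rises with total path lightness and falls with the intersection number. The sketch below is one possible reading, not the patented formula. The linear form, the `penalty` weight, the ink `threshold`, and all function names are assumptions made for illustration; pixels are assumed to be grayscale values where 0 is black ink and 255 is white background.

```python
from typing import Sequence, Tuple

Bitmap = Sequence[Sequence[int]]   # rows of grayscale values: 0 = black ink, 255 = white
Path = Sequence[Tuple[int, int]]   # (row, column) pixels visited from top to bottom


def total_path_lightness(bitmap: Bitmap, path: Path) -> int:
    """Sum of the pixel values along the path; lighter paths score higher."""
    return sum(bitmap[r][c] for r, c in path)


def intersection_number(bitmap: Bitmap, path: Path, threshold: int = 128) -> int:
    """Count white-to-black and black-to-white transitions along the path."""
    is_ink = [bitmap[r][c] < threshold for r, c in path]
    return sum(1 for prev, cur in zip(is_ink, is_ink[1:]) if prev != cur)


def fitness(bitmap: Bitmap, path: Path, penalty: float = 255.0) -> float:
    """Assumed linear form: lightness minus a fixed penalty per transition."""
    return total_path_lightness(bitmap, path) - penalty * intersection_number(bitmap, path)


# A 3x3 bitmap with a short vertical stroke in the middle column.
bitmap = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 255, 255],
]
through_gap = [(0, 0), (1, 0), (2, 0)]      # stays on background
through_stroke = [(0, 1), (1, 1), (2, 1)]   # leaves the stroke once
```

Under these assumptions the path through the background column scores higher than the path that cuts through the stroke, which is the ordering the claim language requires.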
Abstract
An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines that divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking the place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is then filtered using various heuristics, in order to preserve the lines that do separate a word's glyphs and to minimize the number of those that do not.
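One way to compute vertically oriented, possibly curved chop-lines is dynamic programming over the word bitmap. The sketch below is an illustration under stated assumptions, not the method of the specification: the anchor row is taken to be the top row of the bitmap, a path may shift by at most one column per row, and the fitness is the assumed linear form "lightness minus `penalty` per ink/background transition". The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def candidate_chop_lines(
    bitmap: Sequence[Sequence[int]],
    threshold: int = 128,
    penalty: float = 255.0,
) -> List[Tuple[float, List[int]]]:
    """Return one (fitness, columns) pair per pixel of the top row.

    columns[r] is the column the chop line occupies in row r.
    Pixels are grayscale: 0 = black ink, 255 = white background.
    """
    height, width = len(bitmap), len(bitmap[0])
    ink = [[v < threshold for v in row] for row in bitmap]

    # best[r][c]: highest fitness of any path from (r, c) down to the last row.
    best = [[0.0] * width for _ in range(height)]
    # step[r][c]: column chosen in row r + 1 by that best path.
    step = [[0] * width for _ in range(height)]
    best[-1] = [float(v) for v in bitmap[-1]]

    for r in range(height - 2, -1, -1):
        for c in range(width):
            options = []
            # Straight down is listed first so that ties prefer a straight line.
            for nc in (c, c - 1, c + 1):
                if 0 <= nc < width:
                    cost = penalty if ink[r][c] != ink[r + 1][nc] else 0.0
                    options.append((best[r + 1][nc] - cost, nc))
            score, step[r][c] = max(options, key=lambda o: o[0])
            best[r][c] = bitmap[r][c] + score

    lines = []
    for c in range(width):
        cols = [c]
        for r in range(height - 1):
            cols.append(step[r][cols[-1]])
        lines.append((best[0][c], cols))
    return lines


# Two vertical strokes with a one-pixel gap between them.
strokes = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
lines = candidate_chop_lines(strokes)
```

For the `strokes` bitmap, the candidate that starts in the gap runs straight down it, while candidates that start on a stroke bend into the gap and pay one transition penalty for leaving the ink.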
18 Claims
12. A method for segmenting words in a bitmap of an image into characters or glyphs, comprising:
- receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
- generating a plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line by producing a candidate chop line through each pixel in at least one row extending along and within the textual line; and
- selecting a subset of the candidate chop lines that correspond to the plurality of chop lines that each separate a pair of adjacent characters or glyphs in the textual line;
- wherein generating the plurality of chop lines includes generating the candidate chop lines so that they each maximize a fitness function that increases as a total path lightness of the respective candidate chop lines increases and decreases as an intersection number increases, wherein the intersection number denotes a number of white-to-black and black-to-white transitions that the respective candidate chop line crosses.
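The claim does not say how the subset of candidate chop lines is selected, and the abstract mentions only "various heuristics". The sketch below uses a stand-in heuristic chosen purely for illustration: keep the best candidate from each run of adjacent candidates whose fitness reaches a cutoff. The function name, the `min_fitness` parameter, and the heuristic itself are assumptions, not taken from the patent.

```python
from typing import List, Sequence


def select_chop_lines(fitnesses: Sequence[float], min_fitness: float) -> List[int]:
    """Return indices of the candidates to keep.

    fitnesses[i] is the fitness of the candidate through pixel i of the
    anchor row. Adjacent candidates at or above min_fitness are treated as
    one gap between glyphs, and only the best of each run is kept.
    """
    selected: List[int] = []
    run: List[int] = []
    for i, value in enumerate(fitnesses):
        if value >= min_fitness:
            run.append(i)
        elif run:
            selected.append(max(run, key=lambda j: fitnesses[j]))
            run = []
    if run:  # close a run that reaches the right edge
        selected.append(max(run, key=lambda j: fitnesses[j]))
    return selected


# Hypothetical fitness values for six candidates along one row.
scores = [255.0, 765.0, 765.0, 255.0, 0.0, 765.0]
kept = select_chop_lines(scores, min_fitness=700.0)
```

With these hypothetical scores, candidates 1 and 2 form one run and candidate 5 another, so one chop line is kept per gap.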
16. A method for segmenting words in a bitmap of an image into characters or glyphs, comprising:
- receiving a bitmap of an image comprising at least one textual line that is identified by a base-line and a mean-line;
- generating a plurality of candidate chop lines that each maximize a fitness function that increases as a total path lightness of the respective candidate chop lines increases and decreases as an intersection number increases, wherein the intersection number denotes a number of white-to-black and black-to-white transitions that the respective candidate chop line crosses;
- selecting at least one chop line from among the candidate chop lines, wherein the chop line separates a pair of adjacent characters or glyphs in the textual line;
- applying the chop line to the textual line to produce the characters or glyphs from the received bitmap of the image.
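The final step, applying the selected chop lines to produce glyph bitmaps, can be sketched as follows. This is a minimal illustration with assumptions of its own: each chop line is given as one column index per row, the lines do not cross one another, and the pixel a line passes through is assigned to the glyph on its right. The function name is hypothetical.

```python
from typing import List, Sequence


def apply_chop_lines(
    bitmap: Sequence[Sequence[int]],
    chop_lines: Sequence[Sequence[int]],
) -> List[List[List[int]]]:
    """Split a word bitmap into glyph bitmaps along the given chop lines.

    chop_lines[k][r] is the column of chop line k in row r. Because the
    lines may be curved, the rows of a glyph can differ in length.
    """
    height, width = len(bitmap), len(bitmap[0])
    # Treat the left and right edges of the bitmap as implicit boundaries.
    # Sorting compares first-row columns first, which orders non-crossing lines.
    bounds = [[0] * height] + sorted(list(line) for line in chop_lines) + [[width] * height]
    glyphs = []
    for left, right in zip(bounds, bounds[1:]):
        glyphs.append([list(bitmap[r][left[r]:right[r]]) for r in range(height)])
    return glyphs


# Two touching-free glyphs separated by a slanted one-pixel gap.
word = [
    [0, 255, 0, 0],
    [0, 0, 255, 0],
]
glyphs = apply_chop_lines(word, [[1, 2]])
```

The slanted chop line follows the gap, so the left glyph keeps one pixel in the first row and two in the second.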
Specification