Word spotting in bitmap images using text line bounding boxes and hidden Markov models
First Claim
1. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
- providing a set of previously-trained single-character hidden Markov models (HMMs), each single-character HMM having a number of possible contexts, depending on whether the character has an ascender or a descender;
concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM , the context of a given single-character HMM used to create the keyword HMM being determined on the basis of whether the keyword contains characters having ascenders or a descenders;
constructing an HMM network that includes a first path passing through the keyword HMM and a second path that does not pass through the keyword HMM;
locating a portion of the input image potentially containing a line of text;
providing an array of pixel values, referred to as a potential text line, representing the portion of the input image;
horizontally sampling the potential text line to provide a plurality of segments wherein each segment extends the entire height of the potential text line and the sampling to provide segments is performed in a manner that is independent of the values of the pixels in the potential textline;
for each segment, generating at least one feature that depends on the values of the pixels in the segment, thereby providing a set of features based on the potential text line, the set of features providing shape information regarding words in the line of text potentially contained in the portion of the input image;
applying the set of features to the HMM network;
finding a path through the network that maximizes the probability of the set of features as applied to the network; and
determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword.
3 Assignments
0 Petitions
Accused Products
Abstract
Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMS) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape, (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations. The bounding boxes of connected components in the resulting image are then used to hypothesize word or text line bounding boxes, and the original bitmaps within the boxes are used to hypothesize words. In a particular embodiment, a range of structuring elements is used for the closing operations to accommodate the variation of inter- and intra-character spacing with font and font size.
-
Citations
46 Claims
-
1. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
-
providing a set of previously-trained single-character hidden Markov models (HMMs), each single-character HMM having a number of possible contexts, depending on whether the character has an ascender or a descender; concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM , the context of a given single-character HMM used to create the keyword HMM being determined on the basis of whether the keyword contains characters having ascenders or a descenders; constructing an HMM network that includes a first path passing through the keyword HMM and a second path that does not pass through the keyword HMM; locating a portion of the input image potentially containing a line of text; providing an array of pixel values, referred to as a potential text line, representing the portion of the input image; horizontally sampling the potential text line to provide a plurality of segments wherein each segment extends the entire height of the potential text line and the sampling to provide segments is performed in a manner that is independent of the values of the pixels in the potential textline; for each segment, generating at least one feature that depends on the values of the pixels in the segment, thereby providing a set of features based on the potential text line, the set of features providing shape information regarding words in the line of text potentially contained in the portion of the input image; applying the set of features to the HMM network; finding a path through the network that maximizes the probability of the set of features as applied to the network; and determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 22, 23)
-
-
16. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
-
providing a set of previously trained single-character HMMs; concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM, providing a non-keyword HMM; constructing an HMM network that includes a first path passing through the keyword HMM and a second path passing through the non-keyword HMM but not passing through the keyword HMM; locating a portion of the input image potentially containing a line of text; providing an array of pixel values, referred to as a potential text line, representing the portion of the input image; generating a set of features based on the potential text line, the set of features providing shape information regarding the words in the line of text potentially contained in the portion of the input image; the set of features being generated at a plurality of uniformly spaced horizontal locations, thereby avoiding segmentation of the potential text line in a manner that depends on values of the pixels in the potential text line; applying the set of features to the HMM network; finding a path through the network that maximizes the probability of the set of features as applied to the network; and determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword. - View Dependent Claims (17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30)
-
-
31. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
-
providing a set of previously trained single-character hidden Markov models (HMMS) wherein each character on which one of the set of single-character HMMs is based has a number of distinct portions, each character has a shape that is characterized by feature vectors at corresponding horizontal locations along the character, a given single-character HMM for a given character is characterized by a number of states, each state of which corresponds to one of the number of distinct portions of the given character, and each state is characterized by a probability distribution of the feature vectors that characterize the corresponding distinct portion of the given character; concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM; providing a non-keyword HMM; constructing a network that includes a first path passing through the keyword HMM and a second path passing through the non-keyword HMM but not passing through the keyword HMM; locating a portion of the input image potentially containing a line of text; providing an array of pixel values, referred to as a potential text line, representing the portion of the input image, the potential text line having a plurality of vertically extending columns of pixels at respective ones of a plurality of uniformly spaced horizontal locations; generating a plurality of feature vectors determined at respective ones of the plurality of horizontal locations in the potential text line, a given feature vector for a given horizontal location being specified by pixel values in the column at the given horizontal location; the plurality of feature vectors together representing word shape while being generated by uniform segmentation of the potential text line, whereby segmentation of a type that depends on values of the pixels in the potential text line is avoided; applying the plurality of feature vectors to the HMM network; finding a path through the network that maximizes the probability of the plurality of feature vectors as applied to the network; and determining whether the path that maximizes the probability passes through the keyboard HMM, so as to provide an indication whether the potential text line is the keyword. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
-
Specification