Word spotting in bitmap images using context-sensitive character models without baselines
Abstract
Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word, without the need for segmentation or OCR and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs), and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image; a set of features based on the word shape (and preferably also the word's internal structure) within each bounding box is extracted; this set of features is applied to a network that includes one or more keyword HMMs; and a determination is made as to whether a keyword is present. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say, by 2x) and subjecting the reduced image to vertical and horizontal morphological closing operations. The bounding boxes of connected components in the resulting image are then used to hypothesize word or text line bounding boxes, and the original bitmaps within the boxes are used to hypothesize words. In a particular embodiment, a range of structuring elements is used for the closing operations to accommodate the variation of inter- and intra-character spacing with font and font size.
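The bounding-box step described in the abstract can be illustrated with a small, pure-Python sketch. It is not the patented implementation: the 2x reduction is omitted, the function names and the toy bitmap are hypothetical, and the closing is written in an equivalent run-filling form (closing with a flat 1-D element of length k fills exactly those background gaps shorter than k that have foreground on both sides).

```python
from collections import deque

def close_line(line, k):
    """1-D closing with a flat element of length k.

    Equivalent formulation: fill every background run shorter than k
    that has foreground on both sides. Runs touching the border are kept.
    """
    out = list(line)
    n, i = len(line), 0
    while i < n:
        if line[i]:
            i += 1
            continue
        j = i
        while j < n and not line[j]:
            j += 1
        if i > 0 and j < n and j - i < k:
            out[i:j] = [1] * (j - i)
        i = j
    return out

def close_image(img, k_horizontal, k_vertical):
    """Horizontal closing of each row, then vertical closing of each column."""
    rows = [close_line(r, k_horizontal) for r in img]
    cols = [close_line(c, k_vertical) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def bounding_boxes(img):
    """Bounding boxes (x0, y0, x1, y1) of 8-connected foreground components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            y0 = y1 = y
            x0 = x1 = x
            while queue:
                cy, cx = queue.popleft()
                y0, y1 = min(y0, cy), max(y1, cy)
                x0, x1 = min(x0, cx), max(x1, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Toy page: two "characters" one pixel apart, then a wide gap, then a third.
page = [
    [1, 0, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 0, 1, 0],
]
merged = close_image(page, k_horizontal=3, k_vertical=2)
print(bounding_boxes(page))    # boxes of the raw components
print(bounding_boxes(merged))  # boxes after closing: candidate words
```

With a horizontal element of length 3, the one-pixel gap between the first two components is filled while the four-pixel gap is not, so the closed image yields one box per candidate word. Repeating the call over several element lengths would be one way to mimic the "range of structuring elements" mentioned above.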
45 Citations
18 Claims
1. For use in a processor-based method of determining whether a keyword is present in a bitmap input image wherein the input image is subjected to an operation that determines bounding boxes for portions of the input image that potentially contain text and an operation that generates a sequence of features for each such portion of the input image, the text being considered to extend horizontally, a method for modeling the appearance of keywords in the input image, the method comprising the steps of:
providing a set of previously-trained single-character hidden Markov models (HMMs);
concatenating those single-character HMMs that correspond to the characters in the keyword so as to define a keyword HMM; and
constructing an HMM network that includes the keyword HMM;
said providing step being characterized in that:
each character has a shape that is characterized at each of a plurality of horizontal locations along the character by at least one parameter representing a vertical slice of the character at the horizontal location,
each character has a number of distinct portions, the number being greater than 1 for at least some characters,
each distinct portion spans a respective contiguous subset of the plurality of horizontal locations,
a given single-character HMM for a given character is characterized by a number of states, each state of which corresponds to a respective one of the number of distinct portions of the given character,
each state is characterized by a statistical distribution of the at least one parameter for the corresponding distinct portion of the given character,
each single-character HMM has a number of possible contexts depending on whether the character has an ascender or descender,
for a given single-character HMM, a given state for a first context differs from the given state for a second context in a manner that reflects differences in at least one of (a) the size of the given character, and (b) the vertical position of the given character, such differences being between a situation when the given character is registered to a bounding box of a portion of the input image in the first context and a situation when the given character is registered to the bounding box of a portion of the input image in the second context, and
the single-character HMMs that are concatenated to form the keyword HMM have the same context.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
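The concatenation step of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: a context is modeled here as a pair (word has ascenders, word has descenders), the letter sets are a rough guess for Latin text, and each "state" is just a label standing in for a trained output distribution. All names are hypothetical.

```python
# Assumed letter classes; a real system would derive these from training data.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def has_ascender(ch):
    return ch in ASCENDERS or ch.isupper() or ch.isdigit()

def has_descender(ch):
    return ch in DESCENDERS

def word_context(word):
    """Context shared by every character of the word: (ascenders?, descenders?)."""
    return (any(has_ascender(c) for c in word),
            any(has_descender(c) for c in word))

def make_toy_char_hmms(alphabet, states_per_char=3):
    """One left-to-right state list per (character, context) pair.

    A character with an ascender can only occur in contexts that have
    ascenders (likewise for descenders), so the number of contexts varies.
    """
    hmms = {}
    for ch in alphabet:
        for asc in (False, True):
            for desc in (False, True):
                if (has_ascender(ch) and not asc) or (has_descender(ch) and not desc):
                    continue
                hmms[(ch, (asc, desc))] = [
                    f"{ch}|asc={asc}|desc={desc}|s{i}" for i in range(states_per_char)
                ]
    return hmms

def build_keyword_hmm(keyword, char_hmms):
    """Concatenate the character models for the keyword, all in one context."""
    ctx = word_context(keyword)
    states = []
    for ch in keyword:
        states.extend(char_hmms[(ch, ctx)])
    return ctx, states

char_hmms = make_toy_char_hmms("abcdefghijklmnopqrstuvwxyz")
ctx, states = build_keyword_hmm("dog", char_hmms)
print(ctx)
print(states)
```

In "dog" the letter "o" has neither an ascender nor a descender, yet its states are taken from the context that has both, because the word's bounding box is stretched by "d" and "g". That is the sense in which all concatenated models share one context.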
11. For use in a keyword-spotting network that includes at least one keyword hidden Markov model (HMM), a method of providing a context-insensitive, character-based, non-keyword HMM comprising:
providing first, second, third, and fourth sets of character HMMs, each set modeling a set of characters in a particular context;
connecting the character HMMs from the first, second, third, and fourth sets of character HMMs in parallel between null states with a return loop whereby the optimal path for a non-keyword through the HMM is not constrained by the context of the characters in the non-keyword.
12. For use in a keyword-spotting network that includes at least one keyword hidden Markov model (HMM), a method of providing a context-sensitive, character-based, non-keyword HMM comprising:
providing first, second, third, and fourth sets of character HMMs, each set modeling a set of characters in a particular context;
constructing respective single-context character set HMMs, each having a respective set of character HMMs for the respective context connected in parallel between null states with a return loop; and
connecting the single-context character set HMMs in parallel between null states with no return loops whereby the optimal path for a non-keyword through the HMM is constrained by the context of the characters in the non-keyword.
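The difference between the topologies of claims 11 and 12 can be made concrete with a toy graph sketch. Everything here is an assumption for illustration: contexts are numbered 0 to 3, each character HMM is collapsed to a single node `(char, context)`, null states are plain strings, and `accepts` only checks whether a path exists (no probabilities are involved).

```python
def parallel(branch_starts, branch_ends, start, end, return_loop):
    """Edges connecting branches in parallel between null states start/end."""
    edges = {(start, b) for b in branch_starts} | {(b, end) for b in branch_ends}
    if return_loop:
        edges.add((end, start))
    return edges

def context_insensitive(chars, contexts=range(4)):
    """Claim 11 shape: every character model of every context in one loop."""
    models = {(ch, c) for ch in chars for c in contexts}
    return parallel(models, models, "S", "E", return_loop=True), models

def context_sensitive(chars, contexts=range(4)):
    """Claim 12 shape: one loop per context, the loops joined without a return."""
    edges, models = set(), set()
    for c in contexts:
        group = {(ch, c) for ch in chars}
        models |= group
        edges |= parallel(group, group, f"S{c}", f"E{c}", return_loop=True)
    edges |= parallel([f"S{c}" for c in contexts], [f"E{c}" for c in contexts],
                      "S", "E", return_loop=False)
    return edges, models

def accepts(edges, models, seq, start="S", end="E"):
    """True if some start-to-end path visits exactly the models in seq, in order."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def null_closure(nodes):
        # Follow edges through null states only; stop at character models.
        seen, stack = set(nodes), list(nodes)
        while stack:
            for m in succ.get(stack.pop(), ()):
                if m not in models and m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    current = null_closure({start})
    for model in seq:
        if not any(model in succ.get(n, ()) for n in current):
            return False
        current = null_closure({model})
    return end in current

loose = context_insensitive("ab")
strict = context_sensitive("ab")
mixed = [("a", 0), ("b", 2)]   # characters drawn from two different contexts
same = [("a", 2), ("b", 2)]    # both characters in context 2
print(accepts(*loose, mixed), accepts(*strict, mixed), accepts(*strict, same))
```

The single return loop of the first network lets a path switch context after every character, whereas in the second network a path commits to one context on entry and can only loop inside it.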
13. For use in a keyword-spotting network that includes at least one keyword hidden Markov model (HMM), a method of providing a context-insensitive, image-slice based, non-keyword HMM comprising:
providing a set of image-slice states wherein the image-slice states model vertical slices of a portion of the input image without regard to the context of characters in that portion of the input image; and
connecting the set of image-slice states in parallel between null states with a return loop whereby the optimal path for a non-keyword through the HMM is not constrained by the context of the characters in the non-keyword.
14. For use in a keyword-spotting network that includes at least one keyword hidden Markov model (HMM), a method of providing a context-sensitive, image-slice based, non-keyword HMM comprising:
providing first, second, third, and fourth sets of image-slice states corresponding to first, second, third, and fourth contexts, wherein the image-slice states for a given set model vertical slices of character images in the particular context for that set;
constructing respective single-context image-slice set HMMs, each having the image-slice states for the respective context connected in parallel between null states with a return loop; and
connecting the single-context image-slice set HMMs in parallel between null states with no return loops whereby the optimal path for a non-keyword through the HMM is constrained by the context of the characters in the non-keyword.
15. For use in a processor-based method of determining whether a keyword is present in a bitmap input image wherein the input image is subjected to an operation that determines bounding boxes for portions of the input image that potentially contain text and an operation that generates a sequence of features for each such portion of the input image, the text being considered to extend horizontally, a method for modeling the appearance of keywords in the input image, the method comprising the steps of:
providing a set of previously-trained single-character hidden Markov models (HMMs);
concatenating those single-character HMMs that correspond to the characters in the keyword so as to define a keyword HMM; and
constructing an HMM network that includes the keyword HMM;
said providing step being characterized in that:
each character has a shape that is characterized by a number of feature vectors at a corresponding number of horizontal locations along the character,
a given single-character HMM for a given character is characterized by a number of states, each state of which corresponds to a respective one of the number of distinct portions of the given character,
each character has a number of distinct portions, the number being greater than 1 for at least some characters,
each distinct portion spans a respective contiguous subset of the plurality of horizontal locations,
each state is characterized by a statistical distribution of the feature vectors that characterize the corresponding distinct portion of the given character,
each single-character HMM has a number of possible contexts depending on whether the character has an ascender or descender,
the single-character HMMs that are concatenated to form the keyword HMM have the same context, and
the feature vector for a given horizontal location in a given character includes a representation of an upper and lower character boundary referenced to a bounding box for the keyword.
(Dependent claims: 16, 17, 18)
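One plausible reading of the last element of claim 15 is a per-column pair of contour distances measured from the edges of the word's bounding box. The sketch below assumes that reading; the normalization by box height, the use of `None` for empty columns, and the function name are illustrative choices, not details taken from the patent.

```python
def contour_features(box_bitmap):
    """Per-column (upper, lower) contour features for a word bitmap.

    box_bitmap is a list of rows cropped to the word's bounding box.
    upper: distance from the box top to the topmost ink pixel.
    lower: distance from the box bottom to the bottommost ink pixel.
    Both are divided by the box height; empty columns give None.
    """
    height = len(box_bitmap)
    features = []
    for column in zip(*box_bitmap):
        ink_rows = [y for y, pixel in enumerate(column) if pixel]
        if not ink_rows:
            features.append(None)
            continue
        upper = ink_rows[0] / height
        lower = (height - 1 - ink_rows[-1]) / height
        features.append((upper, lower))
    return features

# The same 2x2 blob registered to two different bounding boxes.
blob_in_tight_box = [
    [1, 1],
    [1, 1],
]
blob_under_ascender_room = [
    [0, 0],
    [0, 0],
    [1, 1],
    [1, 1],
]
print(contour_features(blob_in_tight_box))
print(contour_features(blob_under_ascender_room))
```

Because the features are referenced to the bounding box rather than to a baseline, the same character shape produces different values when the box is taller, which is why the claims call for separate states per context.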
Specification