Word spotting in bitmap images using text line bounding boxes and hidden Markov models

US 5,745,600 A
Filed: 11/09/1994
Issued: 04/28/1998
Est. Priority Date: 12/17/1992
Status: Expired due to Term

Abstract

Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations. The bounding boxes of connected components in the resulting image are then used to hypothesize word or text line bounding boxes, and the original bitmaps within the boxes are used to hypothesize words. In a particular embodiment, a range of structuring elements is used for the closing operations to accommodate the variation of inter- and intra-character spacing with font and font size.
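
The bounding-box step described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the function names (`reduce2x`, `close_h`, `bounding_boxes`), the rule that a reduced pixel is on when any pixel of its 2×2 block is on, the odd-width horizontal structuring element, and the use of 4-connectivity are all assumptions made for the example. Only the horizontal closing is shown; a vertical closing would be analogous.

```python
from collections import deque

def reduce2x(img, threshold=1):
    """Reduce a binary image by 2x: an output pixel is 1 when at least
    `threshold` pixels of the matching 2x2 input block are 1 (assumed rule)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[1 if (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                   + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) >= threshold
             else 0
             for x in range(w)] for y in range(h)]

def dilate_h(img, k):
    """Horizontal dilation with an odd-width element of k pixels."""
    r = k // 2
    return [[1 if any(row[max(0, x - r):x + r + 1]) else 0
             for x in range(len(row))] for row in img]

def erode_h(img, k):
    """Horizontal erosion; pixels outside the image are ignored."""
    r = k // 2
    return [[1 if all(row[max(0, x - r):x + r + 1]) else 0
             for x in range(len(row))] for row in img]

def close_h(img, k):
    """Morphological closing = dilation followed by erosion."""
    return erode_h(dilate_h(img, k), k)

def bounding_boxes(img):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of 4-connected components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Two marks separated by a 2-pixel gap merge; the 4-pixel gap survives.
row = [[1, 0, 0, 1, 0, 0, 0, 0, 1]]
print(bounding_boxes(close_h(row, 3)))  # [(0, 0, 3, 0), (8, 0, 8, 0)]
```

In this sketch a 3-pixel element bridges gaps of up to two pixels and leaves wider gaps open, which illustrates why a range of structuring elements helps when character and word spacing vary with font and size. The boxes are in reduced-image coordinates and would need to be scaled by 2 to index the original bitmap.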

46 Claims

1. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
- providing a set of previously-trained single-character hidden Markov models (HMMs), each single-character HMM having a number of possible contexts, depending on whether the character has an ascender or a descender;
  
  concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM, the context of a given single-character HMM used to create the keyword HMM being determined on the basis of whether the keyword contains characters having ascenders or descenders;
  
  constructing an HMM network that includes a first path passing through the keyword HMM and a second path that does not pass through the keyword HMM;
  
  locating a portion of the input image potentially containing a line of text;
  
  providing an array of pixel values, referred to as a potential text line, representing the portion of the input image;
  
  horizontally sampling the potential text line to provide a plurality of segments wherein each segment extends the entire height of the potential text line and the sampling to provide segments is performed in a manner that is independent of the values of the pixels in the potential text line;
  
  for each segment, generating at least one feature that depends on the values of the pixels in the segment, thereby providing a set of features based on the potential text line, the set of features providing shape information regarding words in the line of text potentially contained in the portion of the input image;
  
  applying the set of features to the HMM network;
  
  finding a path through the network that maximizes the probability of the set of features as applied to the network; and
  
  determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 22, 23)
- - 2. The method of claim 1 wherein:
    - each character on which one of the set of single-character HMMs is based has a number of distinct portions;
      
      each character has a shape that is characterized by a number of feature vectors, at a corresponding number of horizontal locations along the character;
      
      a given single-character HMM for a given character is characterized by a number of states, each state of which corresponds to one of the number of distinct portions of the given character; and
      
      each state is characterized by a Gaussian distribution with mean vector and covariance matrix of the feature vectors that characterize the corresponding distinct portion of the given character.
  - 3. The method of claim 1 wherein the step of locating includes performing at least one reduction operation on the input image.
  - 4. The method of claim 1 wherein the step of locating includes performing at least one morphological operation on a representation of the input image.
  - 5. The method of claim 1 wherein:
    - substantially all the segments have the same width;
      
      the set of features for the potential text line includes a plurality of feature vectors, each feature vector corresponding to a respective one of the segments; and
      
      a given feature vector for a given segment represents pixel values of the given segment.
  - 6. The method of claim 5 wherein a feature vector for a given segment of the potential text line includes a representation of the topmost pixel in the given segment having a value above a threshold, a representation of the bottommost pixel in the given segment having a value above the threshold, and a representation of pixel values between the topmost pixel having a value above the threshold and the bottommost pixel having a value above the threshold.
  - 7. The method of claim 6 wherein the representation of pixel values includes a set of autocorrelation values.
  - 8. The method of claim 6 wherein the representation of pixel values includes the number of pixel transitions through a threshold value.
  - 9. The method of claim 1 wherein the step of providing an array of pixel values includes proportionally scaling the height and width of the portion of the input image to provide a potential text line with a height having a predefined number of pixels.
  - 10. The method of claim 1 wherein the single-character HMMs were trained on a corpus containing bitmap images of text in a plurality of fonts.
  - 11. The method of claim 1 wherein:
    - the keyword HMM is referred to below as the first keyword HMM; and
      
      the method further comprises the step of concatenating those single-character HMMs that correspond to the characters in the keyword to create a second keyword HMM, the context of a given single-character HMM used to create the second keyword HMM being a context different from the context of the given single-character HMM used to create the first keyword HMM; and
      
      the HMM network includes a third path passing through the second keyword HMM but not through the first keyword HMM.
  - 12. The method of claim 1 wherein:
    - the keyword includes only characters having neither ascenders nor descenders;
      
      the keyword HMM, referred to below as the first keyword HMM, consists only of single-character HMMs having the context of neither ascenders nor descenders;
      
      the method further includes the steps of concatenating those single-character HMMs that correspond to the characters in the keyword to create second, third, and fourth keyword HMMs;
      
      the second keyword HMM consists only of single-character HMMs having the context of ascenders only;
      
      the third keyword HMM consists only of single-character HMMs having the context of descenders only;
      
      the fourth keyword HMM consists only of single-character HMMs having the context of both ascenders and descenders;
      
      the HMM network includes first, second, third, and fourth text line HMM components connected in parallel; and
      
      each of the first, second, third, and fourth text line HMM components includes a respective one of the first keyword HMM, the second keyword HMM, the third keyword HMM, and the fourth keyword HMM.
  - 13. The method of claim 1 wherein:
    - the keyword includes characters having ascenders but no characters having descenders;
      
      the keyword HMM, referred to below as the first keyword HMM, consists only of single-character HMMs having the context of ascenders only;
      
      the method further includes the steps of concatenating those single-character HMMs that correspond to the characters in the keyword to create a second keyword HMM;
      
      the second keyword HMM consists only of single-character HMMs having the context of both ascenders and descenders;
      
      the HMM network includes first and second text line HMM components connected in parallel;
      
      the first text line HMM component includes the first keyword HMM; and
      
      the second text line HMM component includes the second keyword HMM.
  - 14. The method of claim 1 wherein the HMM network includes the keyword HMM, a non-keyword HMM, and an interword space state.
  - 15. The method of claim 1 wherein:
    - the keyword is a multi-word phrase; and
      
      the concatenating step includes concatenating single-character HMMs for the words in the multi-word phrase to provide a set of single-word HMMs and concatenating the single-word HMMs with at least one interword space state.
  - 22. The method of claim 10 wherein the representation of pixel values includes a set of autocorrelation values.
  - 23. The method of claim 10 wherein the representation of pixel values includes the number of pixel transitions through a threshold value.
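
The per-segment features recited in claims 5, 6, and 8 can be illustrated with a short Python sketch. It is a hedged example rather than the patent's own procedure: the function name `column_features`, the one-pixel segment width, the use of row indices as the "representation" of the topmost and bottommost pixels, and the `None` placeholder for empty columns are assumptions.

```python
def column_features(line, threshold=0):
    """Compute one feature tuple per pixel column of a potential text line.

    `line` is a list of rows (top to bottom) of pixel values. Every column
    is treated as one fixed-width segment, so the sampling never depends on
    the pixel values themselves. Each tuple is (top, bottom, transitions):
      top         row index of the topmost pixel above `threshold`
      bottom      row index of the bottommost pixel above `threshold`
      transitions number of crossings of `threshold` between top and bottom
    Columns with no pixel above the threshold yield (None, None, 0), an
    assumed convention.
    """
    height, width = len(line), len(line[0])
    features = []
    for x in range(width):
        # Binarize the column against the threshold.
        col = [1 if line[y][x] > threshold else 0 for y in range(height)]
        on_rows = [y for y, v in enumerate(col) if v]
        if not on_rows:
            features.append((None, None, 0))
            continue
        top, bottom = on_rows[0], on_rows[-1]
        span = col[top:bottom + 1]
        # Count on/off changes between vertically adjacent pixels in the span.
        transitions = sum(1 for a, b in zip(span, span[1:]) if a != b)
        features.append((top, bottom, transitions))
    return features

# A 5-row, 3-column toy text line.
toy_line = [
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
]
print(column_features(toy_line))  # [(1, 3, 2), (None, None, 0), (0, 4, 0)]
```

The top and bottom values capture the word's outer contour, while the transition count gives a coarse view of internal structure, for example distinguishing a solid vertical stroke from a column that crosses two strokes.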

16. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
- providing a set of previously trained single-character HMMs;
  
  concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM;
  
  providing a non-keyword HMM;
  
  constructing an HMM network that includes a first path passing through the keyword HMM and a second path passing through the non-keyword HMM but not passing through the keyword HMM;
  
  locating a portion of the input image potentially containing a line of text;
  
  providing an array of pixel values, referred to as a potential text line, representing the portion of the input image;
  
  generating a set of features based on the potential text line, the set of features providing shape information regarding the words in the line of text potentially contained in the portion of the input image;
  
  the set of features being generated at a plurality of uniformly spaced horizontal locations, thereby avoiding segmentation of the potential text line in a manner that depends on values of the pixels in the potential text line;
  
  applying the set of features to the HMM network;
  
  finding a path through the network that maximizes the probability of the set of features as applied to the network; and
  
  determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword.
- Dependent claims (17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30)
- - 17. The method of claim 16 wherein:
    - each character has a plurality of distinct portions;
      
      each character has a shape that is characterized by a number of feature vectors at a corresponding number of horizontal locations along the character;
      
      a given single-character HMM for a given character is characterized by a plurality of states, each state of which corresponds to one of the number of distinct portions of the given character; and
      
      each state is characterized by a Gaussian distribution with mean vector and covariance matrix of the feature vectors that characterize the corresponding distinct portion of the given character.
  - 18. The method of claim 16 wherein the step of locating includes performing at least one reduction operation on the input image.
  - 19. The method of claim 16 wherein the step of locating includes performing at least one morphological operation on a representation of the input image.
  - 20. The method of claim 16 wherein the set of features for the potential text line includes a plurality of multi-parameter feature vectors determined at respective ones of the plurality of uniformly spaced horizontal locations in the potential text line, a given feature vector for a given horizontal location representing pixel values at the given horizontal location.
  - 21. The method of claim 20 wherein a feature vector for a given horizontal location in the potential text line includes a representation of the topmost pixel at the given horizontal location having a value above a threshold, a representation of the bottommost pixel at the given horizontal location having a value above the threshold, and a representation of pixel values between the topmost pixel having a value above the threshold and the bottommost pixel having a value above the threshold.
  - 24. The method of claim 16 wherein the single-character HMMs were trained on a corpus containing bitmap images of text in a plurality of fonts.
  - 25. The method of claim 16 wherein:
    - each single-character HMM has a number of possible contexts, depending on whether the character has an ascender or a descender; and
      
      the context of a given single-character HMM used to create the keyword HMM is determined on the basis of whether the keyword contains characters having ascenders or descenders.
  - 26. The method of claim 16 wherein the HMM network includes the keyword HMM, a non-keyword HMM, and an interword space state.
  - 27. The method of claim 16 wherein:
    - the keyword is a multi-word phrase; and
      
      the concatenating step includes concatenating single-character HMMs for the words in the multi-word phrase to provide a set of single-word HMMs and concatenating the single-word HMMs with at least one interword space state.
  - 28. The method of claim 16 wherein the step of locating includes constructing bounding boxes, at least some of which enclose respective lines of text in the input image, the bounding boxes being considered to extend horizontally.
  - 29. The method of claim 16 wherein the step of providing an array of pixel values includes proportionally scaling the height and width of the portion of the input image to provide a potential text line with a height having a predefined number of pixels.
  - 30. The method of claim 16 wherein the set of features further provides information regarding word internal structure.
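
The decision step of claim 16 (find the most probable path through a network with a keyword path and a non-keyword path, then check which path won) can be illustrated with a toy Viterbi decoder. This is a sketch under strong simplifying assumptions, not the patented network: features are single numbers instead of vectors, each state has a scalar Gaussian, the non-keyword model is one broad self-looping "filler" state, and all names and parameter values (`build_network`, `viterbi`, `contains_keyword`, `p_stay`, the means and variances) are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian (stand-in for a mean vector and
    covariance matrix over real feature vectors)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def build_network(keyword_means, var=1.0, filler_mean=0.0, filler_var=25.0,
                  p_stay=0.5):
    """Two parallel paths: a left-to-right chain of keyword states and a
    single self-looping non-keyword ("filler") state."""
    kw = [("kw", i) for i in range(len(keyword_means))]
    start = {kw[0]: math.log(0.5), "filler": math.log(0.5)}
    trans = {"filler": {"filler": 0.0}}
    emit = {"filler": lambda x: log_gauss(x, filler_mean, filler_var)}
    for i, (state, mean) in enumerate(zip(kw, keyword_means)):
        trans[state] = {state: math.log(p_stay)}
        if i + 1 < len(kw):
            trans[state][kw[i + 1]] = math.log(1.0 - p_stay)
        emit[state] = lambda x, m=mean: log_gauss(x, m, var)
    finals = {kw[-1], "filler"}  # a path must end in one of these states
    return start, trans, emit, finals

def viterbi(obs, start, trans, emit, finals):
    """Return (best state path, its log score) for the observation sequence."""
    score = {s: lp + emit[s](obs[0]) for s, lp in start.items()}
    backpointers = []
    for x in obs[1:]:
        new_score, ptr = {}, {}
        for s, sc in score.items():
            for t, lp in trans[s].items():
                cand = sc + lp + emit[t](x)
                if cand > new_score.get(t, NEG_INF):
                    new_score[t], ptr[t] = cand, s
        score = new_score
        backpointers.append(ptr)
    end = max((s for s in score if s in finals), key=score.get)
    path = [end]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[end]

def contains_keyword(obs, keyword_means):
    """True when the best path runs through the keyword states."""
    path, _ = viterbi(obs, *build_network(keyword_means))
    return any(state != "filler" for state in path)

means = [1.0, 5.0, 2.0]  # hypothetical per-state feature means
print(contains_keyword([1.0, 1.1, 5.0, 4.9, 2.0, 2.1], means))     # True
print(contains_keyword([-4.0, 4.0, -4.0, 4.0, -4.0, 4.0], means))  # False
```

Because the two paths compete inside one decoder, no separate score threshold is needed in this sketch: the keyword is reported only when its states explain the features better than the filler does.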

31. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:
- providing a set of previously trained single-character hidden Markov models (HMMs) wherein each character on which one of the set of single-character HMMs is based has a number of distinct portions, each character has a shape that is characterized by feature vectors at corresponding horizontal locations along the character, a given single-character HMM for a given character is characterized by a number of states, each state of which corresponds to one of the number of distinct portions of the given character, and each state is characterized by a probability distribution of the feature vectors that characterize the corresponding distinct portion of the given character;
  
  concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM;
  
  providing a non-keyword HMM;
  
  constructing a network that includes a first path passing through the keyword HMM and a second path passing through the non-keyword HMM but not passing through the keyword HMM;
  
  locating a portion of the input image potentially containing a line of text;
  
  providing an array of pixel values, referred to as a potential text line, representing the portion of the input image, the potential text line having a plurality of vertically extending columns of pixels at respective ones of a plurality of uniformly spaced horizontal locations;
  
  generating a plurality of feature vectors determined at respective ones of the plurality of horizontal locations in the potential text line, a given feature vector for a given horizontal location being specified by pixel values in the column at the given horizontal location;
  
  the plurality of feature vectors together representing word shape while being generated by uniform segmentation of the potential text line, whereby segmentation of a type that depends on values of the pixels in the potential text line is avoided;
  
  applying the plurality of feature vectors to the HMM network;
  
  finding a path through the network that maximizes the probability of the plurality of feature vectors as applied to the network; and
  
  determining whether the path that maximizes the probability passes through the keyword HMM, so as to provide an indication whether the potential text line is the keyword.
- Dependent claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
- - 32. The method of claim 31 wherein the step of locating includes performing at least one reduction operation on the input image.
  - 33. The method of claim 31 wherein the step of locating includes performing at least one morphological operation on a representation of the input image.
  - 34. The method of claim 31 wherein the step of locating comprises:
    - performing at least one reduction operation on the input image; and
      
      performing at least one morphological operation on the input image, so reduced.
  - 35. The method of claim 31 wherein a feature vector for a given horizontal location in the potential text line includes a representation of the topmost pixel at the given horizontal location having a value above a threshold and a representation of the bottommost pixel at the given horizontal location having a value above the threshold.
  - 36. The method of claim 31 wherein the step of locating includes constructing bounding boxes, at least some of which enclose respective words in the input image, the bounding boxes being considered to extend horizontally.
  - 37. The method of claim 36 wherein at least some of the bounding boxes enclose respective portions of words in the input image.
  - 38. The method of claim 31 wherein a feature vector for a given horizontal location includes a representation of the topmost pixel at the given horizontal location having a value above a threshold, a representation of the bottommost pixel at the given horizontal location having a value above the threshold, and a representation of pixel values between the topmost pixel having a value above the threshold and the bottommost pixel having a value above the threshold.
  - 39. The method of claim 38 wherein the representation of pixel values includes a set of autocorrelation values.
  - 40. The method of claim 38 wherein the representation of pixel values includes the number of pixel transitions.
  - 41. The method of claim 31 wherein the step of providing an array of pixel values includes proportionally scaling the height and width of the portion of the input image to provide a potential text line with a height having a predefined number of pixels.
  - 42. The method of claim 31 wherein each feature vector contains a plurality of parameters.
  - 43. The method of claim 31 wherein the single-character HMMs were trained on a corpus containing bitmap images of text in a plurality of fonts.
  - 44. The method of claim 31 wherein:
    - each single-character HMM has a number of possible contexts, depending on whether the character has an ascender or a descender; and
      
      the context of a given single-character HMM used to create the keyword HMM is determined on the basis of whether the keyword contains characters having ascenders or descenders.
  - 45. The method of claim 31 wherein the HMM network includes the keyword HMM, a non-keyword HMM, and an interword space state.
  - 46. The method of claim 31 wherein:
    - the keyword is a multi-word phrase; and
      
      the concatenating step includes concatenating single-character HMMs for the words in the multi-word phrase to provide a set of single-word HMMs and concatenating the single-word HMMs with at least one interword space state.
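
The context selection of claims 12, 13, and 44 and the phrase concatenation of claim 46 can be sketched in Python as follows. The sketch is illustrative only. The lowercase character sets, the context names, the `SPACE` marker, and the function names are hypothetical, and the handling of keywords that contain descenders is an extrapolation from the two cases the claims spell out (no ascenders or descenders gives four keyword HMMs; ascenders only gives two).

```python
# Assumed lowercase character classes; a real system would cover the
# full character set, including capitals, digits, and punctuation.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

# (name, line has ascenders, line has descenders), in the order of claim 12.
CONTEXTS = [
    ("neither", False, False),
    ("ascender", True, False),
    ("descender", False, True),
    ("both", True, True),
]

SPACE = ("<space>", None)  # placeholder for an interword space state

def contexts_for(keyword):
    """List the contexts in which the keyword could appear.

    If the keyword itself has an ascender (or descender), any enclosing box
    must have one too, so contexts lacking it are ruled out.
    """
    has_asc = any(c in ASCENDERS for c in keyword)
    has_desc = any(c in DESCENDERS for c in keyword)
    return [name for name, asc, desc in CONTEXTS
            if (asc or not has_asc) and (desc or not has_desc)]

def build_keyword_models(phrase):
    """Map each admissible context to a sequence of character-model ids.

    Each id is a (character, context) pair naming which context-specific
    single-character HMM would be concatenated; words in a multi-word
    phrase are joined by the interword space marker.
    """
    words = phrase.split()
    models = {}
    for ctx in contexts_for("".join(words)):
        sequence = []
        for i, word in enumerate(words):
            if i > 0:
                sequence.append(SPACE)
            sequence.extend((ch, ctx) for ch in word)
        models[ctx] = sequence
    return models

print(contexts_for("sun"))   # ['neither', 'ascender', 'descender', 'both']
print(contexts_for("the"))   # ['ascender', 'both']
print(build_keyword_models("to go")["both"])
```

Each returned sequence would correspond to one keyword HMM, and a network like that of claim 12 would place the resulting keyword HMMs in parallel.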

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Chen, Francine R.; Wilcox, Lynn D.; Bloomberg, Dan S.
Primary Examiner(s): Mancuso, Joseph
Assistant Examiner(s): DEL ROSSO, GERARD DOMNICK
Application Number: US08/336,727
Time in Patent Office: 1,266 Days
Field of Search: 382/195, 382/196, 382/173, 382/174, 382/204, 382/254, 382/256-258, 382/155, 382/224, 382/226
US Class Current: 382/218
CPC Class Codes:
G06V 30/10 Character recognition
G06V 30/262 using context analysis, e.g...
