Word spotting in bitmap images using text line bounding boxes and hidden Markov models

  • US 5,745,600 A
  • Filed: 11/09/1994
  • Issued: 04/28/1998
  • Est. Priority Date: 12/17/1992
  • Status: Expired due to Term
First Claim

1. A processor-based method of determining whether a keyword made up of characters is present in a bitmap input image containing words in lines of text, the words and lines of text being considered to extend horizontally, the method comprising the steps of:

    providing a set of previously-trained single-character hidden Markov models (HMMs), each single-character HMM having a number of possible contexts, depending on whether the character has an ascender or a descender;

    concatenating those single-character HMMs that correspond to the characters in the keyword so as to create a keyword HMM, the context of a given single-character HMM used to create the keyword HMM being determined on the basis of whether the keyword contains characters having ascenders or descenders;

    constructing an HMM network that includes a first path passing through the keyword HMM and a second path that does not pass through the keyword HMM;

    locating a portion of the input image potentially containing a line of text;

    providing an array of pixel values, referred to as a potential text line, representing the portion of the input image;

    horizontally sampling the potential text line to provide a plurality of segments wherein each segment extends the entire height of the potential text line and the sampling to provide segments is performed in a manner that is independent of the values of the pixels in the potential text line;

    for each segment, generating at least one feature that depends on the values of the pixels in the segment, thereby providing a set of features based on the potential text line, the set of features providing shape information regarding words in the line of text potentially contained in the portion of the input image;

    applying the set of features to the HMM network;

    finding a path through the network that maximizes the probability of the set of features as applied to the network; and

    determining whether the path that maximizes the probability passes through the keyword HMM so as to provide an indication whether the portion of the input image contains the keyword.
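The sampling, feature, and decoding steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation, and everything named in it is an assumption made for illustration: the helpers `sample_columns`, `ink_density`, and `best_path_uses_keyword`, the single ink-density feature, the Gaussian emission scores, the hand-picked state means, and the one-state filler model standing in for the second path. Trained single-character HMMs and their ascender/descender contexts are not reproduced here.

```python
import math

def sample_columns(line, width=1):
    """Cut a text-line bitmap (list of rows of 0/1 pixels) into segments.

    Each segment spans the full height of the line, and the cut positions
    are fixed steps of `width` columns, chosen without reading pixel values.
    """
    n_cols = len(line[0])
    return [[row[x:x + width] for row in line] for x in range(0, n_cols, width)]

def ink_density(segment):
    """One feature per segment: the fraction of black (1) pixels (assumed feature)."""
    total = sum(len(row) for row in segment)
    return sum(sum(row) for row in segment) / total

def log_gauss(x, mean, var=0.02):
    """Log of a Gaussian density, used as a stand-in emission score."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def best_path_uses_keyword(features, keyword_means, filler_mean):
    """Viterbi scoring of two parallel paths through a toy network.

    Path 1: a left-to-right chain of states (hypothetical keyword HMM).
    Path 2: a single self-looping filler state (hypothetical non-keyword path).
    Returns (keyword_wins, keyword_log_score, filler_log_score).
    """
    neg_inf = float("-inf")
    log_step = math.log(0.5)  # assumed cost of every stay or advance transition
    n = len(keyword_means)

    # Both paths start on their first state.
    score = [neg_inf] * n
    score[0] = log_gauss(features[0], keyword_means[0])
    filler = log_gauss(features[0], filler_mean)

    for x in features[1:]:
        new = [neg_inf] * n
        for s in range(n):
            stay = score[s] + log_step
            move = score[s - 1] + log_step if s > 0 else neg_inf
            best = max(stay, move)
            if best > neg_inf:
                new[s] = best + log_gauss(x, keyword_means[s])
        score = new
        filler += log_step + log_gauss(x, filler_mean)

    # The keyword path must end in its last state to count as complete.
    return score[-1] > filler, score[-1], filler

# Toy text line: blank columns, a solid stroke, blank columns.
line = [[0, 0, 1, 1, 0, 0]] * 3
features = [ink_density(seg) for seg in sample_columns(line)]
found, kw_score, filler_score = best_path_uses_keyword(
    features, keyword_means=[0.0, 1.0, 0.0], filler_mean=0.5
)
print("keyword path wins:", found)
```

In this toy setup the indication of the final step is simply the comparison of the two path scores. A fuller network would also let non-keyword material precede and follow the keyword on the same line.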
