Local connectivity feature transform of binary images containing text characters for optical character/word recognition
First Claim
1. A method for processing a binary document image containing text characters, the method comprising:
(a) obtaining the binary document image, the document image having a plurality of pixels, each pixel having either a first pixel value representing content of the document or a second pixel value representing background;
(b) assigning the second pixel value to all pixels located on a boundary of the document image;
(c) generating a transformed document image, the transformed document image being a grayscale image having a same size as the binary document image, including:
(c1) for each pixel (i,j) of the document image that has the second pixel value, where i and j denote position indices of the document image respectively, assigning a fixed transform score to the pixel,
(c2) for each pixel (i,j) of the document image that has the first pixel value, computing a transform score using

$$T(i,j) = \sum_{m=-1}^{+1} \sum_{n=-1}^{+1} W(m,n) \cdot P(i+m,\, j+n)$$

where T(i,j) is the transform score of the pixel (i,j), m and n are integers and m, n ∈ {−1, 0, +1}, W(m,n) is a 3×3 weight matrix, and P(i+m,j+n) is the pixel value of pixel (i+m,j+n),
wherein a center element of the 3×3 weight matrix W(m,n) has a value of zero, and each one of eight non-center elements of the 3×3 weight matrix W(m,n) has a value which is a different one of eight numbers $2^q$, $q = 0, 1, 2, \ldots, 7$; and
wherein the transform scores of all pixels of the document image form the transformed image; and
(d) processing the transformed image using a bi-directional Long Short Term Memory (LSTM) neural network for character/word recognition to recognize characters or words in the transformed image.
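Steps (b) and (c) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: content pixels are assumed to be encoded as 1 and background as 0, the weight matrix uses a hypothetical row-major assignment of the powers of two (the claim only requires eight distinct values 2^q around a zero center), and the fixed background score is a parameter, `background_score`, that defaults to 0 by assumption.

```python
from typing import List, Sequence

# Hypothetical 3x3 weight matrix: center is 0 and the eight neighbors take
# distinct powers of two (2**0 .. 2**7). This row-major ordering is an
# illustrative assumption, not the ordering used in the patent.
EXAMPLE_WEIGHTS = [
    [1, 2, 4],
    [8, 0, 16],
    [32, 64, 128],
]


def lcft(image: Sequence[Sequence[int]],
         weights: Sequence[Sequence[int]] = EXAMPLE_WEIGHTS,
         background_score: int = 0) -> List[List[int]]:
    """Local connectivity feature transform of a binary image.

    Assumes content pixels are 1 and background pixels are 0.
    """
    rows, cols = len(image), len(image[0])
    # Step (b): force every boundary pixel to background, so the 3x3
    # neighborhood of any remaining content pixel lies inside the image.
    img = [list(row) for row in image]
    for j in range(cols):
        img[0][j] = img[rows - 1][j] = 0
    for i in range(rows):
        img[i][0] = img[i][cols - 1] = 0

    out = [[background_score] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if img[i][j] == 0:
                continue  # step (c1): background keeps the fixed score
            # Step (c2): T(i,j) = sum over m, n in {-1, 0, 1} of W(m,n) * P(i+m, j+n)
            out[i][j] = sum(
                weights[m + 1][n + 1] * img[i + m][j + n]
                for m in (-1, 0, 1)
                for n in (-1, 0, 1)
            )
    return out
```

For a single horizontal three-pixel stroke, the middle pixel sees content on both sides (8 + 16 = 24), while each end pixel sees only one neighbor, so the stroke's row transforms to `[0, 16, 24, 8, 0]`. A pixel whose eight neighbors are all content scores 255.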
Abstract
A local connectivity feature transform (LCFT) is applied to binary document images containing text characters to generate transformed document images, which are then input into a bi-directional Long Short Term Memory (LSTM) neural network to perform character/word recognition. The LCFT-transformed image is a grayscale image whose pixel values encode the local pixel connectivity information of the corresponding pixels in the original binary image. The transform assigns a unique transform score to every possible shape that can be represented as a 3×3 block. In one example, the transform is computed using a 3×3 weight matrix that combines bit coding with a zigzag pattern to assign a weight to each element of the 3×3 block; the transform score is the sum of the weights of the non-zero elements of the 3×3 block shape.
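The uniqueness property can be checked directly. The sketch below builds a weight matrix by walking the 3×3 block in a zigzag order and assigning 2^0 through 2^7 to the eight non-center cells, then enumerates all 256 neighbor patterns. The abstract does not spell out the exact zigzag pattern, so the JPEG-style anti-diagonal scan used here (`ZIGZAG`) is an assumption, and `zigzag_weights` and `score_table` are hypothetical helper names.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical zigzag scan of a 3x3 block (JPEG-style anti-diagonal order).
# The patent's exact zigzag pattern is not given here; this order is an assumption.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]


def zigzag_weights() -> List[List[int]]:
    """Assign 2**q, q = 0..7, to the eight non-center cells in zigzag order."""
    w = [[0] * 3 for _ in range(3)]
    q = 0
    for r, c in ZIGZAG:
        if (r, c) == (1, 1):
            continue  # the center weight stays zero
        w[r][c] = 2 ** q
        q += 1
    return w


def score_table(w: List[List[int]]) -> Dict[int, Tuple[int, ...]]:
    """Map each transform score to the neighbor pattern that produced it.

    Raises ValueError if two different patterns share a score.
    """
    neighbors = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    table: Dict[int, Tuple[int, ...]] = {}
    for bits in product((0, 1), repeat=8):
        score = sum(w[r][c] * b for (r, c), b in zip(neighbors, bits))
        if score in table:
            raise ValueError(f"score {score} is not unique")
        table[score] = bits
    return table
```

Because the eight weights are distinct powers of two, each score is the 8-bit binary code of the neighbor pattern, so the 256 patterns map one-to-one onto the scores 0 to 255. Setting every non-center weight to 1 instead makes `score_table` raise, since different shapes with the same pixel count then collide.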
8 Citations
15 Claims
7. A method for processing a binary document image containing text characters, the method comprising:
(a) obtaining the binary document image, the document image having a plurality of pixels, each pixel having either a first pixel value representing content of the document or a second pixel value representing background;
(b) generating a transformed document image by transforming the binary document image into the transformed document image, the transformed document image being a grayscale image having a same size as the binary document image, each pixel of the transformed image representing a transform score that encodes local pixel connectivity information of the corresponding pixel in the binary document image; and
(c) processing the transformed image using a bi-directional Long Short Term Memory (LSTM) neural network for character/word recognition.
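The claims leave the network's input format open. One common convention for line-level recognition with recurrent networks, assumed here rather than taken from the claims, is to read the transformed image column by column, so that each pixel column is one time step, and to run one LSTM left to right and another right to left, concatenating their hidden states per column. The pure-Python sketch below shows only that untrained forward pass. The helper names (`LSTMCell`, `columns`, `bidirectional_lstm`), the random weight initialization, and the scaling of scores by 255 (which assumes 2^q weights and a background score of 0) are illustrative assumptions; training and character decoding are omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM layer with small random (untrained) weights."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random) -> None:
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size + 1  # [x; h_prev; bias]
        # One weight row per unit of the input, forget, candidate and output gates.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]

    def run(self, seq: Sequence[Vector]) -> List[Vector]:
        H = self.hidden_size
        h, c = [0.0] * H, [0.0] * H
        outputs = []
        for x in seq:
            z = list(x) + h + [1.0]
            a = [sum(wk * zk for wk, zk in zip(row, z)) for row in self.w]
            i_g = [_sigmoid(v) for v in a[:H]]
            f_g = [_sigmoid(v) for v in a[H:2 * H]]
            g = [math.tanh(v) for v in a[2 * H:3 * H]]
            o_g = [_sigmoid(v) for v in a[3 * H:]]
            c = [f * cp + i * gv for f, cp, i, gv in zip(f_g, c, i_g, g)]
            h = [o * math.tanh(cv) for o, cv in zip(o_g, c)]
            outputs.append(h)
        return outputs


def columns(image: Sequence[Sequence[int]], max_score: float = 255.0) -> List[Vector]:
    """Turn a transformed image into left-to-right column vectors scaled to [0, 1]."""
    rows, cols = len(image), len(image[0])
    return [[image[i][j] / max_score for i in range(rows)] for j in range(cols)]


def bidirectional_lstm(seq: List[Vector], hidden_size: int, seed: int = 0) -> List[Vector]:
    """Concatenate forward and backward LSTM states for each time step."""
    rng = random.Random(seed)
    fwd = LSTMCell(len(seq[0]), hidden_size, rng)
    bwd = LSTMCell(len(seq[0]), hidden_size, rng)
    f_out = fwd.run(seq)
    b_out = bwd.run(seq[::-1])[::-1]  # read right to left, then realign with columns
    return [f + b for f, b in zip(f_out, b_out)]
```

Each output vector combines context from the columns to its left (forward pass) and to its right (backward pass), which is the property that makes the bi-directional arrangement useful for recognizing characters whose identity depends on neighboring strokes.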
Specification