SYSTEM AND METHODS FOR ARABIC TEXT RECOGNITION BASED ON EFFECTIVE ARABIC TEXT FEATURE EXTRACTION
First Claim
1. A method for automatically recognizing Arabic text, comprising:
acquiring a text image containing a line of Arabic characters;
digitizing the line of the Arabic characters to form a two-dimensional array of pixels each associated with a pixel value, wherein the pixel value is expressed in a binary number;
dividing the line of the Arabic characters into a plurality of line images;
defining a plurality of cells in one of the plurality of line images, wherein each of the plurality of cells comprises a group of adjacent pixels;
serializing pixel values of pixels in each of the plurality of cells in one of the plurality of line images to form a binary cell number;
forming a text feature vector according to binary cell numbers obtained from the plurality of cells in one of the plurality of line images; and
feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
Abstract
A method for automatically recognizing Arabic text includes digitizing a line of Arabic characters to form a two-dimensional array of pixels each associated with a pixel value, wherein the pixel value is expressed in a binary number, dividing the line of the Arabic characters into a plurality of line images, defining a plurality of cells in one of the plurality of line images, wherein each of the plurality of cells comprises a group of adjacent pixels, serializing pixel values of pixels in each of the plurality of cells in one of the plurality of line images to form a binary cell number, forming a text feature vector according to binary cell numbers obtained from the plurality of cells in one of the plurality of line images, and feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
17 Citations
21 Claims
1. A method for automatically recognizing Arabic text, comprising:
acquiring a text image containing a line of Arabic characters;
digitizing the line of the Arabic characters to form a two-dimensional array of pixels each associated with a pixel value, wherein the pixel value is expressed in a binary number;
dividing the line of the Arabic characters into a plurality of line images;
defining a plurality of cells in one of the plurality of line images, wherein each of the plurality of cells comprises a group of adjacent pixels;
serializing pixel values of pixels in each of the plurality of cells in one of the plurality of line images to form a binary cell number;
forming a text feature vector according to binary cell numbers obtained from the plurality of cells in one of the plurality of line images; and
feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
Dependent claims: 2, 4, 5, 6, 7, 8, 9
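The cell-serialization steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `binary_cell_numbers`, the rectangular non-overlapping cell layout, and the row-major bit order inside each cell are all assumptions made for illustration, since the claim only requires that each cell be a group of adjacent pixels.

```python
from typing import List

LineImage = List[List[int]]  # rows of binary pixel values (0 or 1)


def binary_cell_numbers(line_image: LineImage, cell_h: int, cell_w: int) -> List[int]:
    """Return one integer per cell, built from the cell's pixel bits.

    Assumptions (not taken from the claim text): cells are non-overlapping
    rectangles of cell_h x cell_w pixels, and bits are read row by row.
    """
    rows, cols = len(line_image), len(line_image[0])
    if rows % cell_h or cols % cell_w:
        raise ValueError("line image must tile evenly into cells")
    feature = []
    for top in range(0, rows, cell_h):
        for left in range(0, cols, cell_w):
            number = 0
            # Serialize the cell's pixels into a single binary number.
            for r in range(top, top + cell_h):
                for c in range(left, left + cell_w):
                    number = (number << 1) | line_image[r][c]
            feature.append(number)
    return feature


example = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
print(binary_cell_numbers(example, 2, 2))  # [6, 9, 3, 12]
```

The resulting list plays the role of the text feature vector for one line image; a recognizer would produce one such vector per line image and pass the sequence to the Hidden Markov Model.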
10. A method for automatically recognizing Arabic text, comprising:
acquiring a text image containing a line of Arabic characters;
digitizing the line of the Arabic characters to form a two-dimensional array of pixels each associated with a pixel value;
dividing the line of the Arabic characters into a plurality of line images;
downsizing at least one of the plurality of line images to produce a downsized line image;
serializing pixel values of pixels in each column of the downsized line image to form a string of serialized numbers, wherein the string of serialized numbers forms a text feature vector; and
feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
Dependent claims: 11, 12
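The downsizing and column-serialization steps of claim 10 can be sketched as follows. The claim does not say how the line image is downsized, so block averaging with an integer `factor` is an assumption here, as are the function names `downsize` and `serialize_columns` and the top-to-bottom reading order within each column.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (not necessarily binary)


def downsize(image: Image, factor: int) -> Image:
    """Shrink an image by averaging non-overlapping factor x factor blocks.

    Block averaging is an assumed method; rows and columns that do not
    fill a whole block are dropped.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for top in range(0, rows - rows % factor, factor):
        out_row = []
        for left in range(0, cols - cols % factor, factor):
            block = [
                image[r][c]
                for r in range(top, top + factor)
                for c in range(left, left + factor)
            ]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


def serialize_columns(image: Image) -> List[int]:
    """Read pixel values column by column, top to bottom (assumed order)."""
    return [image[r][c] for c in range(len(image[0])) for r in range(len(image))]


line_image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
small = downsize(line_image, 2)
print(small)                     # [[0, 8], [4, 2]]
print(serialize_columns(small))  # [0, 4, 8, 2]
```

Downsizing first keeps the serialized string short, which is the apparent motivation for doing it before serialization rather than after.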
13. A method for automatically recognizing Arabic text, comprising:
acquiring a text image containing a line of Arabic characters;
digitizing the line of the Arabic characters to form a two-dimensional array of pixels each associated with a pixel value expressed in a binary number, wherein the two-dimensional array of pixels comprises a plurality of rows in a first direction and a plurality of columns in a second direction;
counting frequencies of consecutive pixels of a same pixel value in a column of pixels;
forming a text feature vector using the frequency counts obtained from the column of pixels; and
feeding the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
Dependent claims: 14, 15, 16, 17, 18
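One plausible reading of the counting step in claim 13 is a run-length encoding of each pixel column. The sketch below follows that reading. Two conventions in it are assumptions, not claim language: the counts always start with a run of 0-valued pixels (so a column that starts with 1 gets a leading count of 0), and the vector is padded or truncated to a fixed `size` so that every column yields a vector of the same length.

```python
from itertools import groupby
from typing import List


def run_lengths(column: List[int], first_value: int = 0) -> List[int]:
    """Count lengths of consecutive equal pixel values in one column.

    Assumed convention: the first count always refers to `first_value`,
    so position in the list tells which pixel value a count belongs to.
    """
    runs = [(value, len(list(group))) for value, group in groupby(column)]
    counts = [length for _, length in runs]
    if runs and runs[0][0] != first_value:
        counts.insert(0, 0)
    return counts


def column_feature(column: List[int], size: int) -> List[int]:
    """Pad with zeros or truncate the run counts to a fixed length (assumed)."""
    counts = run_lengths(column)[:size]
    return counts + [0] * (size - len(counts))


print(run_lengths([0, 0, 1, 1, 1, 0, 1]))        # [2, 3, 1, 1]
print(run_lengths([1, 1, 0]))                    # [0, 2, 1]
print(column_feature([0, 0, 1, 1, 1, 0, 1], 6))  # [2, 3, 1, 1, 0, 0]
```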
19. A computer program product comprising a computer useable medium having computer readable program code functions embedded in said medium for causing a computer to:
acquire a text image containing a line of Arabic characters;
digitize the line of the Arabic characters to form a two-dimensional array of pixels each associated with a pixel value, wherein the pixel value is expressed in a binary number;
divide the line of the Arabic characters into a plurality of line images, wherein the two-dimensional array of pixels comprises a plurality of rows in a first direction and a plurality of columns in a second direction, wherein the line of Arabic characters is aligned substantially along the first direction, wherein the plurality of line images are sequentially aligned along the first direction and have a width between 2 and 100 columns of pixels;
define a plurality of cells in one of the plurality of line images, wherein each of the plurality of cells comprises a group of adjacent pixels;
serialize pixel values of pixels in each of the plurality of cells in one of the plurality of line images to form a binary cell number;
form a text feature vector according to binary cell numbers obtained from the plurality of cells in one of the plurality of line images; and
feed the text feature vector into a Hidden Markov Model to recognize the line of Arabic characters.
Dependent claims: 20, 21
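The dividing step of claim 19, with its 2 to 100 column width limit, might look like the sketch below. The function name `divide_into_line_images` is hypothetical, and the non-overlapping, left-to-right slicing is an assumption: the claim only requires that the line images be sequentially aligned along the text direction, and a system for Arabic, which reads right to left, could equally iterate in the opposite order or use overlapping windows.

```python
from typing import List

Bitmap = List[List[int]]  # rows of binary pixel values


def divide_into_line_images(bitmap: Bitmap, width: int) -> List[Bitmap]:
    """Cut a text-line bitmap into consecutive vertical strips.

    Assumptions: strips do not overlap, are taken left to right, and the
    last strip may be narrower than `width` when the columns do not
    divide evenly.
    """
    if not 2 <= width <= 100:
        raise ValueError("width must be between 2 and 100 columns")
    n_cols = len(bitmap[0])
    return [
        [row[start:start + width] for row in bitmap]
        for start in range(0, n_cols, width)
    ]


line = [
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
strips = divide_into_line_images(line, 2)
print(len(strips))  # 3
print(strips[0])    # [[1, 0], [0, 1]]
print(strips[2])    # [[0], [1]]
```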
Specification