Cursive character handwriting recognition system and method
Abstract
A cursive character handwriting recognition system includes image processing means for processing an image of a handwritten word of one or more characters and classification means for determining an optimal string of one or more characters as composing the imaged word. The processing means segments the characters such that each character is made up of one or more segments and determines a sequence of the segments using an over-segmentation-relabeling algorithm. The system also includes feature extraction means for deriving a feature vector to represent feature information of one segment or a combination of several consecutive segments. The over-segmentation-relabeling algorithm places certain segments considered as diacritics or small segments so as to immediately precede or follow a segment of the associated main character body. The classification means processes each string of segments and outputs a number of optimal strings that can be matched against a given lexicon.
42 Claims
1. A method for recognizing unconstrained cursive handwritten words, comprising:
- processing an image of a handwritten word of one or more characters, the processing step including segmenting the imaged word into a set of one or more segments and determining a sequence of segments using an over-segmentation-relabeling algorithm;
- extracting feature information of one segment or a combination of several consecutive segments;
- repeating said extracting step until feature information from segments or combinations thereof have been extracted; and
- classifying the imaged word as having a string of one or more characters using the extracted feature information,
wherein the segmenting the imaged word includes locating a first segment and a last segment in the imaged word, and wherein the determining a sequence of segments using an over-segmentation-relabeling algorithm includes:
- characterizing segments as either situated segments or unsituated segments, wherein situated segments include the first and last segments, segments having an X-coordinate or Y-coordinate coverage that exceed a threshold value, and small segments that are cursively connected to segments on each side, and wherein unsituated segments are segments not characterized as situated segments; and
- placing each unsituated segment having a situated segment above or below so as to either immediately precede or follow the situated segment in the sequence of segments.
Dependent claims: 2, 3, 4
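The relabeling step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Segment` class, the bounding-box representation, the left-to-right ordering by `x0`, the use of horizontal overlap to decide "above or below", and the centre comparison used to choose between "precede" and "follow" are all assumptions made for illustration. The claim does not say what happens to an unsituated segment with no situated segment above or below it; this sketch simply attaches it to the horizontally nearest situated segment, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment record: a name plus a bounding box."""
    name: str
    x0: int
    x1: int
    y0: int
    y1: int
    connected_both_sides: bool = False  # cursively joined to a neighbour on each side


def is_situated(seg, first, last, threshold):
    """Situated: first/last segment, large X or Y coverage, or a small joined segment."""
    if seg is first or seg is last:
        return True
    if (seg.x1 - seg.x0) > threshold or (seg.y1 - seg.y0) > threshold:
        return True
    return seg.connected_both_sides


def x_overlap(a, b):
    """Horizontal overlap of two boxes; negative values measure the gap between them."""
    return min(a.x1, b.x1) - max(a.x0, b.x0)


def centre_x(seg):
    return (seg.x0 + seg.x1) / 2


def relabel(segments, threshold):
    """Return segments in sequence, with unsituated ones placed next to their host."""
    by_x = sorted(segments, key=lambda s: s.x0)  # assumed reading order
    first, last = by_x[0], by_x[-1]
    situated, unsituated = [], []
    for seg in by_x:
        target = situated if is_situated(seg, first, last, threshold) else unsituated
        target.append(seg)

    before = [[] for _ in situated]
    after = [[] for _ in situated]
    for u in unsituated:
        # Host = situated segment with the largest horizontal overlap (above or below).
        i = max(range(len(situated)), key=lambda k: x_overlap(situated[k], u))
        if centre_x(u) < centre_x(situated[i]):
            before[i].append(u)   # immediately precede the host
        else:
            after[i].append(u)    # immediately follow the host

    ordered = []
    for i, host in enumerate(situated):
        ordered += before[i] + [host] + after[i]
    return ordered


segs = [
    Segment("A", 0, 10, 10, 30),
    Segment("dot1", 6, 8, 0, 2),        # small mark above A
    Segment("B", 11, 20, 10, 30),
    Segment("dot2", 12, 13, 32, 33),    # small mark below B
    Segment("C", 21, 23, 20, 22, connected_both_sides=True),
    Segment("D", 24, 35, 10, 30),
]
print([s.name for s in relabel(segs, threshold=5)])
# ['A', 'dot1', 'dot2', 'B', 'C', 'D']
```

In this example `dot1` follows its host `A` and `dot2` precedes its host `B`, so each diacritic-like mark ends up adjacent to the body segment it sits above or below.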
5. An unconstrained cursive character handwritten word recognition system, comprising a processor including:
- an image processing module operable to process an image of a handwritten word of one or more characters, wherein the processing of the imaged word includes segmenting the imaged word into a finite number of segments and determining a sequence of the segments using an over-segmentation-relabeling algorithm, wherein each character includes one or more consecutive segments;
- a feature extraction module operable to derive a feature vector to represent feature information of one segment or a combination of several consecutive segments; and
- a classification module operable to determine an optimal string of one or more characters as composing the imaged word, wherein the classification module uses a continuous-discrete hybrid probability modeling of features to determine a final symbol probability of whether a given feature vector is indicative of a given distinct character, wherein, in the continuous-discrete hybrid probability modeling of N features, the features N are separated into a first group N1 and a second group N2, features of the first group N1 are distributed using a continuous probability model to obtain a continuous distribution probability measure, features of the second group N2 are distributed using a discrete probability model given by equation (1)
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A method for training an unconstrained cursive character handwritten word recognition system, comprising:
- processing a corpus of handwritten word images, each imaged word having one or more characters, the processing step including segmenting each of the imaged words into a set of one or more segments and determining a sequence of the segments using an over-segmentation-relabeling algorithm;
- extracting feature information of individual characters of the imaged words;
- estimating symbol probability parameters associated with each distinct character so as to allow a statistical measure that given feature information is indicative of a distinct character; and
- estimating state duration probabilities associated with each distinct character, wherein a state duration probability of a given distinct character represents a probability that a segmented image of the given character will have a duration of a defined number of segments,
wherein the segmenting each imaged word includes locating a first segment and a last segment in the imaged word, wherein the determining a sequence of segments using an over-segmentation-relabeling algorithm includes:
- characterizing segments as either situated segments or unsituated segments, wherein situated segments include the first and last segments, segments having an X-coordinate or Y-coordinate coverage that exceed a threshold value, and small segments that are cursively connected to segments on each side, and wherein unsituated segments are segments not characterized as situated segments; and
- placing each unsituated segment having a situated segment above or below so as to either immediately precede or follow the situated segment in the sequence of segments;
Dependent claims: 21, 22
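The state duration probability in claim 20 is the probability that a given character spans a defined number of segments. One simple way to estimate it is by relative frequency over labelled training data, sketched below in Python. The claim only says the probabilities are estimated, so the relative-frequency estimator, the function name `estimate_duration_probs`, the `(character, segment_count)` input format, and the cap `max_duration=4` are assumptions, not details from the patent.

```python
from collections import Counter, defaultdict


def estimate_duration_probs(labelled, max_duration=4):
    """Estimate P(duration = d | character) by relative frequency.

    labelled: iterable of (character, number_of_segments) pairs taken from
    segmented training words whose character boundaries are known.
    """
    counts = defaultdict(Counter)
    for char, n_segments in labelled:
        if 1 <= n_segments <= max_duration:   # ignore out-of-range durations
            counts[char][n_segments] += 1

    probs = {}
    for char, counter in counts.items():
        total = sum(counter.values())
        probs[char] = {d: counter[d] / total for d in range(1, max_duration + 1)}
    return probs


training = [("a", 1), ("a", 2), ("a", 1), ("a", 1), ("b", 3)]
probs = estimate_duration_probs(training)
print(probs["a"])  # {1: 0.75, 2: 0.25, 3: 0.0, 4: 0.0}
print(probs["b"])  # {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0}
```

A practical system would likely smooth the zero entries so that unseen durations are not ruled out entirely; that choice is left open here.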
23. A method for determining a sequence of segments of a segmented image of a cursive written word processed in a word recognition system, comprising:
- finding the number of segments, wherein the finding step includes locating a first segment and a last segment in the imaged word; and
- determining the sequence of segments using an over-segmentation-relabeling algorithm, wherein the over-segmentation-relabeling algorithm includes:
- characterizing segments as either situated segments or unsituated segments, wherein situated segments include the first and last segments, segments having an X-coordinate or Y-coordinate coverage that exceed a threshold value, and small segments that are cursively connected to segments on each side, and wherein unsituated segments are segments not characterized as situated segments; and
- placing each unsituated segment having a situated segment above or below so as to either immediately precede or follow the situated segment in the sequence of segments.
Dependent claims: 24, 25, 26, 27, 28
29. A method for recognizing unconstrained cursive handwritten words, comprising:
- processing an image of a handwritten Arabic word of one or more characters, the processing step including segmenting the imaged word into a set of one or more segments and determining a sequence of segments using an over-segmentation-relabeling algorithm;
- after the processing step, extracting feature information from one segment or a combination of several consecutive segments of the image word processed in the processing step, wherein the feature information includes at least one of aspect ratio features, location and number of disconnected dots, stroke connectedness features and chain code features,
wherein the aspect ratio features include two aspect ratio features, ƒ_hv and ƒ_vh, which are computed by finding maximum vertical extent (vd) and maximum horizontal extent (hd) of the character, wherein feature ƒ_hv is based on a horizontal to vertical aspect ratio, and feature ƒ_vh is based on a vertical to horizontal aspect ratio,
wherein the location and number of disconnected dots includes three features, ƒ_du, ƒ_dm, and ƒ_dl, relating to the number of disconnected diacritics located in an upper zone, a middle zone, and a lower zone, respectively, of the one segment or the combination of several consecutive segments,
wherein the stroke connectedness features include two stroke connectedness features, ƒ_cr and ƒ_cl,
wherein the chain code features include three 8-directional chain code based features, ƒ_ch, ƒ_rough, and ƒ_con;
- repeating said extracting step until feature information from segments or combinations thereof have been extracted; and
- classifying the imaged word as having a string of one or more characters using the extracted feature information.
Dependent claims: 30, 31, 32, 33, 34, 35
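Two of the feature groups in claim 29 are simple enough to sketch in Python: the aspect ratio features (ƒ_hv, ƒ_vh) and the dot counts per zone (ƒ_du, ƒ_dm, ƒ_dl). The claim says only that the aspect ratio features are "based on" the ratios of hd and vd, so taking the plain ratios hd/vd and vd/hd is an assumption. The binary-image representation, the function names, and the zone boundaries passed as `upper_limit` and `lower_limit` are also hypothetical. The stroke connectedness and chain code features are not covered.

```python
def extents(image):
    """Return (hd, vd): the horizontal and vertical extent of the ink pixels.

    image: list of rows, each a list of 0/1 values (1 = ink).
    """
    ink_rows = [r for r, row in enumerate(image) if any(row)]
    ink_cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not ink_rows:
        raise ValueError("image contains no ink pixels")
    vd = ink_rows[-1] - ink_rows[0] + 1
    hd = ink_cols[-1] - ink_cols[0] + 1
    return hd, vd


def aspect_ratio_features(image):
    """Return (f_hv, f_vh) as plain ratios of the two extents (an assumption)."""
    hd, vd = extents(image)
    return hd / vd, vd / hd


def dot_zone_counts(dot_rows, upper_limit, lower_limit):
    """Count disconnected dots in the upper, middle and lower zones.

    dot_rows: row coordinate of each dot's centre (row 0 is the top).
    Rows below upper_limit are 'upper'; rows at or beyond lower_limit are 'lower'.
    """
    f_du = sum(1 for y in dot_rows if y < upper_limit)
    f_dl = sum(1 for y in dot_rows if y >= lower_limit)
    f_dm = len(dot_rows) - f_du - f_dl
    return f_du, f_dm, f_dl


glyph = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(aspect_ratio_features(glyph))            # (2.0, 0.5)
print(dot_zone_counts([0, 5, 9], 3, 7))        # (1, 1, 1)
```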
36. An unconstrained cursive character handwritten word recognition system, comprising a processor including:
- an image processing module operable to process an image of a handwritten word of one or more characters, wherein the processing of the imaged word includes segmenting the imaged word into a finite number of segments and determining a sequence of the segments using an over-segmentation-relabeling algorithm, wherein each character includes one or more consecutive segments;
- a feature extraction module operable to derive a feature vector to represent feature information of one segment or a combination of several consecutive segments;
- a classification module operable to determine an optimal string of one or more characters as composing the imaged word; and
- a post-processing module operable to output hypotheses of one or more words from a given dictionary which are suggested by the optimal string, wherein the hypotheses are based on weighted edit distances using weight factors determined from linguistic reasoning and empirically derived mutual character confusion information and without using probabilities given by a modified Viterbi algorithm as a weight factor.
Dependent claims: 37, 38
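The post-processing step in claim 36 ranks dictionary words by a weighted edit distance from the recognized string. The Python sketch below uses a standard dynamic-programming edit distance in which substitution costs come from a confusion table. The claim does not give the weight values or the exact distance formulation, so the function names, the default cost of 1.0, and the example confusion weight of 0.3 are all hypothetical, and Latin letters are used only to keep the example short.

```python
def weighted_edit_distance(source, target, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance where substituting a with b costs sub_cost[(a, b)] (default 1.0)."""
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            substitution = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,          # delete a
                dist[i][j - 1] + ins_cost,          # insert b
                dist[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dist[m][n]


def rank_hypotheses(optimal_string, lexicon, sub_cost, top_k=3):
    """Return the top_k lexicon words closest to the recognized string."""
    scored = sorted(
        (weighted_edit_distance(optimal_string, word, sub_cost), word)
        for word in lexicon
    )
    return [word for _, word in scored[:top_k]]


# Hypothetical confusion weight: 'c' is often misread where 'e' was written.
confusions = {("c", "e"): 0.3}
print(rank_hypotheses("cat", ["dog", "cut", "eat"], confusions))
# ['eat', 'cut', 'dog']
```

Because the easily confused pair is cheap to substitute, "eat" ranks ahead of "cut" even though both differ from "cat" by one character.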
39. A non-transitory computer-readable medium having stored thereon, computer readable program code that, if executed by a system, cause the system to perform a method for recognizing unconstrained cursive handwritten words, the method comprising:
- processing an image of a handwritten word of one or more characters, the processing step including segmenting the imaged word into a set of one or more segments and determining a sequence of segments using an over-segmentation-relabeling algorithm;
- extracting feature information of one segment or a combination of several consecutive segments;
- repeating said extracting step until feature information from all segments or combinations thereof have been extracted; and
- classifying the imaged word as having a string of one or more characters using the extracted feature information,
wherein the segmenting the imaged word includes locating a first segment and a last segment in the imaged word, and wherein the determining a sequence of segments using an over-segmentation-relabeling algorithm includes:
- characterizing segments as either situated segments or unsituated segments, wherein situated segments include the first and last segments, segments having an X-coordinate or Y-coordinate coverage that exceed a threshold value, and small segments that are cursively connected to segments on each side, and wherein unsituated segments are segments not characterized as situated segments; and
- placing each unsituated segment having a situated segment above or below so as to either immediately precede or follow the situated segment in the sequence of segments.
Dependent claims: 40, 41
42. A non-transitory computer-readable medium having stored thereon, computer readable program code that, if executed by a system, cause the system to perform a method for training an unconstrained cursive character handwritten word recognition system, the method comprising:
- processing a corpus of handwritten word images, each imaged word having one or more characters, the processing step including segmenting each of the imaged words into a set of one or more segments and determining a sequence of the segments using an over-segmentation-relabeling algorithm;
- extracting feature information of segments where one or more consecutive segments represent individual characters of the imaged words;
- estimating symbol probability parameters associated with each distinct character so as to allow a statistical measure that given feature information is indicative of a distinct character; and
- estimating state duration probabilities associated with each distinct character, wherein a state duration probability of a given distinct character represents a probability that a segmented image of the given character will have a duration of a defined number of segments,
wherein the segmenting each imaged word includes locating a first segment and a last segment in the imaged word, wherein the determining a sequence of segments using an over-segmentation-relabeling algorithm includes:
- characterizing segments as either situated segments or unsituated segments, wherein situated segments include the first and last segments, segments having an X-coordinate or Y-coordinate coverage that exceed a threshold value, and small segments that are cursively connected to segments on each side, and wherein unsituated segments are segments not characterized as situated segments; and
- placing each unsituated segment having a situated segment above or below so as to either immediately precede or follow the situated segment in the sequence of segments.
Specification