Method and apparatus for character recognition accommodating diacritical marks
Abstract
A method and apparatus for processing data are disclosed for recognizing unknown characters of a known character set, some of the characters having diacritical marks. The method includes storing the image data of an unknown character which may contain a diacritical mark. From the stored image data, a predetermined localized area of data is extracted that corresponds to the expected location of the diacritical mark. The extracted diacritical mark image data and at least a portion of the stored image data of the unknown character are examined to recognize the character and any diacritical mark associated therewith. Also disclosed are video preprocessing techniques for segmenting the characters using profiles thereof, inclusive-bit-coding to separate characters based upon differences in size, justification of the extracted diacritical mark image data, unique encoding of the recognition results, and post-processing verification for characters including diacritical marks.
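The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a character is stored as a list of rows of 0/1 pixels and that the expected location of a diacritical mark is the top `zone_rows` rows of the stored image; both the representation and the names `split_diacritic_zone` and `has_ink` are hypothetical and do not come from the patent text.

```python
def split_diacritic_zone(bitmap, zone_rows):
    """Split a stored character bitmap into (zone, body).

    Assumption: the predetermined localized area is the top `zone_rows`
    rows. The patent only says "expected location", so a mark below the
    character would need a different zone.
    """
    zone = [row[:] for row in bitmap[:zone_rows]]   # extracted area
    body = [row[:] for row in bitmap[zone_rows:]]   # non-extracted remainder
    return zone, body


def has_ink(region):
    """Return True if any pixel in the region is set."""
    return any(any(row) for row in region)


# A dotted-i-like shape: one pixel in the zone, a stem below it.
glyph = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
zone, body = split_diacritic_zone(glyph, zone_rows=2)
```

Both parts are copied rather than sliced in place, so the stored image of the entire character stays available for the later examination step.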
174 Citations
18 Claims
1. A method of processing data for recognizing unknown characters of a known character set, some of the characters having diacritical marks associated therewith, said method comprising the steps of:
storing the image data representing an entire unknown character, including any overlapping or non-overlapping diacritical marks associated therewith;
segmenting the stored image data to represent individual unknown characters including any diacritical mark associated therewith;
extracting from the stored image data that portion of the image data representing a predetermined localized area of the unknown character corresponding to the expected location of a diacritical mark;
classifying the segmented image data to provisionally distinguish larger characters which may include a diacritical mark from smaller characters which may not include a diacritical mark;
examining the extracted diacritical mark image data and at least a portion of the non-extracted stored image data of the unknown character with the provisional distinction between the larger and smaller characters to recognize the unknown character and any diacritical mark associated therewith.
Dependent claims: 2, 4, 5, 6, 7, 8, 9, 10, 11, 12
3. The method of claim 2 wherein the step of justifying the extracted diacritical mark image data comprises justifying said data in a direction away from the unknown character.
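One possible reading of justification "in a direction away from the unknown character" is sketched here. It assumes the mark sits above the character, so "away" means upward, and that the extracted area is a list of rows of 0/1 pixels. The function name `justify_up` is hypothetical, and the patent may justify the data differently.

```python
def justify_up(zone):
    """Shift the extracted mark so its first inked row is row 0.

    Assumption: the character body lies below the zone, so moving the
    mark upward moves it away from the character. Blank rows are
    appended at the bottom to keep the zone height unchanged.
    """
    width = len(zone[0]) if zone else 0
    first = next((i for i, row in enumerate(zone) if any(row)), None)
    if first is None:                       # empty zone: nothing to shift
        return [row[:] for row in zone]
    shifted = [row[:] for row in zone[first:]]
    shifted += [[0] * width for _ in range(first)]
    return shifted


justified = justify_up([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

Normalizing the vertical position this way means the same mark yields the same zone image regardless of how high it was printed above the character, which simplifies comparison against reference patterns.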
13. A method of processing data for recognizing unknown characters of a known character set, the unknown characters being represented by scan data representing an entire unknown character including any diacritical marks associated therewith, said method comprising the steps of:
storing the image data representing an entire unknown character, including any overlapping or non-overlapping diacritical marks associated therewith;
generating from the stored image data profiles parallel to the reading line representing the unknown characters and any diacritical marks associated therewith, and separating the profiles into segments, each segment representing the relative width of an unknown character including any diacritical mark associated therewith;
extracting from the stored unknown character image data that portion of the image data contained in a predetermined localized area of the unknown character corresponding to the expected location of a diacritical mark;
classifying the segmented profiles and associated character data by generating an inclusive-bit-encoded word representative of the size of the character and testing a given bit in said word to provisionally distinguish larger characters which may include a diacritical mark from smaller characters which may not include a diacritical mark;
justifying the extracted diacritical mark image data;
examining the justified diacritical mark image data and at least a portion of the non-extracted stored image data of the unknown character with the provisional distinction between the larger and smaller characters to recognize the unknown character and any diacritical mark associated therewith.
Dependent claims: 14, 15, 16, 17
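The profile segmentation and size classification steps of claim 13 might be sketched as follows. The sketch assumes that a profile parallel to the reading line records, for each pixel column, whether the column contains any ink, and it interprets the "inclusive-bit-encoded word" as a thermometer-style code in which bit k is set whenever the size reaches the k-th threshold. That interpretation, the threshold values, and all function names are assumptions and are not taken from the patent.

```python
def column_profile(image):
    """Profile along the reading line: 1 where a pixel column has ink."""
    return [int(any(row[x] for row in image)) for x in range(len(image[0]))]


def segments(profile):
    """Separate the profile into (start, end) runs of inked columns."""
    runs, start = [], None
    for x, v in enumerate(profile):
        if v and start is None:
            start = x                       # a character begins
        elif not v and start is not None:
            runs.append((start, x))         # a blank column ends it
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def inclusive_bit_word(size, thresholds):
    """Set bit k for every threshold k that `size` reaches (assumed coding)."""
    word = 0
    for bit, t in enumerate(thresholds):
        if size >= t:
            word |= 1 << bit
    return word


def may_have_diacritic(word, bit):
    """Test a single bit to make the provisional large/small distinction."""
    return bool((word >> bit) & 1)


line = [
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
runs = segments(column_profile(line))
word = inclusive_bit_word(12, thresholds=(4, 8, 12, 16))  # hypothetical heights
```

Under this coding, every size at or above a threshold shares the corresponding bit, so a single bit test is enough to separate the larger characters from the smaller ones without comparing full size values.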
18. Apparatus for recognizing unknown characters of a known character set, some of the characters having diacritical marks associated therewith, said apparatus comprising:
means for storing the image data representing an entire unknown character including any overlapping or non-overlapping diacritical mark which may be associated therewith;
means associated with said means for storing for segmenting the stored image data to represent individual unknown characters including any diacritical mark associated therewith;
means associated with said means for storing the image of an entire unknown character for extracting from the stored image data that portion of the image data representing a predetermined localized area of the unknown character corresponding to the expected location of a diacritical mark;
means associated with said means for segmenting for classifying the segmented image data to provisionally distinguish larger characters which may include a diacritical mark from smaller characters which may not include a diacritical mark;
means associated with said means for extracting for examining the extracted diacritical mark image data and at least a portion of the non-extracted stored image data of the unknown character with the provisional distinction between the larger and smaller characters to recognize the unknown character and any diacritical mark associated therewith.
Specification