CORRECTING SEGMENTATION ERRORS IN OCR
Abstract
A method for encoding characters includes identifying one or more sequences of character codes that are likely to be generated due to a segmentation error in application of a pattern recognition process, and associating a respective extension character code with each of the sequences. The area of an image containing characters is divided into segments, such that each segment contains approximately one character. The pattern recognition process is applied to each of the segments in order to generate an input string of character codes. At least one of the identified sequences of character codes in the input string is replaced with the respective extension character code so as to generate a modified string. An output string is then determined by comparing the modified string to a directory of known strings.
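The replacement step described in the abstract can be sketched in Python. The table below is a hypothetical example, not taken from the patent: the sequences `rn`, `cl`, and `vv` are assumed to be typical results of an `m`, `d`, or `w` being split across two segments, and the extension codes are arbitrary code points from the Unicode Private Use Area chosen only for illustration.

```python
# Hypothetical table: sequences assumed likely to result from segmentation
# errors, each mapped to an illustrative Private Use Area "extension" code.
EXTENSION_CODES: dict[str, str] = {
    "rn": "\uE000",  # e.g. an "m" split into two segments
    "cl": "\uE001",  # e.g. a "d" split into two segments
    "vv": "\uE002",  # e.g. a "w" split into two segments
}


def apply_extension_codes(input_string: str, table: dict[str, str]) -> str:
    """Replace each listed sequence in the OCR output with its extension code.

    Scans left to right without overlaps; longer sequences are tried first
    so that overlapping table entries resolve deterministically.
    """
    # Ignore empty keys, which would otherwise match everywhere and never advance.
    sequences = sorted((s for s in table if s), key=len, reverse=True)
    out: list[str] = []
    i = 0
    while i < len(input_string):
        for seq in sequences:
            if input_string.startswith(seq, i):
                out.append(table[seq])
                i += len(seq)
                break
        else:  # no listed sequence starts here: copy the character unchanged
            out.append(input_string[i])
            i += 1
    return "".join(out)


modified = apply_extension_codes("rnodern", EXTENSION_CODES)
print(ascii(modified))
```

For an OCR result `rnodern` (a misread "modern"), both occurrences of `rn` are replaced, including the genuine one at the end of the word. This is why the later comparison against the directory has to treat each extension code as matching either the original sequence or the single character it may have been split from.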
24 Claims
1-3. (canceled)
4. A method for encoding characters appearing in an area of an image in order to generate a corresponding output string of character codes, the method comprising:
identifying one or more sequences of the character codes that are likely to be generated due to a segmentation error in application of a pattern recognition process, and associating a respective extension character code with each of the sequences;
dividing the area of the image into segments such that each segment contains approximately one character;
applying the pattern recognition process to each of the segments in order to generate an input string of character codes, the input string comprising a respective character code for each of the segments;
locating at least one of the sequences of the character codes in the input string, and replacing the at least one of the sequences with the respective extension character code so as to generate a modified string; and
determining the output string by comparing the modified string to a directory of known strings, wherein determining the output string comprises finding an approximate match between the modified string and one of the known strings, and outputting the one of the known strings.
Dependent claims: 5, 6, 7.
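Claim 4 determines the output by finding an approximate match between the modified string and one of the known strings, but does not name a distance measure. A weighted edit distance is assumed here purely for illustration: each extension code may match, at no cost, either the sequence it replaced or the single character it may stand for. The `ALTERNATIVES` table, the unit cost per edit, and the `max_cost` threshold are all hypothetical.

```python
from typing import Optional

# Hypothetical reverse table: each extension code may stand for the sequence
# it replaced or for the single character that was split into that sequence.
ALTERNATIVES: dict[str, tuple[str, ...]] = {
    "\uE000": ("rn", "m"),
    "\uE001": ("cl", "d"),
    "\uE002": ("vv", "w"),
}


def match_cost(modified: str, known: str, alternatives: dict[str, tuple[str, ...]]) -> int:
    """Edit distance where an extension code matches any of its alternatives at zero cost."""
    n, m = len(modified), len(known)
    inf = n + m + 1  # larger than any achievable cost
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, dp[i - 1][j] + 1)  # extra code in the OCR string
            if j > 0:
                best = min(best, dp[i][j - 1] + 1)  # character missing from the OCR string
            if i > 0 and j > 0:
                code = modified[i - 1]
                # Ordinary match (cost 0) or substitution (cost 1).
                best = min(best, dp[i - 1][j - 1] + (0 if code == known[j - 1] else 1))
                # An extension code may consume one of its alternatives for free.
                for alt in alternatives.get(code, ()):
                    k = len(alt)
                    if j >= k and known[j - k:j] == alt:
                        best = min(best, dp[i - 1][j - k])
            dp[i][j] = best
    return dp[n][m]


def best_match(modified: str, directory: list[str],
               alternatives: dict[str, tuple[str, ...]], max_cost: int = 2) -> Optional[str]:
    """Return the closest known string, or None if nothing is within max_cost."""
    scored = [(match_cost(modified, word, alternatives), word) for word in directory]
    if not scored:
        return None
    cost, word = min(scored)
    return word if cost <= max_cost else None
```

With this sketch, `best_match("\uE000ode\uE000", ["mode", "model", "modern"], ALTERNATIVES)` returns `"modern"`: the first extension code matches `m`, the second matches `rn`, and the total cost is 0, while `mode` and `model` each cost 1.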
8. (canceled)
9. Apparatus for encoding characters appearing in an area of an image in order to generate a corresponding output string of character codes, the apparatus comprising:
a memory, which is arranged to hold a directory of known strings; and
at least one processor, which is arranged to receive an identification of one or more sequences of the character codes that are likely to be generated due to a segmentation error in application of a pattern recognition process, and to associate a respective extension character code with each of the sequences, and which is further arranged to divide the area of the image into segments such that each segment contains approximately one character, to apply the pattern recognition process to each of the segments in order to generate an input string of character codes, the input string comprising a respective character code for each of the segments, to locate at least one of the sequences of the character codes in the input string, and to replace the at least one of the sequences with the respective extension character code so as to generate a modified string, and to determine the output string by comparing the modified string to the known strings in the directory.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
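Claims 4 and 9 both recite dividing the image area into segments that each contain approximately one character, without saying how. A vertical projection profile, which splits a binarized text line wherever a column contains no ink, is one simple technique and is substituted here purely as an assumption. The hypothetical `BROKEN_M` bitmap also shows how a segmentation error can arise: an `m` with a broken stroke produces two segments, which the recognizer would then see as two separate characters.

```python
def segment_columns(image: list[list[int]]) -> list[tuple[int, int]]:
    """Split a binary text-line image at blank columns (vertical projection).

    image[row][col] is 1 for ink and 0 for background. Each returned pair is
    a half-open column range [start, end) containing at least one ink pixel.
    """
    width = len(image[0]) if image else 0
    # Projection profile: does any row have ink in this column?
    inked = [any(row[col] for row in image) for col in range(width)]
    segments: list[tuple[int, int]] = []
    start = None
    for col, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = col  # a new segment begins
        elif not has_ink and start is not None:
            segments.append((start, col))  # a blank column closes the segment
            start = None
    if start is not None:
        segments.append((start, width))  # segment runs to the right edge
    return segments


# A hypothetical 3x5 "m" whose first arch is broken: column 1 is blank,
# so the left stem and the remaining "n"-like shape become separate segments.
BROKEN_M = [
    [1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]
print(segment_columns(BROKEN_M))  # prints [(0, 1), (2, 5)]
```

Real OCR engines typically combine projection with connected-component analysis and character-width heuristics; this sketch keeps only the projection step to make the source of split characters visible.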
17. A computer software product for encoding characters appearing in an area of an image to generate a corresponding output string of character codes, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive an identification of one or more sequences of the character codes that are likely to be generated due to a segmentation error in application of a pattern recognition process, and to associate a respective extension character code with each of the sequences, and further cause the computer to divide the area of the image into segments such that each segment contains approximately one character, to apply the pattern recognition process to each of the segments in order to generate an input string of character codes, the input string comprising a respective character code for each of the segments, to locate at least one of the sequences of the character codes in the input string, and to replace the at least one of the sequences with the respective extension character code so as to generate a modified string, and to determine the output string by comparing the modified string to a directory of known strings.
Specification