Character recognition method and apparatus which groups similar character patterns
Abstract
A character recognition method and apparatus that assemble similar character patterns into groups. By performing character recognition on each character pattern of a group and comparing the recognition results across the group, a single, quite accurate recognition result can be obtained for the entire group. Alternatively, a representative pattern may be generated for the group and a single recognition process performed on that pattern, reducing the time needed for character recognition. When the result of the character recognition does not strongly indicate a single recognition result, the probability of appearance of individual characters and/or groups of characters, such as digrams or trigrams, can be analyzed to obtain more accurate results.
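As a rough illustration of the grouping idea in the abstract, the following Python sketch groups binary character bitmaps by their Hamming distance to the first member of each group. The flattened 0/1 bitmap representation, the names `hamming` and `group_patterns`, and the `max_distance` parameter are assumptions made for illustration; the source does not specify a similarity measure or a grouping procedure.

```python
from typing import List, Sequence

Pattern = Sequence[int]  # assumed: flattened binary bitmap of equal length


def hamming(a: Pattern, b: Pattern) -> int:
    """Count the pixel positions where two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def group_patterns(patterns: List[Pattern], max_distance: int) -> List[List[int]]:
    """Greedy grouping (hypothetical): a pattern joins the first group whose
    first member is within max_distance, otherwise it starts a new group.
    Returns groups as lists of pattern indices."""
    groups: List[List[int]] = []
    for i, p in enumerate(patterns):
        for g in groups:
            if hamming(patterns[g[0]], p) <= max_distance:
                g.append(i)
                break
        else:
            # No existing group was similar enough.
            groups.append([i])
    return groups


patterns = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
groups = group_patterns(patterns, max_distance=1)
```

Here patterns 0 and 1 differ in one pixel and share a group, while patterns 2 and 3 are identical and form a second group. Each group would then receive a single character code.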
25 Claims
-
1. A character recognition method for use with an image of a document, comprising the steps of:
-
extracting character patterns from the image;
detecting similarities between the character patterns;
grouping the character patterns into groups using the detected similarities;
determining for each of said groups a corresponding character code using a character recognition process, the corresponding character code corresponding to features of the members of a corresponding group, after the step of grouping; and
assigning to each character pattern of each of said groups the character code which corresponds to the group of the character pattern, wherein each character pattern of one of the groups is assigned a same character code, and wherein the same character code is a recognized character code.
2. A method according to claim 1, wherein said determining step includes:
determining, on an individual character pattern basis, a character code for each of said character patterns in said groups; and
assigning, on a group basis, one of said character codes of each group as the corresponding character code of the group.
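One simple way to pick a single code per group from the individual recognition results is a majority vote, sketched below. The vote itself is an assumption: the claim only requires that one of the per-pattern codes be assigned to the group. The function name `assign_by_vote` and the index-based data layout are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List


def assign_by_vote(groups: List[List[int]], codes: List[str]) -> Dict[int, str]:
    """groups: lists of pattern indices; codes[i]: code recognized for
    pattern i on its own. Returns the group-level code for every pattern."""
    result: Dict[int, str] = {}
    for members in groups:
        votes = Counter(codes[i] for i in members)
        winner, _count = votes.most_common(1)[0]
        # Every member of the group gets the same winning code.
        for i in members:
            result[i] = winner
    return result


# Pattern 1 was misread as 'c', but its group is mostly 'e'.
assigned = assign_by_vote([[0, 1, 2], [3, 4]], ["e", "c", "e", "o", "o"])
```

The misread pattern is corrected because it inherits the code that most members of its group received.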
-
-
3. A method according to claim 2, wherein said step of assigning on a group basis includes:
determining, on a group basis, the corresponding character code for each of said groups which has a highest probability of being a correct character code of the character patterns within the groups and assigning to each of said groups the corresponding character code thereof.
-
4. A method according to claim 3, wherein said step of determining on a group basis includes:
determining the corresponding character code which has the highest probability of being the correct character code by determining a distance between a reference pattern in a dictionary and each of the character patterns, on a group basis, and summing up the distances for each member of the group to character patterns of a plurality of character code nominees, on a group basis.
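The distance-summing step can be sketched as follows. For each character code nominee, the distance from its dictionary reference pattern to every member of the group is summed, and the nominee with the smallest total is chosen. Treating the smallest total distance as the highest probability, using Hamming distance on binary bitmaps, and the names `hamming` and `best_code_for_group` are all assumptions for illustration.

```python
from typing import Dict, List, Sequence

Pattern = Sequence[int]  # assumed: flattened binary bitmap


def hamming(a: Pattern, b: Pattern) -> int:
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def best_code_for_group(group: List[Pattern], dictionary: Dict[str, Pattern]) -> str:
    """Sum the distance from each nominee's reference pattern to every group
    member and return the nominee with the smallest total."""
    totals = {
        code: sum(hamming(ref, p) for p in group)
        for code, ref in dictionary.items()
    }
    return min(totals, key=totals.get)


dictionary = {"c": [1, 1, 1, 0], "o": [1, 1, 1, 1]}  # toy reference patterns
group = [[1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
code = best_code_for_group(group, dictionary)
```

In this toy case the first member alone matches "c" exactly, but the summed distances over the whole group (2 for "c", 1 for "o") favor "o".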
-
5. A method according to claim 1, wherein said determining step includes:
-
generating, on a group basis, a representative pattern corresponding to each of said groups; and
performing a character recognition process on each of said representative patterns to determine a character code for each of said representative patterns used as the corresponding character code of the group of the representative pattern.
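A minimal sketch of the representative-pattern variant follows, assuming binary bitmaps. The representative is built by a pixel-wise majority over the group and recognized once by nearest reference pattern. The majority rule, the nearest-neighbor recognizer, and the names `representative_pattern` and `recognize` are illustrative assumptions; the claim does not say how the representative is generated or recognized.

```python
from typing import Dict, List, Sequence

Pattern = Sequence[int]  # assumed: flattened binary bitmap


def representative_pattern(group: List[Pattern]) -> List[int]:
    """Pixel-wise majority: a pixel is set if more than half of the
    group members have it set."""
    n = len(group)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*group)]


def recognize(pattern: Pattern, dictionary: Dict[str, Pattern]) -> str:
    """Hypothetical recognizer: the code whose reference pattern differs
    from the input in the fewest pixels."""
    def distance(ref: Pattern) -> int:
        return sum(x != y for x, y in zip(ref, pattern))
    return min(dictionary, key=lambda code: distance(dictionary[code]))


group = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
rep = representative_pattern(group)
code = recognize(rep, {"L": [1, 1, 0, 0], "T": [0, 1, 1, 0]})
```

Recognition runs once per group rather than once per pattern, which is where the time saving described in the abstract would come from.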
-
-
6. A method according to claim 5, wherein said step of performing a character recognition process includes:
determining the character code for each of said representative patterns based on statistical properties of typical documents.
-
7. A method according to claim 6, wherein said step of performing a character recognition process includes:
-
determining if the probabilities of different character codes corresponding to ones of the representative patterns are within a predetermined threshold; and
determining the character codes using statistical properties of typical documents when the representative patterns are within a predetermined threshold.
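The threshold test can be read as: when the top recognition scores are too close to call, fall back on how often each nominee appears in typical documents. The sketch below follows that reading. The score and prior dictionaries, the `choose_code` name, and all numeric values are made up for illustration.

```python
from typing import Dict


def choose_code(scores: Dict[str, float], prior: Dict[str, float],
                threshold: float) -> str:
    """scores: recognizer confidence per nominee for one representative
    pattern. If several nominees score within `threshold` of the best one,
    the one with the highest prior probability of appearance wins."""
    best = max(scores, key=scores.get)
    close = [c for c in scores if scores[best] - scores[c] <= threshold]
    if len(close) == 1:
        return best  # recognition result is unambiguous
    return max(close, key=lambda c: prior.get(c, 0.0))


scores = {"l": 0.48, "1": 0.50, "x": 0.02}  # illustrative values only
prior = {"l": 0.04, "1": 0.005}             # illustrative values only
ambiguous = choose_code(scores, prior, threshold=0.05)
clear = choose_code(scores, prior, threshold=0.01)
```

With the wider threshold, "l" and "1" are treated as indistinguishable and the prior picks "l". With the narrower threshold, the recognizer's top choice "1" stands.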
-
-
8. A method according to claim 7, wherein said step of determining the character codes using statistical properties includes:
determining the character codes using a table containing a probability of appearance of characters in the typical documents.
-
9. A method according to claim 8, wherein said step of determining the character codes using statistical properties includes:
comparing a frequency of appearance in said image of a character code nominee with a frequency of appearance of said character code nominee in said table.
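One possible reading of the frequency comparison is sketched below. The share of the image's patterns that fall in a group is taken as the frequency the nominee would have in the image, and the nominee whose table frequency is closest to that share is chosen. This interpretation, the `pick_by_frequency` name, and the approximate letter frequencies in the table are assumptions, not details from the source.

```python
from typing import Dict, List


def pick_by_frequency(group_size: int, total_patterns: int,
                      nominees: List[str], table: Dict[str, float]) -> str:
    """Choose the nominee whose expected frequency in typical documents
    is closest to the group's observed share of the image."""
    observed = group_size / total_patterns
    return min(nominees, key=lambda c: abs(table[c] - observed))


# Approximate, illustrative frequencies for English text.
table = {"e": 0.127, "c": 0.028}
large_group = pick_by_frequency(12, 100, ["e", "c"], table)
small_group = pick_by_frequency(3, 100, ["e", "c"], table)
```

A group covering 12% of the patterns is closer to the expected share of "e", while a group covering 3% is closer to that of "c".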
-
10. A method according to claim 7, wherein said step of determining the character codes using statistical properties includes:
determining the character codes using a table containing a probability of appearance of consecutive characters in the typical documents.
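A table of consecutive-character probabilities could be used as in this sketch, which scores each nominee by the probability of the digram it forms with the preceding, already recognized character. Looking only at the preceding character, the `pick_by_digram` name, and the table values are assumptions for illustration.

```python
from typing import Dict, List


def pick_by_digram(prev_code: str, nominees: List[str],
                   digram: Dict[str, float]) -> str:
    """Return the nominee that forms the most probable digram with the
    previous character; unseen digrams count as probability 0."""
    return max(nominees, key=lambda c: digram.get(prev_code + c, 0.0))


digram = {"th": 0.027, "tb": 0.0001}  # illustrative values only
code = pick_by_digram("t", ["b", "h"], digram)
```

After a "t", the ambiguous pattern is resolved to "h" because "th" is far more common than "tb" in this toy table.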
-
11. A character recognition apparatus for use with an image of a document, comprising:
-
means for extracting character patterns from the image;
means for detecting similarities between the character patterns;
means for grouping the character patterns into groups using the detected similarities;
means for determining for each of said groups a corresponding character code using a character recognition process, the corresponding character code corresponding to features of the members of a corresponding group, after grouping the character patterns into groups; and
means for assigning to each character pattern of each of said groups the character code which corresponds to the groups of the character patterns, wherein each character pattern of one of the groups is assigned a same character code, and wherein the same character code is a recognized character code.
12. An apparatus according to claim 11, wherein said means for determining includes:
means for determining, on an individual character pattern basis, a character code for each of said character patterns in said groups; and
means for assigning, on a group basis, one of said character codes of each group as the corresponding character code of the group.
-
-
13. An apparatus according to claim 12, wherein said means for assigning on a group basis includes:
means for determining, on a group basis, the corresponding character code for each of said groups which has a highest probability of being a correct character code of the character patterns within the groups and assigning to each of said groups the corresponding character code thereof.
-
14. An apparatus according to claim 13, wherein said means for determining on a group basis includes:
means for determining the corresponding character code which has the highest probability of being the correct character code by determining a distance between a reference pattern in a dictionary and each of the character patterns, on a group basis, and summing up the distances for each member of the group to character patterns of a plurality of character code nominees, on a group basis.
-
15. An apparatus according to claim 11, wherein said means for determining includes:
-
means for generating, on a group basis, a representative pattern corresponding to each of said groups; and
means for performing a character recognition process on each of said representative patterns to determine a character code for each of said representative patterns used as the corresponding character code of the group of the representative pattern.
-
-
16. An apparatus according to claim 15, wherein said means for performing a character recognition process includes:
means for determining the character code for each of said representative patterns based on statistical properties of typical documents.
-
17. An apparatus according to claim 16, wherein said means for performing a character recognition process includes:
-
means for determining if the probabilities of different character codes corresponding to ones of the representative patterns are within a predetermined threshold; and
means for determining the character codes using statistical properties of typical documents when the representative patterns are within a predetermined threshold.
-
-
18. An apparatus according to claim 17, wherein said means for determining the character codes using statistical properties includes:
means for determining the character codes using a table containing a probability of appearance of characters in the typical documents.
-
19. An apparatus according to claim 18, wherein said means for determining the character codes using statistical properties includes:
means for comparing a frequency of appearance in said image of a character code nominee with a frequency of appearance of said character code nominee in said table.
-
20. An apparatus according to claim 17, wherein said means for determining the character codes using statistical properties includes:
means for determining the character codes using a table containing a probability of appearance of consecutive characters in the typical documents.
-
21. A computer program product having a computer readable medium having computer program logic recorded thereon for performing a character recognition process on an image of a document, comprising:
-
means for extracting character patterns from the image;
means for detecting similarities between the character patterns;
means for grouping the character patterns into groups using the detected similarities;
means for determining for each of said groups a corresponding character code using a character recognition process, the corresponding character code corresponding to features of the members of a corresponding group, after grouping the character patterns into groups; and
means for assigning to each character pattern of each of said groups the character code which corresponds to the groups of the character patterns, wherein each character pattern of one of the groups is assigned a same character code, and wherein the same character code is a recognized character code.
22. A computer program product according to claim 21, wherein said means for determining includes:
means for determining, on an individual character pattern basis, a character code for each of said character patterns in said groups; and
means for assigning, on a group basis, one of said character codes of each group as the corresponding character code of the group.
-
-
23. A computer program product according to claim 22, wherein said means for assigning on a group basis includes:
means for determining, on a group basis, the corresponding character code for each of said groups which has a highest probability of being a correct character code of the character patterns within the groups and assigning to each of said groups the corresponding character code thereof.
-
24. A computer program product according to claim 23, wherein said means for determining on a group basis includes:
means for determining the corresponding character code which has the highest probability of being the correct character code by determining a distance between a reference pattern in a dictionary and each of the character patterns, on a group basis, and summing up the distances for each member of the group to character patterns of a plurality of character code nominees, on a group basis.
-
25. A computer program product according to claim 21, wherein said means for determining includes:
-
means for generating, on a group basis, a representative pattern corresponding to each of said groups; and
means for performing a character recognition process on each of said representative patterns to determine a character code for each of said representative patterns used as the corresponding character code of the group of the representative pattern.
-
Specification