Fast CJK character recognition
Abstract
Methods are described for determining an optimal path that yields a scheme for dividing a text line of Chinese, Japanese, or Korean (CJK) characters into character cells before classifiers are applied and characters are recognized. Gaps between characters are found as a window is moved along the text line. Finding gaps may involve finding 4-connected paths. A histogram is built from the distances between the start of the window and each gap. After each gap is found and its distance measured, the window is moved to the end of that gap. The process is repeated until the window reaches the end of the text line and all gaps have been found. A linear division graph (LDG) is constructed from the detected gaps. Penalties are applied for certain distances. The optimal path is the one with the minimal penalty sum, and it can be used as a scheme for dividing the text line into character cells.
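The final step of the abstract, choosing the path with the minimal penalty sum through the linear division graph, can be illustrated with a short dynamic-programming sketch. This is not the patent's implementation: the names `width_penalty` and `best_division`, the squared-deviation penalty, and the `expected` cell width are all assumptions introduced here for illustration, and gap detection and the histogram are assumed to have already produced the sorted cut positions.

```python
from typing import List, Sequence


def width_penalty(width: float, expected: float) -> float:
    """Hypothetical penalty: squared relative deviation from the expected cell width."""
    return ((width - expected) / expected) ** 2


def best_division(cuts: Sequence[int], expected: float) -> List[int]:
    """Return the cut positions on the path with the minimal penalty sum.

    `cuts` is sorted and includes the start and the end of the text line.
    Every pair (i, j) with i < j is treated as an edge of the linear
    division graph, so a path from the first node to the last node is one
    possible division of the line into character cells.
    """
    n = len(cuts)
    best = [float("inf")] * n   # best[j]: minimal penalty sum to reach node j
    prev = [-1] * n             # back-pointers for path reconstruction
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + width_penalty(cuts[j] - cuts[i], expected)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk the back-pointers from the end of the line to its start.
    path, node = [], n - 1
    while node != -1:
        path.append(cuts[node])
        node = prev[node]
    return path[::-1]


# Cuts at 12 and 45 fall inside characters and are skipped.
print(best_division([0, 12, 30, 45, 60, 90], expected=30))  # [0, 30, 60, 90]
```

Because edges only run left to right, one pass over the nodes in order is enough; no general shortest-path algorithm is needed. In this sketch the penalty depends only on the cell width, whereas the abstract says only that penalties are applied for certain distances.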
19 Claims
1. A method for facilitating recognition of glyph-based characters in an electronic image, the electronic image including representations of glyph-based characters, the method comprising:
identifying a line of glyph-based character representations in the electronic image;
isolating a plurality of glyph-based character representations from the line of glyph-based character representations;
loading a set of glyph-based character patterns into a computer memory;
loading the plurality of glyph-based character representations into the computer cache; and
recognizing the plurality of glyph-based character representations as a batch using the set of glyph-based character patterns of the computer memory while one or more of the plurality of character representations are in the computer cache.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

12. A device for detecting the boundaries of characters in an electronic image, the electronic image including representations of characters, the device comprising:
a processor; and
a memory configured with processor-executable instructions which, when executed by the processor, implement a method, the method comprising:
identifying a line of glyph-based character representations in the electronic image;
isolating a plurality of glyph-based character representations;
loading a set of character patterns into the memory;
loading the plurality of glyph-based character representations into the cache; and
recognizing the plurality of glyph-based character representations with the set of character patterns while one or more of the plurality of glyph-based character representations are in the cache.
Dependent claims: 13, 14, 15, 16

17. One or more physical non-transitory computer accessible media encoded with instructions for performing a method, the method comprising:
identifying a line of glyph-based character representations in the electronic image;
isolating a plurality of glyph-based character representations;
loading a set of glyph-based character patterns into a computer memory;
loading the plurality of glyph-based character representations into the computer cache; and
recognizing the plurality of glyph-based character representations with the set of glyph-based character patterns while one or more of the plurality of glyph-based character representations are in the computer cache.
Dependent claims: 18, 19
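The independent claims share one idea: a large pattern set sits in memory while a small batch of isolated character representations is recognized together. One possible reading of that ordering is sketched below. Python gives no control over the CPU cache, so the sketch only shows the loop structure: the pattern set is traversed once and the small batch is reused on every pass. The `Bitmap` representation, the `hamming` distance, and the name `recognize_batch` are assumptions made for this example, not details taken from the claims.

```python
from typing import Dict, List, Sequence, Tuple

Bitmap = Tuple[int, ...]  # flattened binary glyph image (assumed representation)


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two equal-sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def recognize_batch(glyphs: Sequence[Bitmap], patterns: Dict[str, Bitmap]) -> List[str]:
    """Label every glyph in the batch with its nearest pattern.

    The pattern set is traversed once; each pattern is compared against the
    whole (small) batch before the next pattern is visited.
    """
    best_label = [""] * len(glyphs)
    best_dist = [float("inf")] * len(glyphs)
    for label, pattern in patterns.items():   # large set, streamed once
        for k, glyph in enumerate(glyphs):    # small batch, reused on every pass
            d = hamming(glyph, pattern)
            if d < best_dist[k]:
                best_dist[k], best_label[k] = d, label
    return best_label


# Two 3x3 patterns: a horizontal stroke ("H") and a vertical stroke ("V").
patterns = {
    "H": (0, 0, 0, 1, 1, 1, 0, 0, 0),
    "V": (0, 1, 0, 0, 1, 0, 0, 1, 0),
}
batch = [
    (0, 1, 0, 0, 1, 0, 0, 1, 1),  # vertical stroke with one noisy pixel
    (0, 0, 0, 1, 1, 1, 0, 0, 0),  # clean horizontal stroke
]
print(recognize_batch(batch, patterns))  # ['V', 'H']
```

Swapping the two loops would give the same labels but would walk the full pattern set once per glyph. Putting the batch in the inner loop is the design choice this sketch is meant to show.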
Specification