Radical definition and dictionary creation for a handwriting recognition system
First Claim
1. A method in a computer system for identifying radicals for a plurality of Kanji characters, each of the plurality of Kanji characters comprising at least one radical, the method comprising:
receiving handwriting samples for each of the plurality of Kanji characters, each handwriting sample comprising at least one stroke;
grouping the handwriting samples for each of the plurality of Kanji characters by the number of strokes within each handwriting sample;
categorizing each stroke of each handwriting sample based on the shape of the stroke;
averaging the handwriting samples within each group that have the same categorization of strokes;
determining characteristics common to two or more of the averaged samples;
identifying the characteristics common to two or more of the averaged samples as radicals;
storing the identified radicals into the computer system; and
mapping each of the plurality of Kanji characters onto the at least one radical that comprises the Kanji character.
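The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. A handwriting sample is assumed to be a list of strokes, each stroke a list of (x, y) points already resampled to a common point count so that samples can be averaged point by point. The shape categories "H", "V" and "D" and the helper names (`categorize_stroke`, `average_samples`, `identify_radicals`) are hypothetical. The "characteristic" compared across averaged samples is simplified to a contiguous run of at least `min_len` stroke categories that is shared by two or more characters.

```python
from collections import defaultdict
from statistics import mean

Point = tuple[float, float]
Stroke = list[Point]        # assumed resampled to a fixed number of points
Sample = list[Stroke]

def categorize_stroke(stroke: Stroke) -> str:
    """Hypothetical shape category from the stroke's start-to-end direction."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    if dx >= 2 * dy:
        return "H"          # mostly horizontal
    if dy >= 2 * dx:
        return "V"          # mostly vertical
    return "D"              # diagonal

def average_samples(samples: list[Sample]) -> Sample:
    """Point-wise mean of samples that share stroke count and categories."""
    return [
        [(mean(p[0] for p in pts), mean(p[1] for p in pts)) for pts in zip(*strokes)]
        for strokes in zip(*samples)    # the i-th stroke of every sample
    ]

def identify_radicals(samples_by_char: dict[str, list[Sample]], min_len: int = 2):
    # Group samples by (character, stroke count, stroke categories).
    groups: dict[tuple, list[Sample]] = defaultdict(list)
    for char, samples in samples_by_char.items():
        for sample in samples:
            cats = tuple(categorize_stroke(s) for s in sample)
            groups[(char, len(sample), cats)].append(sample)

    # Average within each group (kept for later visual comparison; unused below).
    averaged = {key: average_samples(g) for key, g in groups.items()}

    # Record which characters contain each contiguous run of stroke categories.
    owners: dict[tuple, set[str]] = defaultdict(set)
    for char, n, cats in averaged:
        for i in range(n):
            for j in range(i + min_len, n + 1):
                owners[cats[i:j]].add(char)

    # Runs shared by two or more characters are treated as radicals.
    radicals = {run for run, chars in owners.items() if len(chars) >= 2}

    # Map each character onto the radicals it contains.
    mapping: dict[str, set[tuple]] = defaultdict(set)
    for run in radicals:
        for char in owners[run]:
            mapping[char].add(run)
    return radicals, dict(mapping), averaged
```

Because this sketch compares only category sequences, two characters whose shared strokes differ in position or size would still yield the same radical, and a character that shares no run with any other character receives no mapping.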
2 Assignments
0 Petitions
Abstract
The system described herein automatically defines a set of radicals to be used in a Kanji handwriting recognition system and automatically creates a dictionary of the Kanji characters that the system recognizes. The system first obtains representative handwriting samples for each Kanji character to be recognized. It then evaluates the samples to identify a set of subparts ("radicals") that are common to at least two of the Kanji characters. These radicals are the component roots from which the characters are formed; each Kanji character consists of one or more of them. The radicals are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries), so the identified set may include some, or none, of the traditional radicals. After identifying the set of radicals, the system generates a dictionary that maps each Kanji character to be recognized onto its component radicals. Once the radicals and the dictionary have been created, they can be used during handwriting recognition: the system identifies the radicals within the handwriting and then uses the mapping to determine which Kanji character the handwriting most closely matches.
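The abstract does not say how the recognizer decides which character the handwriting "most closely matches". One possible sketch, assuming the input has already been converted into an ordered list of radical IDs, scores each dictionary entry by Levenshtein edit distance between radical sequences and returns the closest. The radical IDs ("r0", "r1", ...), the dictionary contents, and the function names are hypothetical; learned radicals need not coincide with traditional ones.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two radical sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def recognize(detected: list[str], dictionary: dict[str, list[str]]) -> str:
    """Return the character whose radical sequence is closest to the detected one."""
    return min(dictionary, key=lambda ch: edit_distance(detected, dictionary[ch]))

# Hypothetical dictionary: character -> learned radical IDs.
dictionary = {"林": ["r0", "r0"], "森": ["r0", "r0", "r0"], "明": ["r1", "r2"]}
print(recognize(["r0", "r0", "r0"], dictionary))  # 森
```

Edit distance tolerates a missed or spurious radical, at the cost of ignoring where each radical sits within the character.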
93 Citations
24 Claims
2. A method in a computer system for identifying common sequences of components within a plurality of symbols, each symbol having a meaning and a number of components, each component having a feature and a visual representation, the method comprising:
receiving the plurality of symbols;
for each meaning, creating a grouping of the symbols based on the number of components within each symbol; and
for each grouping, creating a sub-grouping of symbols that have components with similar features;
identifying sequences of components that have the same features and that are common to at least two sub-groupings; and
for each of the identified sequences of components, clustering the sequences of components into clusters that are visually similar; and
for each cluster, averaging the visual representation of the sequences of components, wherein each averaged visual representation of the sequences of components is an identified common sequence of components within the plurality of symbols.
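Claim 2 does not name a clustering algorithm. As an assumption for illustration, the sketch below substitutes simple threshold-based ("leader") clustering. Each sequence of components is represented as an equal-length list of (x, y) points; a sequence joins the first cluster whose current point-wise average lies within `threshold` (mean point distance), otherwise it starts a new cluster, and each cluster's average becomes one common sequence. The representation, distance measure, and function names are hypothetical.

```python
import math

Shape = list[tuple[float, float]]   # hypothetical: a component sequence as equal-length points

def distance(a: Shape, b: Shape) -> float:
    """Mean Euclidean distance between corresponding points of two shapes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def centroid(shapes: list[Shape]) -> Shape:
    """Point-wise average of a cluster's shapes."""
    n = len(shapes)
    return [
        (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        for pts in zip(*shapes)
    ]

def cluster_and_average(shapes: list[Shape], threshold: float) -> list[Shape]:
    """Leader clustering: join the first cluster whose centroid is within threshold."""
    clusters: list[list[Shape]] = []
    for shape in shapes:
        for cluster in clusters:
            if distance(shape, centroid(cluster)) <= threshold:
                cluster.append(shape)
                break
        else:
            clusters.append([shape])   # no close cluster: start a new one
    return [centroid(c) for c in clusters]
```

Leader clustering is cheap but order-dependent: presenting the same sequences in a different order can produce different clusters.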
9. A method in a computer system for identifying common sub-parts within a plurality of symbols, each symbol having a meaning and a number of sub-parts, each sub-part having one or more components, each component having a visual representation, the method comprising:
receiving the plurality of symbols;
identifying sequences of components within the plurality of symbols that have a visually similar representation and that are common to at least two symbols by comparing a sequence of components in one symbol to a sequence of components in another symbol by:
sorting the received symbols based on the meaning of the received symbols and the number of components within the received symbols;
averaging the symbols having the same meaning and the same number of components to create averaged symbols; and
identifying sequences of components within the averaged symbols that have a visually similar representation and that are common to at least two of the averaged symbols; and
for each identified sequence of components, creating a sub-part based on the visual representation of the components within the sequences of components.
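The sorting-then-averaging step of claim 9 maps naturally onto a sort followed by grouping of adjacent equal keys. In this hypothetical sketch, a symbol is a (meaning, components) pair and each component's visual representation is an equal-length tuple of numbers. Windows of `length` consecutive components are treated as visually similar when every pair of corresponding components lies within Euclidean distance `tol`, and each match becomes a sub-part by averaging the two windows. The representation, the tolerance test, and the function names are assumptions, not taken from the claim.

```python
import math
from itertools import groupby

Component = tuple[float, ...]              # hypothetical feature vector per component
Symbol = tuple[str, list[Component]]       # (meaning, components)

def average_symbols(symbols: list[Symbol]) -> list[Symbol]:
    """Sort by (meaning, component count), then average each run of equal keys."""
    def key(s: Symbol) -> tuple[str, int]:
        return (s[0], len(s[1]))

    averaged = []
    for (meaning, _count), group in groupby(sorted(symbols, key=key), key=key):
        comps = [components for _meaning, components in group]
        avg = [
            tuple(sum(vals) / len(comps) for vals in zip(*parts))  # per-dimension mean
            for parts in zip(*comps)                               # i-th component of each symbol
        ]
        averaged.append((meaning, avg))
    return averaged

def common_subparts(averaged: list[Symbol], length: int, tol: float) -> list[list[Component]]:
    """Find windows of `length` components shared by two averaged symbols; average each match."""
    subparts = []
    for i, (_m1, c1) in enumerate(averaged):
        for _m2, c2 in averaged[i + 1:]:
            for a in range(len(c1) - length + 1):
                for b in range(len(c2) - length + 1):
                    w1, w2 = c1[a:a + length], c2[b:b + length]
                    if all(math.dist(x, y) <= tol for x, y in zip(w1, w2)):
                        subparts.append([
                            tuple((p + q) / 2 for p, q in zip(x, y))
                            for x, y in zip(w1, w2)
                        ])
    return subparts
```

Sorting first guarantees that `groupby` sees every symbol with the same meaning and component count as one contiguous run.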
17. A computer-readable medium containing computer instructions for directing a computer to perform a method for identifying common sub-parts within a plurality of symbols, each symbol having a meaning and a number of sub-parts, each sub-part having one or more components, each component having a visual representation, the method comprising:
receiving the plurality of symbols;
identifying sequences of components within the plurality of symbols that have a visually similar representation and that are common to at least two symbols by comparing a sequence of components in one symbol to a sequence of components in another symbol by:
sorting the received symbols based on the meaning of the received symbols and the number of components within the received symbols;
averaging the symbols having the same meaning and the same number of components to create averaged symbols; and
identifying the sequences of components within the averaged symbols that have a visually similar representation and that are common to at least two of the averaged symbols; and
for each identified sequence of components, creating a sub-part based on the visual representation of the components within the sequences of components.
Specification