Radical definition and dictionary creation for a handwriting recognition system
Abstract
The system automatically defines a set of radicals for use in a Kanji character handwriting recognition system and automatically creates a dictionary of the Kanji characters that the system recognizes. To do so, the system first obtains representative handwriting samples for each Kanji character to be recognized. It then evaluates the samples to identify a set of subparts (“radicals”) that are common to at least two of the Kanji characters. These radicals are the component roots from which the characters are formed, and each Kanji character is formed from one or more of them. The identified radicals are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries), so they may include some, or none, of the traditional radicals. After identifying the set of radicals, the system generates a dictionary that maps each Kanji character to be recognized to its component radicals. Once the radicals and the dictionary have been created, they are used during handwriting recognition: the system identifies the radicals within the handwriting and then uses the mapping to determine which Kanji character the handwriting most closely matches.
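The final recognition step in the abstract can be illustrated with a toy sketch. It assumes the radicals in the handwriting have already been identified and are represented as labels. The dictionary contents, the function name `best_match`, and the scoring rule (multiset overlap divided by the longer radical list) are hypothetical choices, since the abstract says only that the system picks the character the handwriting "most closely matches".

```python
from collections import Counter

# Hypothetical dictionary: each Kanji character mapped to its component radicals.
# Real component characters are used here only as readable stand-in labels.
DICTIONARY: dict[str, list[str]] = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "休": ["亻", "木"],
}


def best_match(identified: list[str], dictionary: dict[str, list[str]]) -> str:
    """Return the character whose radicals best overlap the identified radicals."""

    def score(radicals: list[str]) -> float:
        # Count radicals shared by both lists (with multiplicity), then divide by
        # the longer list so that both missing and extra radicals lower the score.
        overlap = sum((Counter(identified) & Counter(radicals)).values())
        return overlap / max(len(identified), len(radicals), 1)

    # Ties go to the character that appears first in the dictionary.
    return max(dictionary, key=lambda ch: score(dictionary[ch]))


print(best_match(["亻", "木"], DICTIONARY))  # 休
```

A real recognizer would also have to tolerate misidentified radicals; the normalized overlap is just one simple way to express "closest" match.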
53 Citations
26 Claims
1. A method in a computer system for identifying characteristics of elements of grammar for use in recognizing elements of grammar of a natural language, comprising:
receiving examples of the elements of grammar from users, each element of grammar having one or more characteristics;
mathematically combining selected elements of grammar to create combined elements of grammar;
identifying sets of characteristics within the combined elements of grammar that have a visually similar representation and that are common to at least two of the combined elements of grammar;
mapping each element of grammar to at least one of the identified sets of characteristics; and
receiving an element of grammar to be recognized, determining whether the received element of grammar has an identified set of characteristics, and when the received element has the identified set of characteristics, using the mapping of elements of grammar to the identified set of characteristics to recognize the received element of grammar to be recognized.
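The identifying, mapping, and recognizing steps of claim 1 can be sketched as follows, assuming the visual-similarity and combining steps have already reduced each combined element to a set of characteristic labels. The labels, the use of pairwise intersections as the candidate "sets of characteristics", and the exact-match recognition rule are illustrative assumptions, not the claimed implementation.

```python
from itertools import combinations

Chars = frozenset[str]


def identify_shared_sets(elements: dict[str, Chars]) -> list[Chars]:
    """Candidate sets: each non-empty intersection of two elements' characteristics."""
    shared = {a & b for a, b in combinations(elements.values(), 2) if a & b}
    return sorted(shared, key=sorted)  # deterministic order


def map_elements(elements: dict[str, Chars], shared: list[Chars]) -> dict[str, list[Chars]]:
    """Map each element to every identified set that it fully contains."""
    return {name: [s for s in shared if s <= chars] for name, chars in elements.items()}


def recognize(received: Chars, shared: list[Chars], mapping: dict[str, list[Chars]]) -> list[str]:
    """Return the elements whose mapped sets match the sets found in `received`."""
    found = [s for s in shared if s <= received]
    if not found:
        return []  # the received element has none of the identified sets
    return [name for name, sets in mapping.items() if sets == found]


elements = {
    "A": frozenset({"x", "y", "z"}),
    "B": frozenset({"x", "y"}),
    "C": frozenset({"z", "w"}),
}
shared = identify_shared_sets(elements)
mapping = map_elements(elements, shared)
print(recognize(frozenset({"x", "y", "q"}), shared, mapping))  # ['B']
```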
2. A method in a computer system for generating radicals of Kanji characters, the method comprising:
receiving sample handwriting data from at least one user comprising a plurality of Kanji characters, each of the plurality of Kanji characters being formed by a series of writing instrument strokes;
for at least some of the Kanji characters, identifying sets of strokes from different samples of the handwriting data that have a visually similar representation;
mathematically combining the sets of strokes from a plurality of samples that have visually similar representations to create combined strokes;
for each of the plurality of Kanji characters, identifying at least one radical, a radical being a common series of combined writing strokes used to form at least two of the plurality of Kanji characters; and
examining the sample handwriting data to automatically create a set of radicals from the sample handwriting data, the created set of radicals for use in recognizing subsequently received handwriting data as Kanji characters.
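One way to read the step of identifying sets of strokes from different samples that have a visually similar representation is distance-based grouping. The sketch below assumes each stroke has already been resampled to the same number of (x, y) points, measures similarity as the mean point-to-point Euclidean distance, and groups greedily against each group's first member. The threshold and all names are hypothetical, and the greedy grouping depends on input order.

```python
import math

Point = tuple[float, float]
Stroke = list[Point]


def stroke_distance(a: Stroke, b: Stroke) -> float:
    """Mean Euclidean distance between corresponding points of two strokes."""
    if len(a) != len(b) or not a:
        raise ValueError("strokes must be non-empty and resampled to the same length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def group_similar(strokes: list[Stroke], threshold: float) -> list[list[Stroke]]:
    """Put each stroke in the first group whose first member is within `threshold`."""
    groups: list[list[Stroke]] = []
    for stroke in strokes:
        for group in groups:
            if stroke_distance(group[0], stroke) <= threshold:
                group.append(stroke)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([stroke])
    return groups


samples = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],   # horizontal stroke, writer 1
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.05)],  # horizontal stroke, writer 2
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],   # vertical stroke
]
groups = group_similar(samples, threshold=0.2)
print([len(g) for g in groups])  # [2, 1]
```

Each resulting group is a candidate set of visually similar strokes that the combining step would then merge into one combined stroke.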
3. The method of claim 2 further comprising:
receiving handwriting user input indicating an intended Kanji character; and
comparing the handwriting user input to the set of radicals to recognize the intended Kanji character.
4. The method of claim 3 wherein the computer system has a display and wherein the method further comprises the step of displaying the intended Kanji character on the display.
5. The method of claim 2 further comprising:
creating a Kanji character dictionary containing the plurality of Kanji characters; and
storing a mapping in the Kanji character dictionary of each of the plurality of Kanji characters to at least one of the radicals that comprise the Kanji character.
6. The method of claim 5 further comprising:
receiving handwriting user input identifying an intended Kanji character;
identifying radicals within the handwriting user input by comparing the handwriting user input to the set of radicals; and
accessing the Kanji character dictionary with the identified radicals to determine the intended Kanji character.
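Claims 5 and 6 describe storing a character-to-radical mapping in a dictionary and consulting it with the radicals identified in new input. A minimal sketch, assuming the radicals in the input have already been identified (by comparison against the set of radicals) as labels in writing order, and that lookup requires an exact match of that ordered sequence. The input mapping plays the role of the stored dictionary; the reverse index, the function names, and the sample entries are hypothetical.

```python
from collections import defaultdict

Radicals = tuple[str, ...]


def build_dictionary(decompositions: dict[str, list[str]]) -> dict[Radicals, list[str]]:
    """Index each character under its ordered sequence of component radicals."""
    index: defaultdict[Radicals, list[str]] = defaultdict(list)
    for char, radicals in decompositions.items():
        index[tuple(radicals)].append(char)  # several characters may share a sequence
    return dict(index)


def lookup(index: dict[Radicals, list[str]], identified: list[str]) -> list[str]:
    """Return the candidate characters for the radicals identified in the input."""
    return index.get(tuple(identified), [])


index = build_dictionary({"好": ["女", "子"], "字": ["宀", "子"], "明": ["日", "月"]})
print(lookup(index, ["宀", "子"]))  # ['字']
```

Exact matching keeps the sketch short; tolerating a misidentified radical would require a scored, approximate lookup instead.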
21. The method of claim 2, wherein mathematically combining the set of strokes comprises averaging the set of strokes.
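Claim 21 specifies averaging as the way strokes are combined. Because handwritten strokes rarely have the same number of sampled points, the sketch below assumes each stroke is first resampled to a fixed number of points evenly spaced along its length and then averaged point by point. The resampling step, the point count, and the function names are assumptions, and the strokes are assumed to be written in the same direction.

```python
import math

Point = tuple[float, float]


def resample(stroke: list[Point], n: int) -> list[Point]:
    """Return n points spaced evenly along the polyline's arc length."""
    if n < 2 or len(stroke) < 2:
        raise ValueError("need n >= 2 and a stroke with at least two points")
    seg = [math.dist(stroke[i], stroke[i + 1]) for i in range(len(stroke) - 1)]
    total = sum(seg)
    if total == 0:
        return [stroke[0]] * n  # degenerate stroke: all points coincide
    out: list[Point] = []
    i, start = 0, 0.0  # current segment index and arc length at its start
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and start + seg[i] < target:
            start += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(max((target - start) / seg[i], 0.0), 1.0)
        (x0, y0), (x1, y1) = stroke[i], stroke[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def average_strokes(strokes: list[list[Point]], n: int = 8) -> list[Point]:
    """Resample every stroke to n points, then average corresponding points."""
    resampled = [resample(s, n) for s in strokes]
    m = len(resampled)
    return [
        (sum(r[k][0] for r in resampled) / m, sum(r[k][1] for r in resampled) / m)
        for k in range(n)
    ]


print(average_strokes([[(0, 0), (1, 0), (2, 0)], [(0, 2), (2, 2)]], n=3))
# [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
```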
22. The method of claim 2, wherein the set of strokes comprises a sequence of strokes.
-
23. A computer-readable medium containing computer instructions for performing the method recited in claim 2.
7. A computer system for recognizing Kanji characters comprising:
an analyzer component for receiving sample handwriting data from a plurality of users comprising a plurality of Kanji characters and for automatically defining a set of radicals from the sample handwriting data by mathematically combining sample handwriting data for the Kanji characters so as to create combined subparts, wherein each Kanji character comprises at least one radical and wherein each radical is a common combined subpart to at least two of the Kanji characters; and
a recognizer component for receiving handwriting user input indicating an intended Kanji character and for comparing the received handwriting user input to the defined set of radicals to determine the intended Kanji character.
13. A method in a computer system for identifying common subparts within a plurality of symbols, each symbol having a number of subparts, each subpart having one or more components, each component having a visual representation, the method comprising the computer-implemented steps of:
receiving the plurality of symbols;
identifying sets of components within the plurality of symbols that have a visually similar representation and that are common to at least two symbols by comparing a sequence of components in one symbol to a sequence of components in another symbol; and
for each identified sequence of components, mathematically combining the components and creating a subpart based on the visual representation of the combined components within the sets of components.
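The comparison step in claim 13 matches a sequence of components in one symbol against a sequence in another. A minimal sketch, assuming each symbol has already been reduced to a sequence of component labels (for example, after grouping visually similar strokes) and that a candidate subpart is any contiguous run of at least two components shared by two symbols. The labels ("h" for horizontal, "v" for vertical, "d" for diagonal), `min_len`, and the function names are hypothetical, and the subsequent combining step is not shown.

```python
from itertools import combinations

Seq = tuple[str, ...]


def runs(seq: Seq, min_len: int) -> set[Seq]:
    """All contiguous runs of at least min_len components in seq."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + min_len, len(seq) + 1)}


def candidate_subparts(symbols: dict[str, Seq], min_len: int = 2) -> set[Seq]:
    """Runs that occur in at least two symbols, found by pairwise comparison."""
    found: set[Seq] = set()
    for a, b in combinations(symbols.values(), 2):
        found |= runs(a, min_len) & runs(b, min_len)
    return found


symbols = {
    "S1": ("h", "v", "h", "d"),
    "S2": ("h", "v", "h"),
    "S3": ("d", "v", "h"),
}
print(sorted(candidate_subparts(symbols)))
# [('h', 'v'), ('h', 'v', 'h'), ('v', 'h')]
```

The sketch keeps every shared run, including runs nested inside longer ones; a practical system would likely prune these to a smaller set of subparts.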
sorting the received symbols based on the meaning of the received symbols and the number of components within the received symbols;
averaging the symbols having the same meaning and the same number of components to create averaged symbols; and
identifying the sequences of components within the averaged symbols that have a visually similar representation and that are common to at least two of the averaged symbols.
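The sorting and averaging steps listed above can be sketched as grouping samples by (meaning, number of components) and averaging each group point by point. The sketch assumes each component is a stroke already resampled so that corresponding strokes within a group have the same number of points; the sample data, the type aliases, and the function names are hypothetical.

```python
from itertools import groupby

Point = tuple[float, float]
Stroke = list[Point]
Sample = tuple[str, list[Stroke]]  # (meaning, strokes)


def group_key(sample: Sample) -> tuple[str, int]:
    meaning, strokes = sample
    return meaning, len(strokes)


def average_by_meaning(samples: list[Sample]) -> dict[tuple[str, int], list[Stroke]]:
    """Sort by (meaning, stroke count), then average each group stroke by stroke."""
    averaged: dict[tuple[str, int], list[Stroke]] = {}
    for key, group in groupby(sorted(samples, key=group_key), key=group_key):
        members = [strokes for _, strokes in group]
        m = len(members)
        averaged[key] = [
            [
                (sum(mem[i][j][0] for mem in members) / m,
                 sum(mem[i][j][1] for mem in members) / m)
                for j in range(len(members[0][i]))
            ]
            for i in range(key[1])
        ]
    return averaged


samples = [
    ("木", [[(0, 0), (2, 0)]]),
    ("木", [[(0, 2), (2, 2)]]),
    ("木", [[(0, 0), (2, 0)], [(1, 0), (1, 2)]]),
    ("日", [[(0, 0), (0, 1)]]),
]
result = average_by_meaning(samples)
print(result[("木", 1)])  # [[(0.0, 1.0), (2.0, 1.0)]]
```

Grouping by component count as well as meaning keeps samples written with a different number of strokes from being averaged together.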
24. The method of claim 13, wherein mathematically combining the components comprises averaging the components.
25. A computer-readable medium containing computer instructions for performing the method recited in claim 13.
15. A computer-readable medium containing computer instructions for directing a computer to perform a method for identifying common subparts within a plurality of symbols, each symbol having a number of subparts, each subpart having one or more components, each component having a visual representation, the method comprising the steps of:
receiving the plurality of symbols;
identifying sets of components within the plurality of symbols that have a visually similar representation and that are common to at least two symbols by comparing a sequence of components in one symbol to a sequence of components in another symbol; and
for each identified sequence of components, mathematically combining selected components so as to create a subpart based on the visual representation of the components within the sets of components.
sorting the received symbols based on the meaning of the received symbols and the number of components within the received symbols;
averaging the symbols having the same meaning and the same number of components to create averaged symbols; and
identifying the sequences of components within the averaged symbols that have a visually similar representation and that are common to at least two of the averaged symbols.
26. The method of claim 15, wherein mathematically combining the components comprises averaging the components.
Specification