Apparatus and method for augmenting data in handwriting recognition system
Abstract
An apparatus and method for providing improved data classification and, in particular, an apparatus and method for improved handwriting data recognition which enables handwriting recognition devices to robustly handle and recover from the problems associated with the omission of characters from collected handwriting samples. In one aspect, a data classification apparatus comprises: means for inputting a plurality of data, the plurality of data including one of data to be recognized, generic data and user-specific data; means for augmenting the user-specific data with the generic data to generate augmented user-specific data; means for training the data classification apparatus with the augmented user-specific data to generate training data; and means for recognizing the data to be recognized in accordance with the training data.
26 Claims
1. A method for generating training data for a recognition system, comprising the steps of:
establishing a plurality of data classes;
collecting samples of user-generic data for each of the data classes;
dividing the samples of user-generic data of a data class into one or more data subclasses, wherein each data subclass represents a different form of the data class;
collecting samples of user-specific data and associating the samples of user-specific data with corresponding ones of the data classes; and
augmenting user-specific data for a given data class, if necessary, with samples of user-generic data associated with the given data class and corresponding data subclasses, wherein the augmented user-specific data comprises training data for the recognition system.
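The data organization recited in claim 1 (data classes, user-generic subclasses for each written form, and user-specific samples associated with classes) can be pictured with a small sketch. It assumes each sample is a fixed-length feature vector, and the names `DataClass`, `group_user_samples`, `a_round`, and `a_script` are hypothetical labels chosen for illustration; the claims do not prescribe any representation.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical representation: each sample is a fixed-length feature vector.
Sample = tuple[float, ...]


@dataclass
class DataClass:
    """A data class (e.g. one character) and its user-generic subclasses."""
    label: str
    # Maps a hypothetical subclass name (one written form of the class)
    # to the user-generic samples of that form.
    subclasses: dict[str, list[Sample]] = field(default_factory=dict)

    def generic_count(self) -> int:
        """Total number of user-generic samples across all subclasses."""
        return sum(len(samples) for samples in self.subclasses.values())


def group_user_samples(labeled: list[tuple[str, Sample]]) -> dict[str, list[Sample]]:
    """Associate each labeled user-specific sample with its data class."""
    by_class: dict[str, list[Sample]] = defaultdict(list)
    for label, sample in labeled:
        by_class[label].append(sample)
    return dict(by_class)


# Usage: class "a" written in two forms; the user supplied one "a" and one "b".
a = DataClass("a", {"a_round": [(0.1, 0.2), (0.2, 0.1)], "a_script": [(0.9, 0.8)]})
user = group_user_samples([("a", (0.15, 0.15)), ("b", (0.5, 0.5))])
print(a.generic_count())  # 3
print(user)  # {'a': [(0.15, 0.15)], 'b': [(0.5, 0.5)]}
```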
for each data class, computing a sample count threshold value based on a number of samples of user-generic data associated with the data class; and
storing the sample count threshold values.
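The per-class threshold step above only requires that each class's threshold depend on how many user-generic samples that class has. The sketch below assumes a hypothetical rule (one tenth of the generic count, rounded up, with a floor of 1); the `divisor` and `minimum` parameters are illustrative, not taken from the claims, and "storing" is modeled as returning a dictionary.

```python
def per_class_thresholds(generic_counts: dict[str, int],
                         divisor: int = 10,
                         minimum: int = 1) -> dict[str, int]:
    """Compute one sample count threshold per data class.

    Hypothetical rule: ceil(generic_count / divisor), never below `minimum`.
    """
    return {
        label: max(minimum, (count + divisor - 1) // divisor)  # integer ceiling
        for label, count in generic_counts.items()
    }


thresholds = per_class_thresholds({"a": 50, "b": 7, "c": 0})
print(thresholds)  # {'a': 5, 'b': 1, 'c': 1}
```

Integer ceiling division is used so the result does not depend on floating-point rounding.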
7. The method of claim 5, wherein the step of computing threshold data comprises the steps of:
computing a sample count threshold value that is applied to all of the data classes; and
storing the sample count threshold value.
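Claim 7 replaces the per-class values with a single threshold applied to every class. As a minimal sketch, assuming a hypothetical rule in which the shared value is the mean user-generic count per class divided by `divisor` and rounded up, it might look like this:

```python
def global_threshold(generic_counts: dict[str, int],
                     divisor: int = 10,
                     minimum: int = 1) -> int:
    """Compute one sample count threshold shared by every data class.

    Hypothetical rule: ceil(mean user-generic count per class / divisor).
    """
    if not generic_counts:
        return minimum
    total = sum(generic_counts.values())
    denom = len(generic_counts) * divisor
    return max(minimum, (total + denom - 1) // denom)  # integer ceiling


shared = global_threshold({"a": 50, "b": 30})
thresholds = {label: shared for label in ("a", "b")}  # same value for all classes
print(shared)  # 4
```

A single stored value is simpler, while per-class values let classes with many generic variants demand more user samples before augmentation stops.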
8. The method of claim 5, wherein the step of augmenting comprises the steps of:
selecting a data class and retrieving the sample count threshold value corresponding to the selected data class;
comparing the corresponding sample count threshold value with a number of samples of user-specific data associated with the selected data class; and
adding samples of user-generic data associated with the selected data class to the user-specific data for the selected data class if the number of samples of user-specific data for the selected data class falls below the corresponding sample count threshold value.
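The select, compare, and add steps of claim 8 amount to a loop over classes. The sketch below assumes thresholds are held in a dictionary keyed by class label and, as a hypothetical policy, adds just the shortfall by taking generic samples in stored order; the claims leave the exact count and the selection rule to claims 9 to 14.

```python
def augment(user: dict[str, list],
            generic: dict[str, list],
            thresholds: dict[str, int]) -> dict[str, list]:
    """Build training data: top up each class whose user-specific sample
    count falls below that class's threshold with user-generic samples."""
    training: dict[str, list] = {}
    for label, threshold in thresholds.items():
        samples = list(user.get(label, []))  # copy; leave caller's data intact
        if len(samples) < threshold:
            # Hypothetical policy: add just enough generic samples, in stored order.
            shortfall = threshold - len(samples)
            samples.extend(generic.get(label, [])[:shortfall])
        training[label] = samples
    return training


training = augment(user={"a": [1]},
                   generic={"a": [10, 11, 12], "b": [20, 21]},
                   thresholds={"a": 3, "b": 2})
print(training)  # {'a': [1, 10, 11], 'b': [20, 21]}
```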
9. The method of claim 8, wherein a number of added samples of user-generic data is equal to the difference between the corresponding sample count threshold value and the number of samples of user-specific data for the selected data class.
10. The method of claim 8, wherein a number of added samples of user-generic data is equal to the corresponding sample count threshold value.
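Claims 9 and 10 give two alternative rules for how many user-generic samples to add once a class falls below its threshold. A minimal sketch follows; the function name and the `mode` strings are hypothetical.

```python
def samples_to_add(threshold: int, user_count: int, mode: str = "shortfall") -> int:
    """Number of user-generic samples to add for one class (0 if none needed).

    mode="shortfall": threshold minus the user-specific count (claim 9 rule).
    mode="threshold": the full threshold value (claim 10 rule).
    """
    if user_count >= threshold:
        return 0  # the class is not below its threshold, so no augmentation
    if mode == "shortfall":
        return threshold - user_count
    if mode == "threshold":
        return threshold
    raise ValueError(f"unknown mode: {mode!r}")


print(samples_to_add(5, 2))               # 3
print(samples_to_add(5, 2, "threshold"))  # 5
```

The shortfall rule brings each class exactly up to the threshold; the full-threshold rule adds more generic data and so weights the user's own samples less heavily.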
11. The method of claim 8, wherein the step of adding comprises the steps of:
determining if samples of user-specific data were collected for the selected data class;
identifying a data subclass of the selected data class that comprises user-generic data which is most similar to the user-specific data of the selected data class, if samples of user-specific data were collected for the selected data class; and
selecting samples of user-generic data associated with the identified data subclass.
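Claim 11 does not define "most similar". One plausible reading, sketched below as an assumption, compares the centroid of the user's samples for the class with the centroid of each subclass's user-generic samples and picks the nearest by Euclidean distance. The centroid measure, the helper names, and the subclass names are all hypothetical.

```python
import math

Vector = tuple[float, ...]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def most_similar_subclass(user_samples: list[Vector],
                          subclasses: dict[str, list[Vector]]) -> str:
    """Return the name of the subclass whose user-generic centroid lies
    closest (Euclidean distance) to the centroid of the user's samples."""
    if not user_samples:
        raise ValueError("no user-specific samples for this class")
    target = centroid(user_samples)
    candidates = [name for name, samples in subclasses.items() if samples]
    return min(candidates, key=lambda name: math.dist(target, centroid(subclasses[name])))


subclasses = {"a_round": [(0.0, 0.0), (0.2, 0.0)],
              "a_script": [(1.0, 1.0), (0.8, 1.0)]}
best = most_similar_subclass([(0.9, 0.9)], subclasses)
selected = subclasses[best][:1]  # then draw user-generic samples from that subclass
print(best, selected)  # a_script [(1.0, 1.0)]
```

The `ValueError` branch corresponds to the case where no user-specific samples were collected, which claim 12 handles with frequency-based selection instead.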
12. The method of claim 11, further comprising the step of computing frequency values for data subclasses of each data class, wherein the frequency value of a given data subclass represents a ratio of a number of collected samples of user-generic data for the given data subclass to a total number of collected samples of user-generic data for the corresponding data class, and wherein the step of adding further comprises the step of selecting samples of user-generic data associated with the selected data class based on frequency values of the subclasses of the selected data class if samples of user-specific data were not collected for the selected data class.
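The frequency value defined in claim 12 is the share of a class's user-generic samples that fall in each subclass. A short sketch, assuming subclasses are stored as a dictionary of sample lists (the function name is hypothetical):

```python
def subclass_frequencies(subclasses: dict[str, list]) -> dict[str, float]:
    """Frequency of each subclass: its user-generic sample count divided by
    the total user-generic sample count of the data class."""
    total = sum(len(samples) for samples in subclasses.values())
    if total == 0:
        return {name: 0.0 for name in subclasses}  # avoid dividing by zero
    return {name: len(samples) / total for name, samples in subclasses.items()}


print(subclass_frequencies({"x": [1, 2, 3], "y": [4]}))  # {'x': 0.75, 'y': 0.25}
```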
13. The method of claim 12, wherein the step of selecting samples of user-generic data based on subclass frequency values comprises selecting samples of user-generic data from a data subclass of the selected data class having a greatest frequency value corresponding thereto.
14. The method of claim 12, wherein the step of selecting samples of user-generic data based on frequency values comprises selecting samples of user-generic data from each of the data subclasses of the selected character class in proportion to the frequency values of the data subclasses.
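Claims 13 and 14 describe two ways to pick `k` user-generic samples when the user supplied none for a class: take them all from the most frequent subclass, or spread them across subclasses in proportion to frequency. The claims do not say how fractional shares are rounded; the sketch below swaps in the largest-remainder method as an assumption, computed with integers so the allocation is exact. Function names are hypothetical.

```python
def select_most_frequent(subclasses: dict[str, list], k: int) -> list:
    """Claim 13 rule: up to k samples from the subclass with the highest
    frequency, i.e. the one holding the most user-generic samples."""
    best = max(subclasses, key=lambda name: len(subclasses[name]))
    return subclasses[best][:k]


def select_proportional(subclasses: dict[str, list], k: int) -> list:
    """Claim 14 rule: draw from every subclass in proportion to its frequency.
    Fractional quotas are rounded by the largest-remainder method (assumption)."""
    sizes = {name: len(samples) for name, samples in subclasses.items()}
    total = sum(sizes.values())
    k = min(k, total)
    if k == 0:
        return []
    # Integer part of each quota k * size / total, plus its remainder.
    counts = {name: k * n // total for name, n in sizes.items()}
    remainders = {name: k * n % total for name, n in sizes.items()}
    leftover = k - sum(counts.values())
    # Give the remaining slots to the largest remainders (stable on ties).
    for name in sorted(sizes, key=lambda n: remainders[n], reverse=True)[:leftover]:
        counts[name] += 1
    picked: list = []
    for name, samples in subclasses.items():
        picked.extend(samples[:counts[name]])
    return picked


groups = {"x": [1, 2, 3, 4, 5, 6], "y": [7, 8, 9], "z": [10]}
print(select_most_frequent(groups, 2))  # [1, 2]
print(select_proportional(groups, 5))   # [1, 2, 3, 7, 8]
```

With frequencies 0.6, 0.3, and 0.1 and `k = 5`, the quotas are 3, 1.5, and 0.5; the one leftover slot goes to `y`, the first of the tied remainders.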
15. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for generating training data for a recognition system, the method steps comprising:
establishing a plurality of data classes;
collecting samples of user-generic data for each of the data classes;
dividing the samples of user-generic data of a data class into one or more data subclasses, wherein each data subclass represents a different form of the data class;
collecting samples of user-specific data and associating the samples of user-specific data with corresponding ones of the data classes; and
augmenting user-specific data for a given data class, if necessary, with samples of user-generic data associated with the given data class and corresponding data subclasses, wherein the augmented user-specific data comprises training data for the recognition system.
for each data class, computing a sample count threshold value based on a number of samples of user-generic data associated with the data class; and
storing the sample count threshold values.
18. The program storage device of claim 16, wherein the instructions for the step of computing threshold data comprise instructions for the steps of:
computing a sample count threshold value that is applied to all of the data classes; and
storing the sample count threshold value.
19. The program storage device of claim 16, wherein the instructions for the step of augmenting comprise instructions for the steps of:
selecting a data class and retrieving the sample count threshold value corresponding to the selected data class;
comparing the corresponding sample count threshold value with a number of samples of user-specific data associated with the selected data class; and
adding samples of user-generic data associated with the selected data class to the user-specific data for the selected data class, if the number of samples of user-specific data for the selected data class falls below the corresponding sample count threshold value.
20. The program storage device of claim 19, wherein a number of added samples of user-generic data is equal to the difference between the corresponding sample count threshold value and the number of samples of user-specific data for the selected data class.
21. The program storage device of claim 19, wherein a number of added samples of user-generic data is equal to the corresponding sample count threshold value.
22. The program storage device of claim 19, wherein the instructions for the step of adding comprise instructions for the steps of:
determining if samples of user-specific data were collected for the selected data class;
identifying a data subclass of the selected data class that comprises user-generic data which is most similar to the user-specific data of the selected data class, if samples of user-specific data were collected for the selected data class; and
selecting samples of user-generic data associated with the identified data subclass.
23. The program storage device of claim 22, further comprising instructions for the step of computing frequency values for data subclasses of each data class, wherein the frequency value of a given data subclass represents a ratio of a number of collected samples of user-generic data for the given data subclass to a total number of collected samples of user-generic data for the corresponding data class, and wherein the instructions for the step of adding further comprise instructions for the step of selecting samples of user-generic data associated with the selected data class based on frequency values of the subclasses of the selected data class if samples of user-specific data were not collected for the selected data class.
24. The program storage device of claim 23, wherein the instructions for the step of selecting samples of user-generic data based on subclass frequency values comprise instructions for the step of selecting samples of user-generic data from a data subclass of the selected data class having a greatest frequency value corresponding thereto.
25. The program storage device of claim 23, wherein the instructions for the step of selecting samples of user-generic data based on frequency values comprise instructions for the step of selecting samples of user-generic data from each of the data subclasses of the selected character class in proportion to the frequency values of the data subclasses.
26. A data recognition apparatus, comprising:
an input device for collecting samples of user-generic data and samples of user-specific data;
a storage medium for storing the samples of user-generic data and user-specific data, wherein the stored samples of user-generic data are assigned to a plurality of data classes, wherein the samples of user-generic data of a data class are further divided into one or more data subclasses each representing a different form of the data class, and wherein the samples of user-specific data are associated with corresponding ones of the data classes;
an augmentation unit adapted to augment user-specific data for a given data class, if necessary, with samples of user-generic data associated with the given data class and corresponding data subclasses; and
a training unit adapted to train the data recognition system using the augmented user-specific data.
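Neither claim 26 nor the abstract specifies how the training unit uses the augmented user-specific data or how the recognizing means applies the result. Purely as an illustration, the sketch below swaps in a nearest-centroid classifier: training stores one mean feature vector per class, and recognition returns the class whose mean is closest. The class name and method names are hypothetical, and this is not presented as the classifier the patent uses.

```python
import math

Vector = tuple[float, ...]


class NearestCentroidTrainer:
    """Hypothetical stand-in for the training unit: one mean vector per class."""

    def __init__(self) -> None:
        self.centroids: dict[str, Vector] = {}

    def train(self, training_data: dict[str, list[Vector]]) -> None:
        """Store the component-wise mean of each class's (augmented) samples."""
        for label, samples in training_data.items():
            if samples:  # skip classes that are still empty after augmentation
                n = len(samples)
                self.centroids[label] = tuple(sum(c) / n for c in zip(*samples))

    def recognize(self, sample: Vector) -> str:
        """Return the label of the nearest stored centroid (Euclidean distance)."""
        return min(self.centroids,
                   key=lambda label: math.dist(sample, self.centroids[label]))


trainer = NearestCentroidTrainer()
trainer.train({"a": [(0.0, 0.0), (0.2, 0.0)], "b": [(1.0, 1.0)]})
print(trainer.recognize((0.1, 0.1)))  # a
```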
Specification