Handwriting signal processing front-end for handwriting recognizers
Abstract
A handwriting signal processing front-end method and apparatus for a handwriting training and recognition system which includes non-uniform segmentation and feature extraction in combination with multiple vector quantization. In a training phase, digitized handwriting samples are partitioned into segments of unequal length. Features are extracted from the segments and are grouped to form feature vectors for each segment. Groups of adjacent feature vectors are then combined to form input frames. Feature-specific vectors are formed by grouping features of the same type from each of the feature vectors within a frame. Multiple vector quantization is then performed on each feature-specific vector to statistically model the distributions of the vectors for each feature by identifying clusters of the vectors and determining the mean locations of the vectors in the clusters. Each mean location is represented by a codebook symbol, and this information is stored in a codebook for each feature. These codebooks are then used to train a recognition system. In the testing phase, where the recognition system is to identify handwriting, digitized test handwriting is first processed as in the training phase to generate feature-specific vectors from input frames. Multiple vector quantization is then performed on each feature-specific vector to represent the feature-specific vector using the codebook symbols that were generated for that feature during training. The resulting series of codebook symbols effects a reduced representation of the sampled handwriting data and is used for subsequent handwriting recognition.
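The abstract and claims leave the segmentation rule open, saying only that strokes are split into segments of unequal length "based on interrelationships of said (x,y) coordinates". The following minimal Python sketch assumes one such rule, cutting a stroke wherever the vertical direction of pen motion reverses (local y-extrema); this rule is an illustrative assumption, not taken from this document. The names `segment_stroke` and `Point` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # one (x, y) sample of the pen trajectory


def segment_stroke(points: List[Point]) -> List[List[Point]]:
    """Split one stroke into segments of unequal length.

    Assumed rule (not from the source): cut wherever the vertical
    direction of motion reverses, i.e. at local y-extrema. Adjacent
    segments share their boundary point.
    """
    if len(points) < 3:
        return [list(points)]
    boundaries = [0]
    prev_dy = 0.0
    for i in range(1, len(points)):
        dy = points[i][1] - points[i - 1][1]
        if dy == 0:
            continue  # flat run: keep the last non-zero direction
        if prev_dy != 0 and (dy > 0) != (prev_dy > 0):
            boundaries.append(i - 1)  # points[i - 1] is a local y-extremum
        prev_dy = dy
    boundaries.append(len(points) - 1)
    return [points[a:b + 1] for a, b in zip(boundaries, boundaries[1:])]
```

For the stroke `[(0, 0), (1, 1), (2, 2), (3, 3), (4, 1), (5, 0), (6, 2)]` the sketch returns three segments of 4, 3, and 2 points. Because adjacent segments share their boundary point, the trajectory stays continuous across segments while segment length follows the shape of the writing rather than a fixed sample count.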
14 Claims
1. A front-end processing method for a handwriting recognition system, said method for processing strokes of handwriting training samples comprising a time series of (x,y) coordinates, said method comprising the steps of:
segmenting said strokes based on interrelationships of said (x,y) coordinates into an ordered set of training stroke segments that are non-uniform in length for each of said handwriting training samples;
extracting a first plurality of feature values from each of said training stroke segments, wherein each of said feature values extracted therefrom forms entries of a word-independent training feature vector;
creating a series of feature-specific vectors by grouping said entries corresponding to one of said feature values from contiguous groups of said word-independent training feature vectors;
performing multiple vector quantization by vector quantizing each of said feature-specific vectors to statistically characterize said feature-specific vectors, wherein said vector quantizing includes:
partitioning said feature-specific vectors into a plurality of clusters, wherein each of said clusters includes a mean value and a distribution about said mean value for proximate ones of said feature-specific vectors, and labelling each of said mean values in each of said clusters with a symbol; and
storing in a plurality of codebooks said mean values and said symbols for each of said clusters to effect a reduced representation of said handwriting training samples.
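The clustering and codebook steps of claim 1 can be illustrated with ordinary k-means, run once per feature so that every feature gets its own codebook. This is a sketch under stated assumptions: k-means stands in for whatever clustering procedure the specification uses, the symbols are simply integer cluster indices, and the "distribution about said mean value" is summarized as a single variance (mean squared distance of the members to their centroid). `train_codebook`, `train_codebooks`, and `sq_dist` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Codebook = Dict[int, Tuple[Vector, float]]  # symbol -> (mean, variance)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_codebook(vectors: List[Vector], k: int,
                   iters: int = 20, seed: int = 0) -> Codebook:
    """Cluster one feature's vectors with plain k-means (assumed method).

    Symbols are the integer cluster indices; each cluster's spread is
    summarized as the mean squared distance of its members to the mean.
    """
    rng = random.Random(seed)
    means: List[Vector] = [tuple(v) for v in rng.sample(vectors, k)]
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        # Assignment step: attach every vector to its nearest mean.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, means[j]))
            clusters[nearest].append(v)
        # Update step: move each mean to its cluster centroid
        # (an empty cluster keeps its previous mean).
        means = [tuple(sum(col) / len(c) for col in zip(*c)) if c else means[j]
                 for j, c in enumerate(clusters)]
    codebook: Codebook = {}
    for j, c in enumerate(clusters):
        variance = sum(sq_dist(v, means[j]) for v in c) / len(c) if c else 0.0
        codebook[j] = (means[j], variance)
    return codebook


def train_codebooks(per_feature: List[List[Vector]], k: int) -> List[Codebook]:
    """Multiple VQ: one independent codebook per feature type."""
    return [train_codebook(vs, k) for vs in per_feature]
```

Keeping a separate codebook per feature, rather than a single codebook over the whole feature vector, mirrors the claim's "plurality of codebooks": each codebook only has to model the variation of one feature.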
7. A front-end processing method for a handwriting recognition system, said method for processing strokes of handwriting test samples comprising a time series of (x,y) coordinates, wherein said handwriting recognition system has been trained from handwriting training samples including a plurality of feature-specific vectors and codebooks generated therefrom, wherein each of said codebooks is generated for each of said feature-specific vectors, said codebooks comprising mean values and symbols for a plurality of clusters, said method comprising the steps of:
segmenting said strokes based on interrelationships of said (x,y) coordinates into an ordered set of test stroke segments that are non-uniform in length for each of said handwriting test samples;
extracting a first plurality of feature values from each of said test stroke segments, wherein each of said feature values extracted therefrom forms entries of a word-independent test feature vector;
creating a series of test feature-specific vectors by grouping said entries corresponding to one of said feature values from contiguous groups of said word-independent test feature vectors;
performing multiple vector quantization by vector quantizing each of said test feature-specific vectors to compare each of said test feature-specific vectors to said mean values in said codebooks, wherein said vector quantizing includes assigning to each of said test feature-specific vectors one of said symbols of one of said mean values to which each of said test feature-specific vectors is closest in distance to form a series of output symbols; and
using said output symbols to represent said handwriting test samples for recognition of said handwriting test samples.
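The test-time step of claim 7 replaces each test feature-specific vector with the symbol of the nearest codebook mean. The sketch below assumes squared Euclidean distance (the claim says only "closest in distance") and a codebook stored as a mapping from symbol to mean; `quantize` and `quantize_series` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Codebook = Dict[str, Vector]  # assumed layout: symbol -> cluster mean


def quantize(vector: Sequence[float], codebook: Codebook) -> str:
    """Return the symbol whose mean is nearest in squared Euclidean distance."""
    return min(codebook,
               key=lambda s: sum((a - b) ** 2 for a, b in zip(vector, codebook[s])))


def quantize_series(vectors: Sequence[Sequence[float]], codebook: Codebook) -> List[str]:
    """Map a series of test feature-specific vectors to output symbols."""
    return [quantize(v, codebook) for v in vectors]
```

With the codebook `{"a": (0.0, 0.0), "b": (5.0, 5.0)}`, the series `[(4.0, 6.0), (0.0, -1.0)]` maps to `["b", "a"]`. Only the symbols, not the raw feature values, travel on to the recognizer.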
10. A front-end processing method for training a handwriting recognizer from a training set of sampled handwriting data comprising a time series of sample points in the form of (x,y) coordinates, said method including the steps of:
performing non-uniform segmentation on said series of sample points to partition said sample points into a first segment;
extracting a first feature value and a second feature value from said first segment;
creating a first word-independent vector from said first and second feature values;
creating a first feature-specific vector from said first feature value in said first word-independent vector;
creating a second feature-specific vector from said second feature value in said first word-independent vector;
performing multiple vector quantization on said first word-independent vector by vector quantizing said first feature-specific vector to form a first cluster, and by vector quantizing said second feature-specific vector to form a second cluster, wherein each of said first and second clusters includes a respective mean value and a distribution about said mean value; and
storing said first cluster in a first codebook and storing said second cluster in a second codebook.
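Claim 10 does not name the two feature values it extracts from a segment. As an illustration only, the sketch below assumes the chord direction (angle from the segment's first point to its last) and the arc length of the segment; both choices are hypothetical, as is the name `extract_features`. The resulting tuple is computed from the segment alone, which is one plausible reading of "word-independent".

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def extract_features(segment: List[Point]) -> Tuple[float, float]:
    """Return an assumed two-entry feature vector (direction, arc_length).

    Both features are illustrative choices, not the patent's feature set.
    """
    (x0, y0), (x1, y1) = segment[0], segment[-1]
    direction = math.atan2(y1 - y0, x1 - x0)  # chord angle in radians
    arc_length = sum(
        math.hypot(bx - ax, by - ay)          # length of each pen step
        for (ax, ay), (bx, by) in zip(segment, segment[1:])
    )
    return (direction, arc_length)
```

For the segment `[(0, 0), (1, 0), (1, 1)]` the sketch yields a direction of pi/4 and an arc length of 2.0; entry 0 of every such vector would feed the first feature-specific vector and entry 1 the second.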
12. A front-end processing method for training a handwriting recognizer from a training set of sampled handwriting data comprising a series of sample points in the form of (x,y) coordinates, said method including the steps of:
performing non-uniform segmentation on said series of sample points to partition said sample points into a first and a second segment;
extracting a first and a second feature value from said first segment, and extracting said first and second feature values from said second segment;
creating a first word-independent vector from said first and second feature values extracted from said first segment wherein said first and second feature values form entries in said first vector;
creating a second word-independent vector from said first and second feature values extracted from said second segment wherein said first and second feature values form entries in said second vector;
combining said first word-independent vector and said second word-independent vector to create an input frame;
creating a first feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector in said input frame corresponding to said first feature value;
creating a second feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector corresponding to said second feature value;
performing multiple vector quantization on said input frame by vector quantizing said first feature-specific vector to statistically model said first feature-specific vector to create a first codebook, and by vector quantizing said second feature-specific vector to statistically model said second feature-specific vector to form a second codebook; and
training said handwriting recognizer using said first and second codebooks.
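Claim 12 combines the word-independent vectors of adjacent segments into an input frame and then regroups the entries by feature type. That regrouping amounts to transposing the frame, as in the sketch below. It assumes non-overlapping frames of a fixed size (two vectors, matching the claim) and drops an incomplete frame at the end; `make_frames` and `feature_specific_vectors` are hypothetical names.

```python
from typing import List, Sequence, Tuple

FeatureVector = Tuple[float, ...]  # one word-independent vector per segment


def make_frames(vectors: Sequence[FeatureVector],
                frame_size: int = 2) -> List[List[FeatureVector]]:
    """Group contiguous feature vectors into non-overlapping input frames.

    Assumption: a trailing group shorter than frame_size is dropped.
    """
    return [list(vectors[i:i + frame_size])
            for i in range(0, len(vectors) - frame_size + 1, frame_size)]


def feature_specific_vectors(frame: Sequence[FeatureVector]) -> List[FeatureVector]:
    """Regroup a frame by feature: entry i of every vector goes into vector i."""
    return [tuple(column) for column in zip(*frame)]
```

For a frame built from the vectors `(0.1, 5.0)` and `(0.2, 6.0)`, the sketch returns `(0.1, 0.2)` as the first feature-specific vector and `(5.0, 6.0)` as the second; each of these is then quantized against its own codebook.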
14. A front-end processing method for a handwriting recognizer that has been trained using training data represented as a plurality of codebooks to recognize sampled handwriting data in the form of a series of sample points as known strings of characters, each of said codebooks including a plurality of clusters identified by symbols, said method including the steps of:
performing non-uniform segmentation on said series of sample points to partition said sample points into a first and a second segment;
extracting a first and a second feature value from said first segment, and extracting said first and second feature values from said second segment;
creating a first word-independent vector from said first and second feature values extracted from said first segment wherein said first and second feature values form entries in said first word-independent vector;
creating a second word-independent vector from said first and second feature values extracted from said second segment wherein said first and second feature values form entries in said second word-independent vector;
combining said first word-independent vector and said second word-independent vector to create an input frame;
creating a first feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector in said input frame corresponding to said first feature value;
creating a second feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector corresponding to said second feature value;
performing multiple vector quantization on said input frame by vector quantizing said first feature-specific vector to assign to said first feature-specific vector a first symbol of one of said clusters in one of said codebooks for which said first feature-specific vector is the closest, and by vector quantizing said second feature-specific vector to assign to said second feature-specific vector a second symbol of one of said clusters in one of said codebooks for which said second feature-specific vector is the closest; and
sending said first and second symbols to said handwriting recognizer for recognition as said known strings of characters.
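At recognition time, claim 14 quantizes each feature-specific vector of an input frame separately and sends the resulting symbols to the recognizer. The sketch below assumes one codebook per feature, listed in the same order as the entries of each word-independent vector, nearest-mean matching by squared Euclidean distance, and string symbols; `encode_frame` and `nearest_symbol` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]
Codebook = Dict[str, Vector]  # assumed layout: symbol -> cluster mean


def nearest_symbol(v: Sequence[float], codebook: Codebook) -> str:
    """Symbol of the codebook mean closest to v (squared Euclidean)."""
    return min(codebook,
               key=lambda s: sum((a - b) ** 2 for a, b in zip(v, codebook[s])))


def encode_frame(frame: Sequence[Sequence[float]],
                 codebooks: Sequence[Codebook]) -> Tuple[str, ...]:
    """Multiple VQ of one input frame: feature i is quantized with codebook i."""
    feature_vectors = list(zip(*frame))  # one feature-specific vector per feature
    if len(feature_vectors) != len(codebooks):
        raise ValueError("need exactly one codebook per feature")
    return tuple(nearest_symbol(fv, cb) for fv, cb in zip(feature_vectors, codebooks))
```

With a first-feature codebook `{"d0": (0.0, 0.0), "d1": (1.0, 1.0)}` and a second-feature codebook `{"l0": (5.0, 5.0), "l1": (20.0, 20.0)}`, the frame `[(0.1, 5.0), (0.2, 6.0)]` encodes to `("d0", "l0")`. Applied frame by frame, this yields the symbol sequence that serves as the reduced representation of the handwriting.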
Specification