Automatic method for scoring and clustering prototypes of handwritten stroke-based data
First Claim
1. A method for processing digitized stroke-based handwriting data of known character strings, each segment of said known character strings being represented by a feature vector, said method comprising the steps of:
- determining a trajectory of said feature vectors in each of said known character strings corresponding to a particular character, an ith one of said trajectories Ti having n of said feature vectors, Ti ={P1i,P2i, . . . Pni }, and a jth one of said trajectories Tj having m of said feature vectors, Tj ={P1j, P2j, . . . Pmj };
determining a separation distance di,j between each pair of said trajectories Ti and Tj byforming a distance matrix Di,j where a (k,l) entry Di,j (k,l) of said distance matrix Di,j is equal to a distance between Pki, a kth one of said feature vectors of said trajectory Ti, and Plj, an lth one of said feature vectors of said trajectory Tj ;
determining an entry-to-entry path in said distance matrix Di,j from Di,j (1,1) to Di,j (n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance di,j ; and
grouping said trajectories into clusters, such that said separation distance of a first pair of said trajectories in a first cluster is smaller than said separation distance of a second pair of said trajectories, said trajectories of said second pair being in different ones of said clusters.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method for processing stroke-based handwriting data for the purposes of automatically scoring and clustering the handwritten data to form letter prototypes. The present invention includes a method for processing digitized stroke-based handwriting data of known character strings, where each of the character strings is represented by a plurality of mathematical feature vectors. In this method, each one of the plurality of feature vectors is labelled as corresponding to a particular character in the character strings. A trajectory is then formed for each one of the plurality of feature vectors labelled as corresponding to a particular character. After the trajectories are formed, a distance value is calculated for each pair of trajectories corresponding to the particular character using dynamic time warping method. The trajectories which are within a sufficiently small distance of each other are grouped to form a plurality of clusters. The clusters are used to define handwriting prototypes which identify subcategories of the character.
-
Citations
11 Claims
-
1. A method for processing digitized stroke-based handwriting data of known character strings, each segment of said known character strings being represented by a feature vector, said method comprising the steps of:
-
determining a trajectory of said feature vectors in each of said known character strings corresponding to a particular character, an ith one of said trajectories Ti having n of said feature vectors, Ti ={P1i,P2i, . . . Pni }, and a jth one of said trajectories Tj having m of said feature vectors, Tj ={P1j, P2j, . . . Pmj }; determining a separation distance di,j between each pair of said trajectories Ti and Tj by forming a distance matrix Di,j where a (k,l) entry Di,j (k,l) of said distance matrix Di,j is equal to a distance between Pki, a kth one of said feature vectors of said trajectory Ti, and Plj, an lth one of said feature vectors of said trajectory Tj ; determining an entry-to-entry path in said distance matrix Di,j from Di,j (1,1) to Di,j (n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance di,j ; and grouping said trajectories into clusters, such that said separation distance of a first pair of said trajectories in a first cluster is smaller than said separation distance of a second pair of said trajectories, said trajectories of said second pair being in different ones of said clusters. - View Dependent Claims (2)
-
-
3. A method for processing digitized stroke-based handwriting data of known character strings, each of said character strings being represented by mathematical feature vectors, said method comprising the steps of:
-
labelling a subset of said plurality of feature vectors as corresponding to a particular character in said character strings; forming a trajectory for said each one of said plurality of feature vectors labelled as corresponding to said particular character in said character strings, thereby providing a first plurality of trajectories corresponding to every occurrence of said particular character in said handwriting data; calculating a distance value for each pair of said first plurality of trajectories using a dynamic time warping method, wherein said dynamic time warping step further including the steps of; determining a separation distance di,j between a pair of said first plurality of trajectories, Ti and Tj where Ti ={P1i, P2i, . . . Pni } and includes n of said feature vectors, Tj ={P1j, P2j, . . . Pmj } and includes m of said feature vectors;
byforming a distance matrix Di,j where a (k,l) entry Di,j (k,l) of said distance matrix Di,j is equal to a distance between Pki, a kth one of said feature vectors of said trajectory Ti, and P1j, an lth one of said feature vectors of said feature vectors of said trajectory Tj ; determining an entry-to-entry path in said distance matrix Di,j from Di,j (1,1) to Di,j (n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance di,j ; grouping particular ones of said first plurality of trajectories having the closest ones of said distance values to form a plurality of clusters; successively merging said plurality of clusters to form larger clusters based on said distance values; and identifying subcategories of said particular character using said larger clusters. - View Dependent Claims (4, 5, 6, 7)
-
-
8. A method for processing digitized stroke-based handwriting data of known character strings, said handwriting data having (N) occurrences of a particular character, wherein each of said (N) occurrences is represented by a set of mathematical feature vectors, said method comprising the steps of:
-
creating a trajectory from each one of said sets of feature vectors; calculating a distance value between each pair of trajectories using dynamic time warping to provide a plurality of distance values, wherein said dynamic time warping step further including the steps of; determining a separation distance di,j between a pair of said first plurality of trajectories, Ti and Tj where Ti ={P1i, P2i, . . . Pni } and includes n of said feature vectors, Tj ={P1j, P2j, . . . Pmj } and includes m of said feature vectors;
byforming a distance matrix Di,j where a (k,l) entry Di,j (k,l) of said distance matrix Di,j is equal to a distance between Pki, a kth one of said feature vectors of said trajectory Ti, and P1j, an lth one of said feature vectors of said trajectory Tj ; determining an entry-to-entry path in said distance matrix Di,j from Di,j (1,1) to Di,j (n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance di,j ; grouping a first plurality of said trajectories into a first cluster; grouping a second plurality of said trajectories into a second cluster; for a first one of said trajectories, calculating a first average distance value between said first one of said trajectories and said first cluster by averaging each of said plurality of distance values between said first trajectory and each of said first plurality of said trajectories; calculating a second average distance value between said first one of said trajectories and said second cluster by averaging each of said plurality of distance values between said first trajectory and each of said second plurality of said trajectories; assigning said first one of said trajectories to said first cluster if said first average distance value is less than said second average distance value, and assigning said first one of said trajectories to said second cluster if said second average distance value is less than said first average distance value; and defining a first and second prototype corresponding to said first and second clusters to represent similar occurrences of said particular character.
-
-
9. A stroke-based handwriting data processing system, wherein said stroke-based handwriting data contains a known character string, said system comprising:
-
signal processing means for generating segments from said stroke-based handwriting data from a plurality of samples, said signal processing means including means for characterizing each of said segments from each of said samples as a feature vector; alignment means for labelling each of said feature vectors as corresponding to a particular character in said character string; trajectory formation means for forming a trajectory for each of said feature vectors labelled as corresponding to a particular character in said character string; scoring means for calculating a similarity score between each pair of said trajectories, wherein said scoring means includes; dynamic time warping means for determining a separation distance di,j between a pair of said first plurality of trajectories Ti and Tj where Ti ={P1i, P2i, . . . Pni } and includes n of said feature vectors, and Tj ={P1j, P2j, . . . Pmj } and includes m of said feature vectors; said dynamic time warping means including; matrix means for forming a distance matrix Di,j where a (k,l) entry Di,j (k,l) of said distance matrix Di,j is equal to a distance between Pki, a kth one of said feature vectors of said trajectory Ti, and P1j, an lth one of said feature vectors of said trajectory Tj ; calculation means for calculating an entry-to-entry path in said distance matrix Di,j from Di,j (1,1) to Di,j (n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said separation distance di,j equal to said sum; and clustering means for grouping said trajectories into a plurality of clusters according to said similarity scores. - View Dependent Claims (10, 11)
-
Specification