Automatic method for scoring and clustering prototypes of handwritten stroke-based data

US 6,052,481 A
Filed: 09/02/1994
Issued: 04/18/2000
Est. Priority Date: 09/02/1994
Status: Expired due to Term

Abstract

A system and method for processing stroke-based handwriting data in order to automatically score and cluster the handwritten data to form letter prototypes. The present invention includes a method for processing digitized stroke-based handwriting data of known character strings, where each of the character strings is represented by a plurality of mathematical feature vectors. In this method, each one of the plurality of feature vectors is labelled as corresponding to a particular character in the character strings. A trajectory is then formed for each one of the plurality of feature vectors labelled as corresponding to a particular character. After the trajectories are formed, a distance value is calculated for each pair of trajectories corresponding to the particular character using a dynamic time warping method. The trajectories that are within a sufficiently small distance of each other are grouped to form a plurality of clusters. The clusters are used to define handwriting prototypes, which identify subcategories of the character.

11 Claims

1. A method for processing digitized stroke-based handwriting data of known character strings, each segment of said known character strings being represented by a feature vector, said method comprising the steps of:
- determining a trajectory of said feature vectors in each of said known character strings corresponding to a particular character, an ith one of said trajectories T_i having n of said feature vectors, T_i = {P_1^i, P_2^i, ..., P_n^i}, and a jth one of said trajectories T_j having m of said feature vectors, T_j = {P_1^j, P_2^j, ..., P_m^j};
  
  determining a separation distance d_i,j between each pair of said trajectories T_i and T_j by forming a distance matrix D_i,j where a (k,l) entry D_i,j(k,l) of said distance matrix D_i,j is equal to a distance between P_k^i, a kth one of said feature vectors of said trajectory T_i, and P_l^j, an lth one of said feature vectors of said trajectory T_j;
  
  determining an entry-to-entry path in said distance matrix D_i,j from D_i,j(1,1) to D_i,j(n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance d_i,j; and
  
  grouping said trajectories into clusters, such that said separation distance of a first pair of said trajectories in a first cluster is smaller than said separation distance of a second pair of said trajectories, said trajectories of said second pair being in different ones of said clusters.
- - 2. The method of claim 1 wherein a first trajectory is included in a first cluster when a first average of said separation distances between said first trajectory and all of said trajectories in said first cluster is smaller than a second average of said separation distances between said first trajectory and all of said trajectories in a second cluster.
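
The separation distance recited in claim 1 can be illustrated with a short dynamic-programming sketch. This is a minimal illustration, not the patented implementation: the claim does not say which distance is used between feature vectors or which moves count as "entry-to-entry", so the Euclidean distance and the three allowed moves (down, right, diagonal) are assumptions, and the names `euclidean` and `separation_distance` are hypothetical.

```python
import math


def euclidean(p, q):
    """Distance between two feature vectors (Euclidean is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def separation_distance(t_i, t_j, dist=euclidean):
    """Minimum sum of matrix entries along a path from D(1,1) to D(n,m)."""
    n, m = len(t_i), len(t_j)
    if n == 0 or m == 0:
        raise ValueError("each trajectory needs at least one feature vector")
    # D[k][l] is the distance between the k-th vector of t_i and the l-th of t_j.
    D = [[dist(p, q) for q in t_j] for p in t_i]
    # cost[k][l] is the smallest path sum from D[0][0] to D[k][l].
    cost = [[math.inf] * m for _ in range(n)]
    cost[0][0] = D[0][0]
    for k in range(n):
        for l in range(m):
            if k == 0 and l == 0:
                continue
            # Assumed step pattern: arrive from above, from the left, or diagonally.
            best_prev = min(
                cost[k - 1][l] if k > 0 else math.inf,
                cost[k][l - 1] if l > 0 else math.inf,
                cost[k - 1][l - 1] if k > 0 and l > 0 else math.inf,
            )
            cost[k][l] = D[k][l] + best_prev
    return cost[n - 1][m - 1]
```

Under these assumptions the function accepts trajectories of different lengths n and m, which is the reason for warping the path rather than comparing vectors index by index.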

3. A method for processing digitized stroke-based handwriting data of known character strings, each of said character strings being represented by mathematical feature vectors, said method comprising the steps of:
- labelling a subset of said plurality of feature vectors as corresponding to a particular character in said character strings;
  
  forming a trajectory for said each one of said plurality of feature vectors labelled as corresponding to said particular character in said character strings, thereby providing a first plurality of trajectories corresponding to every occurrence of said particular character in said handwriting data;
  
  calculating a distance value for each pair of said first plurality of trajectories using a dynamic time warping method, wherein said dynamic time warping step further including the steps of;
  
  determining a separation distance d_i,j between a pair of said first plurality of trajectories, T_i and T_j where T_i = {P_1^i, P_2^i, ..., P_n^i} and includes n of said feature vectors, T_j = {P_1^j, P_2^j, ..., P_m^j} and includes m of said feature vectors;
  
  by forming a distance matrix D_i,j where a (k,l) entry D_i,j(k,l) of said distance matrix D_i,j is equal to a distance between P_k^i, a kth one of said feature vectors of said trajectory T_i, and P_l^j, an lth one of said feature vectors of said trajectory T_j;
  
  determining an entry-to-entry path in said distance matrix D_i,j from D_i,j(1,1) to D_i,j(n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance d_i,j;
  
  grouping particular ones of said first plurality of trajectories having the closest ones of said distance values to form a plurality of clusters;
  
  successively merging said plurality of clusters to form larger clusters based on said distance values; and
  
  identifying subcategories of said particular character using said larger clusters.
- - 4. A method as in claim 3 further including the steps of:
    - characterizing each of said larger clusters as handwriting prototypes; and
      
      providing a handwriting recognizer based on said statistics generated from said prototypes.
  - 5. A method as in claim 4 further including the steps of:
    - providing a Hidden Markov Model-based handwriting recognizer wherein one Hidden Markov Model models each of said handwriting prototypes; and
      
      creating a network of said Hidden Markov Model for said particular character.
  - 6. A method as in claim 3 further including the step of:
    - creating an upper triangular matrix to store each of said distance values calculated for said each pair of said first plurality of trajectories.
  - 7. A method as in claim 6 further including the steps of:
    - performing segmentation on each of said character strings to partition said strings into a plurality of segments;
      
      extracting a plurality of values from said segments corresponding to static properties in said segments; and
      
      forming said feature vectors from said plurality of values.
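
The upper triangular storage of claim 6 and the successive merging of claim 3 might be sketched as follows. The claims do not state how the distance between two clusters is measured or when merging stops, so average linkage and the `max_distance` threshold are assumptions, and every function name here is hypothetical. The `distance` argument stands in for whatever pairwise trajectory distance is used.

```python
def pairwise_upper_triangular(trajectories, distance):
    """Store distance(i, j) only for i < j, since the values are symmetric."""
    n = len(trajectories)
    return {
        (i, j): distance(trajectories[i], trajectories[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def lookup(upper, i, j):
    """Read an entry from the upper-triangular store in either index order."""
    if i == j:
        return 0.0
    return upper[(i, j)] if i < j else upper[(j, i)]


def average_linkage(upper, a, b):
    """Mean pairwise distance between members of clusters a and b (assumed linkage)."""
    total = sum(lookup(upper, i, j) for i in a for j in b)
    return total / (len(a) * len(b))


def merge_clusters(n_items, upper, max_distance):
    """Repeatedly merge the two closest clusters until none is within max_distance."""
    clusters = [[i] for i in range(n_items)]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(upper, clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_distance:
            break  # assumed stopping rule; the claims do not specify one
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Storing only the entries with i < j holds N(N-1)/2 values for N trajectories instead of N squared, which is the practical benefit of the triangular layout.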

8. A method for processing digitized stroke-based handwriting data of known character strings, said handwriting data having (N) occurrences of a particular character, wherein each of said (N) occurrences is represented by a set of mathematical feature vectors, said method comprising the steps of:
- creating a trajectory from each one of said sets of feature vectors;
  
  calculating a distance value between each pair of trajectories using dynamic time warping to provide a plurality of distance values, wherein said dynamic time warping step further including the steps of;
  
  determining a separation distance d_i,j between a pair of said first plurality of trajectories, T_i and T_j where T_i = {P_1^i, P_2^i, ..., P_n^i} and includes n of said feature vectors, T_j = {P_1^j, P_2^j, ..., P_m^j} and includes m of said feature vectors;
  
  by forming a distance matrix D_i,j where a (k,l) entry D_i,j(k,l) of said distance matrix D_i,j is equal to a distance between P_k^i, a kth one of said feature vectors of said trajectory T_i, and P_l^j, an lth one of said feature vectors of said trajectory T_j;
  
  determining an entry-to-entry path in said distance matrix D_i,j from D_i,j(1,1) to D_i,j(n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said sum equal to said separation distance d_i,j;
  
  grouping a first plurality of said trajectories into a first cluster;
  
  grouping a second plurality of said trajectories into a second cluster;
  
  for a first one of said trajectories, calculating a first average distance value between said first one of said trajectories and said first cluster by averaging each of said plurality of distance values between said first trajectory and each of said first plurality of said trajectories;
  
  calculating a second average distance value between said first one of said trajectories and said second cluster by averaging each of said plurality of distance values between said first trajectory and each of said second plurality of said trajectories;
  
  assigning said first one of said trajectories to said first cluster if said first average distance value is less than said second average distance value, and assigning said first one of said trajectories to said second cluster if said second average distance value is less than said first average distance value; and
  
  defining a first and second prototype corresponding to said first and second clusters to represent similar occurrences of said particular character.
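
The average-distance assignment in claim 8 can be sketched in a few lines. This is a hedged illustration only: it assumes trajectories are referred to by integer index, that `dist(a, b)` returns a previously computed distance value, and that the trajectory being assigned is not yet a member of either cluster. The names `average_distance` and `assign` are hypothetical, and the tie case is left unassigned because the claim does not address it.

```python
def average_distance(index, cluster, dist):
    """Mean distance between one trajectory and every member of a cluster."""
    return sum(dist(index, member) for member in cluster) / len(cluster)


def assign(index, first_cluster, second_cluster, dist):
    """Add the trajectory to whichever cluster has the smaller average distance."""
    first_avg = average_distance(index, first_cluster, dist)
    second_avg = average_distance(index, second_cluster, dist)
    if first_avg < second_avg:
        first_cluster.append(index)
        return "first"
    if second_avg < first_avg:
        second_cluster.append(index)
        return "second"
    return None  # equal averages: the claim does not say what happens
```

Comparing averages rather than single nearest neighbours makes the assignment depend on the cluster as a whole, so one atypical member has less influence on where a trajectory lands.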

9. A stroke-based handwriting data processing system, wherein said stroke-based handwriting data contains a known character string, said system comprising:
- signal processing means for generating segments from said stroke-based handwriting data from a plurality of samples, said signal processing means including means for characterizing each of said segments from each of said samples as a feature vector;
  
  alignment means for labelling each of said feature vectors as corresponding to a particular character in said character string;
  
  trajectory formation means for forming a trajectory for each of said feature vectors labelled as corresponding to a particular character in said character string;
  
  scoring means for calculating a similarity score between each pair of said trajectories, wherein said scoring means includes;
  
  dynamic time warping means for determining a separation distance d_i,j between a pair of said first plurality of trajectories T_i and T_j where T_i = {P_1^i, P_2^i, ..., P_n^i} and includes n of said feature vectors, and T_j = {P_1^j, P_2^j, ..., P_m^j} and includes m of said feature vectors;
  
  said dynamic time warping means including;
  
  matrix means for forming a distance matrix D_i,j where a (k,l) entry D_i,j(k,l) of said distance matrix D_i,j is equal to a distance between P_k^i, a kth one of said feature vectors of said trajectory T_i, and P_l^j, an lth one of said feature vectors of said trajectory T_j;
  
  calculation means for calculating an entry-to-entry path in said distance matrix D_i,j from D_i,j(1,1) to D_i,j(n,m) such that a sum of entries along said entry-to-entry path is a minimum, and setting said separation distance d_i,j equal to said sum; and
  
  clustering means for grouping said trajectories into a plurality of clusters according to said similarity scores.
- - 10. A system as in claim 9 wherein said clustering means includes means for successively merging said plurality of clusters to form larger ones of said clusters.
  - 11. A system as in claim 10 further including means for defining a handwriting prototype from each of said clusters.

Current Assignee: Apple Inc.
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Grajski, Kamil A.; Chow, Yen-Lu
Primary Examiner(s): SHALWALA, BIPIN H
Application Number: US08/300,426
Time in Patent Office: 2,055 Days
Field of Search: 382/187, 382/253, 382/179, 382/180, 382/188, 382/190, 382/215, 382/224, 382/225, 382/191, 382/197, 382/198, 348/414, 348/417, 348/418, 395/2.54, 395/2.65, 395/2.62, 395/2
US Class Current: 382/187
CPC Class Codes:
G06V 30/1423 the instrument generating s...
G06V 30/196 using sequential comparison...
