Handwriting signal processing front-end for handwriting recognizers

US 5,577,135 A
Filed: 03/01/1994
Issued: 11/19/1996
Est. Priority Date: 03/01/1994
Status: Expired due to Term

Abstract

A handwriting signal processing front-end method and apparatus for a handwriting training and recognition system which includes non-uniform segmentation and feature extraction in combination with multiple vector quantization. In a training phase, digitized handwriting samples are partitioned into segments of unequal length. Features are extracted from the segments and are grouped to form feature vectors for each segment. Groups of adjacent feature vectors are then combined to form input frames. Feature-specific vectors are formed by grouping features of the same type from each of the feature vectors within a frame. Multiple vector quantization is then performed on each feature-specific vector to statistically model the distributions of the vectors for each feature by identifying clusters of the vectors and determining the mean locations of the vectors in the clusters. Each mean location is represented by a codebook symbol and this information is stored in a codebook for each feature. These codebooks are then used to train a recognition system. In the testing phase, where the recognition system is to identify handwriting, digitized test handwriting is first processed as in the training phase to generate feature-specific vectors from input frames. Multiple vector quantization is then performed on each feature-specific vector to represent the feature-specific vector using the codebook symbols that were generated for that feature during training. The resulting series of codebook symbols effects a reduced representation of the sampled handwriting data and is used for subsequent handwriting recognition.
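
The regrouping step described in the abstract (feature vectors, then input frames, then feature-specific vectors) can be illustrated with a short Python sketch. It assumes each segment's feature vector is a dict keyed by feature name and that one frame is built per position with a sliding window of three adjacent vectors, the width named in claim 6. The names `make_frames` and `feature_specific_vectors`, the feature names, and the choice to skip incomplete frames at the ends are all hypothetical, not details taken from the patent.

```python
from typing import Dict, List

FeatureVector = Dict[str, List[float]]  # feature name -> values for one segment


def make_frames(vectors: List[FeatureVector], width: int = 3) -> List[List[FeatureVector]]:
    """Group contiguous feature vectors into overlapping input frames."""
    # Assumed: sliding window with stride 1; incomplete frames are skipped.
    return [vectors[i:i + width] for i in range(len(vectors) - width + 1)]


def feature_specific_vectors(frame: List[FeatureVector]) -> Dict[str, List[float]]:
    """For each feature, concatenate that feature's entries across the frame."""
    names = frame[0].keys()
    return {name: [v for fv in frame for v in fv[name]] for name in names}


# Four segments, each with two illustrative features.
segments = [
    {"net_dist": [1.0, 2.0], "speed": [0.5]},
    {"net_dist": [3.0, 4.0], "speed": [0.7]},
    {"net_dist": [5.0, 6.0], "speed": [0.9]},
    {"net_dist": [7.0, 8.0], "speed": [1.1]},
]
frames = make_frames(segments)
fsv = [feature_specific_vectors(f) for f in frames]
```

Each entry of `fsv` holds one vector per feature type, so each feature can later be quantized against its own codebook.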

14 Claims

1. A front-end processing method for a handwriting recognition system, said method for processing strokes of handwriting training samples comprising a time series of (x,y) coordinates, said method comprising the steps of:
- segmenting said strokes based on interrelationships of said (x,y) coordinates into an ordered set of training stroke segments that are non-uniform in length for each of said handwriting training samples;
  
  extracting a first plurality of feature values from each of said training stroke segments, wherein each of said feature values extracted therefrom forms entries of a word-independent training feature vector;
  
  creating a series of feature-specific vectors by grouping said entries corresponding to one of said feature values from contiguous groups of said word-independent training feature vectors;
  
  performing multiple vector quantization by vector quantizing each of said feature-specific vectors to statistically characterize said feature-specific vectors, wherein said vector quantizing includes;
  
  partitioning said feature-specific vectors into a plurality of clusters, wherein each of said clusters includes a mean value and a distribution about said mean value for proximate ones of said feature-specific vectors, and labelling each of said mean values in each of said clusters with a symbol; and
  
  storing in a plurality of codebooks said mean values and said symbols for each of said clusters to effect a reduced representation of said handwriting training samples.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. A method as in claim 1 wherein each of said segments includes a start point, a halfway point, and an endpoint, said segmenting step further including the step of defining said endpoint for each of said segments by calculating a first derivative and a second derivative from said y coordinate-time series data.
  - 3. A method as in claim 2 further including the step of reordering said segments based on said x-coordinate time series data.
  - 4. A method as in claim 3 wherein said plurality of features extracted from said segment include:
    - net x- and y-distance between said start point of said segment and said endpoint of said segment;
      
      net x- and y-distance between said start point of said segment and said halfway point of said segment;
      
      an estimate of speed of motion in an x-direction at said endpoint of said segment;
      
      number of segments that are generated when a multiple-pass zero-crossing based segmentation procedure is applied to the x-coordinate data in said segment; and
      
      coefficients of a third-order polynomial fitted separately to said x-coordinates and y-coordinates in said segment.
  - 5. A method as in claim 1 further including the step of forming an input frame for each of said training stroke segments by combining said contiguous groups of said training feature vectors, and wherein said entries corresponding to one of said feature values are extracted from said training feature vectors in said input frame to form said series of feature-specific vectors.
  - 6. A method as in claim 5 wherein said input frame is formed from three adjacent training feature vectors.
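
A few of the segment features listed in claim 4 can be sketched in plain Python. The sketch assumes a segment is a list of (x, y) points sampled at a fixed interval `dt`, takes the halfway point to be the middle sample by index, and estimates the endpoint x-speed with a backward difference. These choices and the name `segment_features` are assumptions rather than details from the claims. The zero-crossing segment count and the third-order polynomial coefficients are left out.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def segment_features(points: List[Point], dt: float = 1.0) -> Dict[str, List[float]]:
    """Compute net-distance and endpoint-speed features for one segment."""
    if len(points) < 2:
        raise ValueError("a segment needs at least two points")
    (x0, y0), (xe, ye) = points[0], points[-1]
    xh, yh = points[len(points) // 2]  # assumed: halfway point = middle sample
    x_prev = points[-2][0]
    return {
        "start_to_end": [xe - x0, ye - y0],    # net x- and y-distance, start to endpoint
        "start_to_half": [xh - x0, yh - y0],   # net x- and y-distance, start to halfway point
        "end_x_speed": [(xe - x_prev) / dt],   # backward difference (assumed estimator)
    }


features = segment_features(
    [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (4.0, 3.5), (7.0, 3.0)]
)
```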

7. A front-end processing method for a handwriting recognition system, said method for processing strokes of handwriting test samples comprising a time series of (x,y) coordinates, wherein said handwriting recognition system has been trained from handwriting training samples including a plurality of feature-specific vectors and codebooks generated therefrom, wherein each of said codebooks is generated for each of said feature-specific vectors, said codebooks comprising mean values and symbols for a plurality of clusters, said method comprising the steps of:
- segmenting said strokes based on interrelationships of said (x,y) coordinates into an ordered set of test stroke segments that are non-uniform in length for each of said handwriting test samples;
  
  extracting a first plurality of feature values from each of said test stroke segments, wherein each of said feature values extracted therefrom forms entries of a word-independent test feature vector;
  
  creating a series of test feature-specific vectors by grouping said entries corresponding to one of said feature values from contiguous groups of said word-independent test feature vectors;
  
  performing multiple vector quantization by vector quantizing each of said test feature-specific vectors to compare each of said test feature-specific vectors to said mean values in said codebooks, wherein said vector quantizing includes assigning to each of said test feature-specific vectors, one of said symbols of one of said mean values to which each of said test feature-specific vectors is closest in distance to form a series of output symbols; and
  
  using said output symbols to represent said handwriting test samples for recognition of said handwriting test samples.
- Dependent claims (8, 9):
  - 8. A method as in claim 7 further including the step of forming an input frame for each of said training stroke segments by combining said contiguous groups of said training feature vectors, and wherein said entries corresponding to one of said feature values are extracted from said training feature vectors in said input frame to form said series of feature-specific vectors.
  - 9. A method as in claim 8 wherein said input frame is formed from three adjacent training feature vectors.
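
The symbol assignment in claim 7 amounts to a nearest-mean lookup against the codebook for each feature. The sketch below assumes a codebook is a dict from symbol to cluster mean and uses Euclidean distance; the claim says only "closest in distance", so the metric, the names `quantize` and `quantize_frame`, and the sample codebooks are hypothetical.

```python
import math
from typing import Dict, Sequence

Codebook = Dict[str, Sequence[float]]  # symbol -> cluster mean


def quantize(vector: Sequence[float], codebook: Codebook) -> str:
    """Return the symbol whose mean is closest to the vector (Euclidean, assumed)."""
    return min(codebook, key=lambda sym: math.dist(vector, codebook[sym]))


def quantize_frame(fsv: Dict[str, Sequence[float]],
                   codebooks: Dict[str, Codebook]) -> Dict[str, str]:
    """Quantize each feature-specific vector against that feature's own codebook."""
    return {name: quantize(vec, codebooks[name]) for name, vec in fsv.items()}


# Illustrative codebooks, one per feature.
codebooks = {
    "net_dist": {"A": [0.0, 0.0], "B": [5.0, 5.0]},
    "speed": {"S0": [0.1], "S1": [1.0]},
}
symbols = quantize_frame({"net_dist": [4.0, 6.0], "speed": [0.2]}, codebooks)
```

Repeating this for every input frame yields the series of output symbols that is passed on for recognition.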

10. A front-end processing method for training a handwriting recognizer from a training set of sampled handwriting data comprising a time series of sample points in the form of (x,y) coordinates, said method including the steps of:
- performing non-uniform segmentation on said series of sample points to partition said sample points into a first segment;
  
  extracting a first feature value and a second feature value from said first segment;
  
  creating a first word-independent vector from said first and second feature values;
  
  creating a first feature-specific vector from said first feature value in said first word-independent vector;
  
  creating a second feature-specific vector from said second feature value in said first word-independent vector;
  
  performing multiple vector quantization on said first word-independent vector by vector quantizing said first feature-specific vector to form a first cluster, and by vector quantizing said second feature-specific vector to form a second cluster, wherein each of said first and second clusters include a respective mean value and a distribution about said mean value; and
  
  storing said first cluster in a first codebook and storing said second cluster in a second codebook.
- Dependent claims (11):
  - 11. A method as in claim 10 further including the steps of:
    - grouping said first feature value and said second feature value in said first word-independent vector to form a third feature-specific vector; and
      
      vector quantizing said third feature-specific vector to form a third cluster.
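
The training-side quantization in claims 1 and 10 (clusters with a mean and a distribution about it, each labelled with a symbol) can be sketched with plain k-means. The claims do not name a clustering algorithm, so k-means, the first-k initialization, per-dimension variance as the "distribution", and the symbol names `c0`, `c1`, ... are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[List[float], List[float]]  # (mean, per-dimension variance)


def train_codebook(vectors: List[Sequence[float]], k: int,
                   iters: int = 20) -> Dict[str, Cluster]:
    """Cluster feature-specific vectors with k-means; return symbol -> cluster."""
    if not 1 <= k <= len(vectors) or iters < 1:
        raise ValueError("need 1 <= k <= len(vectors) and iters >= 1")
    means = [list(v) for v in vectors[:k]]  # assumed init: first k vectors
    for _ in range(iters):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda c: math.dist(v, means[c]))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old mean if a cluster goes empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    codebook: Dict[str, Cluster] = {}
    for j, members in enumerate(clusters):
        n = max(len(members), 1)
        # An empty cluster yields an empty variance list.
        variance = [sum((x - m) ** 2 for x in col) / n
                    for col, m in zip(zip(*members), means[j])]
        codebook[f"c{j}"] = (means[j], variance)
    return codebook


codebook = train_codebook([[0.0], [10.0], [1.0], [11.0]], k=2)
```

One such codebook would be trained per feature-specific vector type, which is what makes the quantization "multiple".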

12. A front-end processing method for training a handwriting recognizer from a training set of sampled handwriting data comprising a series of sample points in the form of (x,y) coordinates, said method including the steps of:
- performing non-uniform segmentation on said series of sample points to partition said sample points into a first and a second segment;
  
  extracting a first and a second feature value from said first segment, and extracting said first and second feature values from said second segment;
  
  creating a first word-independent vector from said first and second feature values extracted from said first segment wherein said first and second feature values form entries in said first vector;
  
  creating a second word-independent vector from said first and second feature values extracted from said second segment wherein said first and second feature values form entries in said second vector;
  
  combining said first word-independent vector and said second word-independent vector to create an input frame;
  
  creating a first feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector in said input frame corresponding to said first feature value;
  
  creating a second feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector corresponding to said second feature value;
  
  performing multiple vector quantization on said input frame by vector quantizing said first feature-specific vector to statistically model said first feature-specific vector to create a first codebook, and by vector quantizing said second feature-specific vector to statistically model said second feature-specific vector to form a second codebook; and
  
  training said handwriting recognizer using said first and second codebooks.
- Dependent claims (13):
  - 13. A method as in claim 12 wherein said sampled handwriting data is written on a digitizing tablet using a pen device, said sampled handwriting data recorded as a series of (x,y) coordinates and a pen value indicating whether said pen is up or down, on said digitizing tablet, said non-uniform segmentation including the steps of:
    - partitioning said sample points into stroke data;
      
      removing redundant sample points;
      
      discarding said pen values of up;
      
      normalizing said sampled data;
      
      smoothing said sampled data using a low-pass filter;
      
      computing the first derivative of said series of (x,y) coordinates to create an x-velocity series and a y-velocity series of said stroke data;
      
      computing the second derivative of said x-velocity series and a y-velocity series to determine acceleration for said stroke data;
      
      partitioning said stroke data into a plurality of segments using said y-velocity series, each one of said segments having an x-coordinate in said stroke data; and
      
      reordering said segments based on said x-coordinates to create a series of ordered segments.
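
Part of the segmentation sequence in claim 13 (smoothing, first derivative, partitioning on the y-velocity series, reordering by x) can be sketched as follows. The sketch assumes a moving-average low-pass filter, forward differences for velocity, cuts at sign changes of the y-velocity, and ordering by each segment's starting x-coordinate. None of these specifics are stated in the claim, and the function names are hypothetical. Stroke partitioning by pen state, redundant-point removal, normalization, and the acceleration series are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Moving-average low-pass filter (assumed); edges use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def velocity(values: List[float], dt: float = 1.0) -> List[float]:
    """First derivative by forward differences."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def segment_stroke(points: List[Point]) -> List[List[Point]]:
    """Cut a stroke where the y-velocity changes sign, then order segments by x."""
    ys = smooth([p[1] for p in points])
    vy = velocity(ys)
    # A cut falls at the sample between two velocities of opposite sign.
    cuts = [i + 1 for i in range(len(vy) - 1) if vy[i] * vy[i + 1] < 0]
    bounds = [0] + cuts + [len(points) - 1]
    segments = [points[a:b + 1] for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(segments, key=lambda seg: seg[0][0])  # assumed key: starting x


stroke = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (4.0, 2.0), (5.0, 1.0), (6.0, 0.0)]
segments = segment_stroke(stroke)
```

Because cuts follow the pen's motion rather than a fixed sample count, segment lengths generally differ, which is the non-uniform property the claims rely on.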

14. A front-end processing method for a handwriting recognizer that has been trained using training data represented as a plurality of codebooks to recognize sampled handwriting data in the form of a series of sample points as known strings of characters, each of said codebooks including a plurality of clusters identified by symbols, said method including the steps of:
- performing non-uniform segmentation on said series of sample points to partition said sample points into a first and a second segment;
  
  extracting a first and a second feature value from said first segment, and extracting said first and second feature values from said second segment;
  
  creating a first word-independent vector from said first and second feature values extracted from said first segment wherein said first and second feature values form entries in said first word-independent vector;
  
  creating a second word-independent vector from said first and second feature values extracted from said second segment wherein said first and second feature values form entries in said second word-independent vector;
  
  combining said first word-independent vector and said second word-independent vector to create an input frame;
  
  creating a first feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector in said input frame corresponding to said first feature value;
  
  creating a second feature-specific vector by grouping said entries from said first word-independent vector and said second word-independent vector corresponding to said second feature value;
  
  performing multiple vector quantization on said input frame by vector quantizing said first feature-specific vector to assign to said first feature-specific vector a first symbol of one of said clusters in one of said codebooks for which said first feature-specific vector is the closest, and by vector quantizing said second feature-specific vector to assign to said second feature-specific vector a second symbol of one of said clusters in one of said codebooks for which said second feature-specific vector is the closest; and
  
  sending said first and second symbols to said handwriting recognizer for recognition as said known strings of characters.

Current Assignee: Apple Inc.
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Lee, Kai-Fu; Grajski, Kamil A.; Chow, Yen-Lu
Primary Examiner(s): Couso, Jose L.
Assistant Examiner(s): Bella, Matthew C.
Application Number: US08/204,031
Time in Patent Office: 994 Days
Field of Search: 382/13, 382/21, 382/56, 382/253, 382/179, 382/187, 348/417, 348/414, 348/418, 395/2.54, 395/2.65, 395/2.62, 395/2
US Class Current: 382/253
CPC Class Codes:
- G06F 18/23 Clustering techniques
- G06V 30/36 Matching; Classification
