Recognition of characters in cursive script

US 5,335,289 A
Filed: 02/14/1992
Issued: 08/02/1994
Est. Priority Date: 02/13/1991
Status: Expired due to Fees

Abstract

A system and method for recognizing characters in cursive script are provided in which the script is scanned to detect word boundaries and words are then segmented into characters. This is accomplished by segmenting the script to form an initial portion, the segmentation being performed with reference to its position relative to a word boundary. This initial portion is then compared with a set of reference portions. Subsequent portions of the script are taken in sequence and compared with reference portions until a character is identified with an uncertainty less than a predetermined threshold value. A new initial portion is then segmented, chosen on the basis of the average width of the character identified, and the comparison process is repeated to identify the next character.
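
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `recognize_word`, `identify`, and `average_width` are hypothetical names, portions are stood in for by toy string labels, and advancing by at least the number of portions already consumed is an assumption added so the loop always makes progress.

```python
from typing import Callable, Dict, List, Optional, Sequence


def recognize_word(
    portions: Sequence[str],
    identify: Callable[[Sequence[str]], Optional[str]],
    average_width: Dict[str, int],
) -> List[str]:
    """Segment one word into characters by recognizing, then skipping.

    portions      -- successive portions of the word, starting at a word boundary
    identify      -- hypothetical matcher: returns a character once the portions
                     seen so far identify it, otherwise None
    average_width -- average width of each known character, in portions
    """
    characters: List[str] = []
    start = 0
    while start < len(portions):
        found: Optional[str] = None
        end = start
        # Take successive portions until a known character is identified.
        while end < len(portions) and found is None:
            end += 1
            found = identify(portions[start:end])
        if found is None:
            break  # no character identified before the portions ran out
        characters.append(found)
        # Skip ahead according to the identified character's average width
        # (assumption: never skip fewer portions than were already consumed).
        start += max(average_width[found], end - start)
    return characters


# Toy data: "a" spans three portions and is identified after two of them;
# "b" spans two portions and is identified after the first.
PREFIXES = {("a1", "a2"): "a", ("b1",): "b"}
WIDTHS = {"a": 3, "b": 2}
word = ["a1", "a2", "a3", "b1", "b2"]
result = recognize_word(word, lambda p: PREFIXES.get(tuple(p)), WIDTHS)
```

In this toy run the third portion of "a" is never compared against the references; it is skipped because the average width of "a" is three portions, which is the saving the abstract describes.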

32 Citations

11 Claims

1. A method of recognizing characters in cursive script in which the script is scanned to detect word boundaries and words are then segmented into characters, characterized by the steps of:
- (i) choosing and extracting (30) a word boundary from a cursive script comprised of characters;
  
  (ii) starting at said word boundary, extracting (50) a portion of said word;
  
  (iii) comparing (60) said extracted portion with a set of reference portions representing known characters, each of said known characters having an average width;
  
  (iv) extracting a second portion, said second portion being successive to said first portion, and comparing of said second portion with said set of reference portions;
  
  (v) repeating said extracting step (iv) and said comparing step (iii) with successive portions until said successive portions have been identified as one of said known characters, and
  
  (vi) skipping a number of portions depending on said average width of said identified known character;
  
  (vii) starting from the last skipped portion, extracting a portion of said word and then repeating the process from step (ii) for the identification of next and subsequent characters.
- Dependent claims (2, 3, 4, 5)
- - 2. A method as claimed in claim 1, characterized by the steps of:
    - (a) forming (50) an initial vector (57) representing features of a single-dimensional cross-section of said script at a first position in the characters constituting the script, said initial vector chosen with reference to a word boundary;
      
      (b) comparing (60) said initial vector with a known vector from a set of reference vectors;
      
      (b_i) determining an accumulated uncertainty value relating to the degree with which the compared vector identifies a known character, and
      
      (b_ii) comparing said accumulated uncertainty value with a predetermined threshold value;
      
      (c) if said accumulated uncertainty value is less than said threshold value then recognizing said vector as said known character, otherwise selecting subsequent vectors for comparison with said set of reference vectors, and repeating steps (b_i) to (b_ii) until said uncertainty value is less than said threshold value, and then recognizing said initial and said subsequent vectors as said known character;
      
      (d) selecting a new initial vector in accordance with the character so identified, comparing said vector with said set of reference vectors, and repeating the process from step (b_i) for the identification of the next and subsequent characters.
  - 3. A method as claimed in claim 2, in which steps (iii) through (vi) are repeated until another word boundary is reached, at which time the method is repeated from step (i) for a different word boundary.
  - 4. A method as claimed in claim 2, in which said cursive script is typewritten cursive script using characters selected from a predetermined font.
  - 5. A method as claimed in claim 4, in which said set of reference portions is formed during a learning phase by the steps of:
    - entering (70) into a storage device (90) said set of reference portions representative of possible portions relating to characters in said font, and assigning each such portion a particular label, and
      
      scanning (50) each character belonging to said font in turn to produce a sequence of portions and hence a sequence of labels identified with said character, and storing said sequences in said storage device (90).
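
Steps (b_i) to (c) of claim 2 can be illustrated with a small sketch. The claim does not say how the accumulated uncertainty is computed, so everything here is an assumption: the function name `identify_by_uncertainty` is hypothetical, each vector is reduced to a label, the uncertainty is taken to be one minus the largest posterior probability after a Bayes update with uniform priors, and the likelihood table is made-up toy data.

```python
from typing import Dict, Optional, Sequence, Tuple


def identify_by_uncertainty(
    labels: Sequence[str],
    likelihood: Dict[str, Dict[str, float]],
    threshold: float,
) -> Tuple[Optional[str], int]:
    """Accumulate evidence vector by vector until uncertainty < threshold.

    likelihood[c][label] is an assumed P(label | character c).
    Returns (character, vectors_used), or (None, len(labels)) if the
    threshold is never reached.
    """
    chars = list(likelihood)
    # Assumption: start from a uniform prior over the known characters.
    posterior = {c: 1.0 / len(chars) for c in chars}
    for used, label in enumerate(labels, start=1):
        # Bayes update with the newly compared vector's label; unseen
        # labels get a small floor so no character is ruled out outright.
        for c in chars:
            posterior[c] *= likelihood[c].get(label, 1e-6)
        total = sum(posterior.values())
        posterior = {c: p / total for c, p in posterior.items()}
        best = max(posterior, key=posterior.get)
        uncertainty = 1.0 - posterior[best]  # accumulated uncertainty value
        if uncertainty < threshold:
            return best, used
    return None, len(labels)


# Toy likelihoods: label "x" is somewhat more typical of "a" than of "b".
TOY = {"a": {"x": 0.6, "y": 0.4}, "b": {"x": 0.4, "y": 0.6}}
decision = identify_by_uncertainty(["x", "x", "x", "x"], TOY, threshold=0.2)
```

With these toy numbers a single "x" leaves the uncertainty at about 0.4, so further vectors are selected; only after the fourth does it fall below the 0.2 threshold.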

6. A system for recognizing characters in cursive script in which the script is scanned to detect word boundaries (40) and words are then segmented into characters, characterized by:
- a sectioning means (50) for forming a series of portions representing features of cursive script at different positions in characters constituting the script; and
  
  recognition and segmentation means (60) for comparing an initial portion, chosen with reference to a word boundary, with a known portion from a set of reference portions and to compare subsequent portions in said series similarly with known portions, until the cumulative results of the comparison identify a character with an uncertainty less than a predetermined threshold value, said character having an average width,
  
  skipping means for skipping a number of portions determined by said average width of said identified character and determining the positions of said cursive script at which said sectioning means (50) and said recognition and segmentation means (60) shall be applied.
- Dependent claims (7, 8, 9, 10)
- - 7. A system as claimed in claim 6, wherein said recognition and segmentation means is adapted to repeat the comparison process until another word boundary is reached and then to select a new initial portion and to repeat the comparison process for said new and subsequent portions.
  - 8. A system as claimed in claim 7, wherein said sectioning means forms a series of vectors representing features of the script at different positions in the characters constituting the script.
  - 9. A system as claimed in claim 8, wherein said recognition and segmentation means (60) includes a comparator adapted to:
    - compare an initial vector, chosen with reference to a word boundary, with a known vector from a set of reference vectors;
      
      determine an accumulated uncertainty value relating to the degree with which the compared vector identifies a known character;
      
      compare said accumulated uncertainty value with a predetermined threshold value;
      
      if said accumulated uncertainty value is less than said threshold value then to recognize the character as said known character, otherwise to select subsequent vectors for comparison with said set of reference vectors, and to repeat the comparison process until said uncertainty value is less than said threshold value, and
      
      select a new initial vector in accordance with the character so identified, to compare said vector with said set of reference vectors, and to repeat the comparison process for the identification of the next and subsequent characters.
  - 10. A system as claimed in claim 9, adapted to allow said set of reference portions to be formed during a learning phase by the steps of:
    - entering (70) into a storage device (90) said set of reference portions representative of possible portions relating to characters in said font, and assigning each such portion a particular label, and
      
      scanning each character belonging to said font in turn to produce a sequence of portions and hence a sequence of labels identified with said character, and storing said sequences in said storage device.
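
The learning phase of claims 5 and 10 (labelled reference portions, then one stored label sequence per font character) might look like the sketch below. The names `nearest_label` and `learn_font` are hypothetical, portions are assumed to be small numeric vectors, and matching a portion to a reference by squared Euclidean distance is an assumption; the claims do not specify the matching rule.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def nearest_label(portion: Vector, references: Dict[str, Vector]) -> str:
    """Return the label of the reference portion closest to `portion`.

    Assumption: closeness is squared Euclidean distance.
    """
    return min(
        references,
        key=lambda label: sum(
            (p - r) ** 2 for p, r in zip(portion, references[label])
        ),
    )


def learn_font(
    font: Dict[str, List[Vector]],
    references: Dict[str, Vector],
) -> Dict[str, List[str]]:
    """Scan each character's portions in turn and store its label sequence."""
    return {
        char: [nearest_label(p, references) for p in portions]
        for char, portions in font.items()
    }


# Toy reference set with two labelled portions, and a two-character "font".
REFERENCES = {"A": (0.0, 0.0), "B": (1.0, 1.0)}
FONT = {"i": [(0.1, 0.0), (0.9, 1.0)], "l": [(1.0, 0.8)]}
stored_sequences = learn_font(FONT, REFERENCES)
```

Here the dictionary returned by `learn_font` plays the role of the stored sequences: each character of the font maps to the labels of the reference portions its scan produced.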

11. Apparatus for recognizing characters in cursive script comprising:
- (a) means for image processing (40) to detect word boundaries and isolating patterns relating to words in the cursive script,
  
  (b) means for feature extraction (50) for receiving the output of means (40) for image processing and forming a series of vectors (57) representing features of the characters at different positions in the cursive script,
  
  (c) means for performing unsupervised learning (70) to generate a set of reference vectors representative of possible vectors relating to characters in the cursive script and storing them in a code book (90),
  
  (d) means for supervised learning (80) to compute statistics necessary for identification of characters in the cursive script, including (i) means for computing the a-priori conditional probability (100) of a vector relative to the nearest reference vector stored in the code book, and (ii) means for computing a-posteriori probability (110) of characters in the cursive script,
  
  (e) electron means for directing the output of said means for feature extraction (50) to the unsupervised learning means (70) or the supervised learning means (80),
  
  (f) means for performing recognition/segmentation (60) to compare known characters with referenced portions of characters in the cursive script until a character has been identified as one of the known characters, each of said known characters having an average width,
  
  (g) means for supplying input to said means for performing recognition/segmentation (60) from the code book (90); said means for computing a-posteriori probability (110) and said means for feature extraction (50),
  
  (h) output means connected to said means for performing recognition/segmentation (60) for providing recognized text or providing unrecognized text as an input to said means for feature extraction, and
  
  (i) skipping means for providing a number of portions to be skipped as an input to said means for feature extraction (50), said number based on input from said means for performing recognition/segmentation (110), said input based on said average width of said identified known character.
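
Element (d) of claim 11 names two statistics without defining them. One plausible reading, offered only as a sketch, is to estimate the conditional probability of each code book label given a character by relative frequency, then apply Bayes' rule to get the a-posteriori probability of each character given an observed label. The function names, the training data, and the uniform priors below are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def label_likelihoods(
    training: Dict[str, List[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(label | character) from labelled training sequences."""
    likelihoods: Dict[str, Dict[str, float]] = {}
    for char, labels in training.items():
        counts = Counter(labels)
        likelihoods[char] = {
            label: n / len(labels) for label, n in counts.items()
        }
    return likelihoods


def character_posterior(
    label: str,
    likelihoods: Dict[str, Dict[str, float]],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Apply Bayes' rule to get P(character | label)."""
    joint = {
        c: prior[c] * likelihoods[c].get(label, 0.0) for c in likelihoods
    }
    total = sum(joint.values())
    if total == 0.0:
        # The label was never seen for any character during training.
        return {c: 0.0 for c in joint}
    return {c: j / total for c, j in joint.items()}


# Toy supervised data: code book labels observed while scanning known characters.
TRAINING = {"a": ["x", "x", "y", "x"], "b": ["y", "y", "x", "y"]}
LIKELIHOODS = label_likelihoods(TRAINING)
POSTERIOR_X = character_posterior("x", LIKELIHOODS, {"a": 0.5, "b": 0.5})
```

In this toy data "x" appears in three of four portions of "a" and one of four portions of "b", so observing "x" gives "a" a posterior of 0.75 under equal priors.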

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Abdelazim, Hazem Y.
Primary Examiner(s)
Boudreau, Leo H.
Assistant Examiner(s)
Prikockis, Larry J.

Application Number

US07/837,450
Time in Patent Office

900 Days
Field of Search

382/9, 382/13, 382/36, 382/39, 382/40, 382/22, 382/37, 382/30, 382/34, 382/14, 382/20
US Class Current

382/177
CPC Class Codes

G06V 30/196 using sequential comparison...

G06V 30/293 of characters other than Ka...
