Character-recognition systems and methods with means to measure endpoint features in character bit-maps

US 5,359,671 A
Filed: 03/31/1992
Issued: 10/25/1994
Est. Priority Date: 03/31/1992
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

Character recognition method and system that identifies an input character as being a unique member of a defined character set. Specifically, a character bit-map of an input character is first generated. Thereafter, a character recognition procedure processes the character bit-map to generate a set of confidence measures, one for each member of the character set. The confidence measures represent the degree of confidence that the input character corresponds to each member of the character set. A test is then made to determine whether the confidence measure with the highest degree of confidence is acceptable. If there is an acceptable confidence measure, the member of the character set with the acceptable confidence measure is reported as the output character. If there is no acceptable confidence measure, a number of characters with the highest confidence measures are identified as candidates. In addition, the character bit-map is analyzed further to obtain stroke-endpoint information, which is then compared to a learned endpoint database containing a number of character string-signature pairs. If there is a match between a database string and a candidate character, the match is used to report an output character. Endpoint location and orientation are obtained by modeling the bit-map as a charge distribution. A potential profile is constructed and thresholded, and the results are clustered into regions to obtain endpoint location information. The gradient of the potential profile is used to obtain endpoint orientation information.
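The accept-or-augment control flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the confidence measures are assumed to be a plain dict from character to score, "acceptable" is assumed to mean "at or above a fixed threshold", and `recognize`, `threshold`, `n`, and the `augment` callback are hypothetical names.

```python
from typing import Callable, Dict, List

def recognize(confidences: Dict[str, float], threshold: float, n: int,
              augment: Callable[[List[str]], str]) -> str:
    """Accept the top-scoring character if its confidence is acceptable;
    otherwise pass the N best candidates to an endpoint-based augment step."""
    if not 0 < n < len(confidences):
        raise ValueError("expected 0 < N < M")
    # Rank all M members of the character set, highest confidence first.
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    best = ranked[0]
    if confidences[best] >= threshold:
        return best  # first output path: acceptable confidence measure
    # No acceptable confidence: hand only the N highest-scoring members onward.
    return augment(ranked[:n])

scores = {"O": 0.48, "D": 0.45, "Q": 0.07}
# 0.48 is below the assumed threshold of 0.9, so "O" and "D" go to augment.
print(recognize(scores, threshold=0.9, n=2, augment=lambda cands: "|".join(cands)))  # prints O|D
```

Keeping the augment step as a callback separates the fast path (a confident classifier) from the slower endpoint analysis, which only runs on rejected characters.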

22 Claims

1. A character recognition system that identifies an input character as being a unique member of a defined character set, said system comprising:
- a bit-map means for generating a character bit-map of an input character;
  
  a character recognition means for processing said character bit-map and generating a set of (M) finite confidence measures one for each of (M) members of said character set, said confidence measures representing the degree of confidence that said input character corresponds to each of said (M) members of said character set;
  
  a decision means for deciding if the confidence measure with the highest degree of confidence is an acceptable confidence measure;
  
  a first output means for reporting as an output character the member of said character set with said acceptable confidence measure;
  
  an augment means for identifying (N) of said (M) members with the (N) highest confidence measures, where (M) is greater than (N), and processing said bit-map if said decision means decides that there is no acceptable confidence measure, said augment means having a measuring means for measuring stroke endpoint locations and orientations of said bit-map and a second output means for reporting one of said (N) members as an output character based on said stroke endpoint locations and orientations; and
  
  wherein said augment means further comprises a database means having a database of character strings, each said character string including a subset of said character set and wherein the members of each said subset represent different characters in said character set which have common stroke endpoint locations and orientations.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The system of claim 1 wherein said database means comprises unique database signatures each associated with one of said character strings;
    - said measuring means comprising means for generating a search signature related to said location and orientation of said stroke endpoints; and
      
      said augment means comprising search means for searching, in response to said search signature, said database signatures to identify an output character string and for locating matches between said output string and said (N) members with the (N) highest confidence measures.
  - 3. The system of claim 2 wherein said second output means comprises means for reporting one of said (N) members as an output character according to the following rule:
    - if there is a match for only one of said (N) members and said output character string, report the matching member as the output character;
      
      if there is a match for more than one of said (N) members and said output character string, report the matching member with the highest confidence measure as the output character; and
      
      if there are no matches for said (N) members and said output character string, report the member with the highest confidence measure as the output character.
  - 4. The system of claim 3 wherein (N) is equal to two.
  - 5. The system of claim 3 wherein said measuring means further comprises charge model means for constructing a potential profile based on a charge distribution model of said bit-map and location means for thresholding said potential profile and clustering said thresholded profile to determine said locations of said stroke endpoints.
  - 6. The system of claim 5 wherein said measuring means further comprises gradient means for obtaining a gradient of said potential profile to determine said orientations of said stroke endpoints.
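The signature search of claim 2 and the three-way reporting rule of claim 3 can be sketched together. This is a minimal sketch assuming the database is an in-memory dict from a hashable signature to a character string; the signature keys, the example entries, and the name `report_character` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

def report_character(candidates: List[Tuple[str, float]], search_signature: Hashable,
                     database: Dict[Hashable, str]) -> str:
    """Choose among the N best (character, confidence) pairs per the claim-3 rule."""
    # Signature search; an unknown signature is treated as an empty output string.
    output_string = database.get(search_signature, "")
    matches = [pair for pair in candidates if pair[0] in output_string]
    if len(matches) == 1:
        return matches[0][0]  # exactly one candidate appears in the output string
    # Several matches: highest-confidence match. No match: highest-confidence candidate.
    pool = matches or candidates
    return max(pool, key=lambda pair: pair[1])[0]

# Hypothetical database: signature -> characters sharing those endpoint features.
db = {"sig-a": "7T", "sig-b": "17"}
print(report_character([("1", 0.41), ("7", 0.39)], "sig-a", db))  # prints 7
```

In the example the endpoint evidence overrides the classifier's slight preference for "1", because only "7" appears in the retrieved string. The same function covers claim 9 if the confidence values are read as weighting factors.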

7. A stroke endpoint detector that identifies which of (N) candidate characters represents a unique member of a defined character set comprising:
- a bit-map means for reading an input bit-map;
  
  a database means having a database of character strings, each said character string comprising a subset of said character set, and wherein the members of each said subset represent different characters in said character set which have common stroke endpoint features;
  
  measuring means for measuring the location and orientation of said stroke endpoints of said character bit-map, said measuring means including charge model means for constructing a potential profile based on a charge distribution model of said bit-map and location means for thresholding said potential profile and clustering said thresholded profile to determine the locations of regions of said stroke endpoints in said bit-map;
  
  search means, responsive to said measuring means, for searching said database to identify an output character string, and locate matches between said output string and said (N) candidate characters; and
  
  output means responsive to said search means for reporting a matching candidate character as an output character.
- Dependent claims (8, 9, 10, 11):
  - 8. The detector of claim 7 wherein said database means comprises unique database signatures each associated with one of said character strings;
    - said measuring means comprises means for generating a search signature related to said location and orientation of said stroke endpoints; and
      
      said search means comprises means for searching, in response to said search signature, said database signatures to identify said output character string and for locating matches between said output string and said (N) candidate characters.
  - 9. The detector of claim 8 wherein said (N) candidate characters each have a weighting factor associated therewith;
    - and said output means comprises means for reporting one of said (N) candidate characters as an output character according to the following rule:
      
      if there is a match for only one of said (N) candidate characters and said output character string, report the matching candidate character as the output character;
      
      if there is a match for more than one of said (N) candidate characters and said output character string, report the matching candidate character with the highest weighting factor as the output character; and
      
      if there are no matches for said (N) candidate characters and said output character string, report the candidate character with the highest weighting factor as the output character.
  - 10. The detector of claim 9 wherein (N) is equal to two.
  - 11. The detector of claim 10 wherein said measuring means further comprises gradient means for obtaining a gradient of said potential profile to determine the orientations of said stroke endpoints.
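The measuring means of claim 7 (charge model, thresholding, clustering) can be illustrated with a small sketch. The claims do not give the potential function, so this assumes each foreground pixel is a unit point charge with a softened 1/r potential. Pixels at a stroke tip have charge on one side only, so their potential tends to be lowest; the sketch keeps pixels in the lowest fraction `alpha` of the observed range and groups them by 8-connectivity. The 0/1 list-of-rows bitmap, `eps`, `alpha`, and the function names are all assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]

def potential_profile(bitmap: List[List[int]], eps: float = 0.5) -> Dict[Pixel, float]:
    """Potential at each foreground pixel from unit charges on all foreground pixels."""
    charges = [(r, c) for r, row in enumerate(bitmap) for c, v in enumerate(row) if v]
    return {
        p: sum(1.0 / math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + eps ** 2)
               for q in charges)
        for p in charges
    }

def endpoint_regions(bitmap: List[List[int]], alpha: float = 0.3) -> List[Tuple[float, float]]:
    """Threshold the profile, cluster 8-connected low-potential pixels, return centroids."""
    profile = potential_profile(bitmap)
    if not profile:
        return []
    lo, hi = min(profile.values()), max(profile.values())
    cutoff = lo + alpha * (hi - lo)
    low = {p for p, v in profile.items() if v <= cutoff}
    centroids, seen = [], set()
    for start in sorted(low):
        if start in seen:
            continue
        # Breadth-first flood fill over 8-connected thresholded pixels.
        seen.add(start)
        queue, region = deque([start]), []
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in low and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        centroids.append((sum(p[0] for p in region) / len(region),
                          sum(p[1] for p in region) / len(region)))
    return centroids

bar = [[0] * 11, [0] + [1] * 9 + [0], [0] * 11]  # a one-pixel-thick horizontal stroke
print(endpoint_regions(bar))  # prints [(1.0, 1.0), (1.0, 9.0)]
```

The all-pairs sum is quadratic in the number of foreground pixels, which is acceptable for small character bit-maps; a convolution with a 1/r kernel would be the usual speed-up.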

12. A character recognition method for identifying an input character as being a unique member of a defined character set comprising the steps of:
- generating a character bit-map of an input character;
  
  processing said character bit-map with a character recognition procedure to generate a set of (M) finite confidence measures one for each of (M) members of said character set, said confidence measures representing the degree of confidence that said input character corresponds to each of said (M) members of said character set;
  
  determining if the confidence measure with the highest degree of confidence is an acceptable confidence measure;
  
  if there is an acceptable confidence measure, reporting as an output character the member of said character set with said acceptable confidence measure;
  
  if there is no acceptable confidence measure, identifying (N) of said (M) members with the (N) highest confidence measures, where (M) is greater than (N), and analyzing said character bit-map to measure stroke endpoint locations and orientations of said bit-map;
  
  reporting one of said (N) members as an output character based on the measure of said stroke endpoint locations and orientations; and
  
  further comprising the step of:
  
  constructing a database of character strings, each said character string comprising a subset of said character set and wherein the members of said subset represent different characters in said character set which have common stroke endpoint locations and orientations; and
  
  wherein said analyzing step comprises the step of searching said database.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The method of claim 12 further comprising the steps of:
    - constructing said database with unique signatures each associated with one of said character strings;
      
      generating a search signature in said analyzing step to determine the location and orientation of said stroke endpoints; and
      
      using said search signature in said searching step for searching said database signatures to identify an output character string, and locating matches between said output string and said (N) members with the (N) highest confidence measures.
  - 14. The method of claim 13 further comprising the step of reporting one of said (N) members as an output character according to the following rule:
    - if there is a match for only one of said (N) members and said output character string, report the matching member as the output character;
      
      if there is a match for more than one of said (N) members and said output character string, report the matching member with the highest confidence measure as the output character; and
      
      if there are no matches for said (N) members and said output character string, report the member with the highest confidence measure as the output character.
  - 15. The method of claim 14 wherein (N) is equal to two.
  - 16. The method of claim 14 wherein said analyzing step further comprises the steps of:
    - constructing a potential profile based on a charge distribution model of said bit-map;
      
      thresholding said potential profile to form a thresholded profile; and
      
      clustering said thresholded profile to determine the locations of regions of said stroke endpoints in said bit-map.
  - 17. The method of claim 16 wherein said analyzing step further comprises the step of obtaining a gradient of said potential profile to determine the orientations of said stroke endpoints.
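Claims 13 and 17 add a gradient-based orientation and a search signature built from endpoint locations and orientations. Neither encoding is fixed by the claims, so this sketch rests on assumptions: the same softened unit-charge potential as above, the outward stroke direction at an endpoint taken as the negative potential gradient (the potential rises toward the body of the stroke), and a signature formed by quantizing each endpoint into a 3x3 zone of the bit-map plus one of eight compass directions. The zone grid, `DIRECTIONS`, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical 8-way orientation codes; 0 degrees = east, counter-clockwise.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def potential_gradient(charges: List[Tuple[int, int]], point: Tuple[float, float],
                       eps: float = 0.5) -> Tuple[float, float]:
    """Gradient (d/drow, d/dcol) of phi(x) = sum_q 1 / sqrt(|x - q|^2 + eps^2)."""
    grad_r = grad_c = 0.0
    for q_r, q_c in charges:
        d_r, d_c = point[0] - q_r, point[1] - q_c
        denom = (d_r * d_r + d_c * d_c + eps * eps) ** 1.5
        grad_r -= d_r / denom
        grad_c -= d_c / denom
    return grad_r, grad_c

def endpoint_orientation(charges: List[Tuple[int, int]], point: Tuple[float, float]) -> float:
    """Outward stroke direction in degrees, taken as the negative gradient."""
    grad_r, grad_c = potential_gradient(charges, point)
    # Outward vector is (-grad_r, -grad_c) in (row, col); rows grow downward,
    # so in x/y terms it is (x, y) = (-grad_c, grad_r).
    return math.degrees(math.atan2(grad_r, -grad_c)) % 360

def search_signature(bitmap: List[List[int]],
                     endpoints: List[Tuple[float, float]]) -> Tuple[Tuple[int, int, str], ...]:
    """Quantize each endpoint into a 3x3 zone plus an 8-way direction, then sort."""
    height, width = len(bitmap), len(bitmap[0])
    charges = [(r, c) for r, row in enumerate(bitmap) for c, v in enumerate(row) if v]
    keys = []
    for r, c in endpoints:
        zone_r = min(2, int(3 * r / height))
        zone_c = min(2, int(3 * c / width))
        angle = endpoint_orientation(charges, (r, c))
        keys.append((zone_r, zone_c, DIRECTIONS[round(angle / 45) % 8]))
    return tuple(sorted(keys))

bar = [[0] * 11, [0] + [1] * 9 + [0], [0] * 11]
# Endpoint centroids are hard-coded here; a location step would supply them.
print(search_signature(bar, [(1.0, 1.0), (1.0, 9.0)]))  # prints ((1, 0, 'W'), (1, 2, 'E'))
```

Sorting the per-endpoint keys makes the signature independent of the order in which endpoint regions were found, so it can serve directly as a dictionary key for the database search.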

18. A stroke endpoint detection method that identifies which of (N) candidate characters represents a unique member of a defined character set comprising:
- reading an input bit-map;
  
  constructing a learned database of character strings, each said character string comprising a subset of said character set and wherein the members of each said subset represent different characters in said character set which have common stroke endpoint locations and orientations;
  
  measuring stroke-endpoint location and orientation of said character bit-map by using a charge model means for constructing a potential profile based on a charge distribution model of said bit-map and thresholding said potential profile and clustering said thresholded profile to determine the locations of regions of said stroke endpoints in said bit-map;
  
  searching said database to identify an output character string, and locate matches between said output string and said (N) candidate characters; and
  
  reporting one of said matches as an output character.
- Dependent claims (19, 20, 21, 22):
  - 19. The method of claim 18 further comprising the steps of:
    - constructing said learned database with unique signatures each associated with one of said character strings;
      
      generating a search signature in said analyzing step to determine the location and orientation of said stroke endpoints; and
      
      using said search signature in said searching step for searching said database signatures.
  - 20. The method of claim 19 wherein said (N) candidate characters each have a weighting factor associated therewith;
    - and wherein said reporting step comprises the step of reporting one of said (N) candidate characters as an output character according to the following rule:
      
      if there is a match for only one of said (N) candidate characters and said output character string, report the matching character as the output character;
      
      if there is a match for more than one of said (N) candidate characters and said output character string, report the matching candidate character with the highest weighting factor as the output character; and
      
      if there are no matches for said (N) candidate characters and said output character string, report the candidate character with the highest weighting factor as the output character.
  - 21. The method of claim 20 wherein (N) is equal to two.
  - 22. The method of claim 21 wherein said measuring step further comprises the step of obtaining a gradient of said potential profile to determine the orientations of said stroke endpoints.
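Claim 18 calls for a *learned* database in which each string collects the characters that share endpoint locations and orientations. One plausible way to learn it, assuming labeled training bit-maps have already been reduced to hashable signatures, is to group labels by signature. The `(signature, label)` input format, the `min_count` noise filter, and the name `learn_endpoint_database` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def learn_endpoint_database(samples: Iterable[Tuple[Hashable, str]],
                            min_count: int = 1) -> Dict[Hashable, str]:
    """Map each observed signature to the string of characters seen with it.

    Characters seen fewer than `min_count` times for a signature are dropped
    as likely noise; each string lists characters from most to least frequent.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for signature, label in samples:
        counts[signature][label] += 1
    database = {}
    for signature, counter in counts.items():
        members = [ch for ch, n in counter.most_common() if n >= min_count]
        if members:
            database[signature] = "".join(members)
    return database

samples = [("sig-a", "7"), ("sig-a", "T"), ("sig-a", "7"),
           ("sig-b", "1"), ("sig-b", "l"), ("sig-a", "x")]
print(learn_endpoint_database(samples))  # prints {'sig-a': '7Tx', 'sig-b': '1l'}
```

Raising `min_count` trades coverage for precision: with `min_count=2` the rare "T" and "x" observations are discarded and "sig-b" disappears entirely.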

Current Assignee: Eastman Kodak Company
Original Assignee: Eastman Kodak Company
Inventors: Rao, Arun
Primary Examiner(s): Moore, David K.
Assistant Examiner(s): Cammarata, Michael
Application Number: US07/860,933
Time in Patent Office: 938 days
Field of Search: 382/22, 382/14, 382/15, 382/16, 382/21, 382/23, 382/30, 382/37, 382/38, 382/39
US Class Current: 382/225
CPC Class Codes: G06V 30/2504 Coarse or fine approaches, ...