Relaxation word recognizer

US 5,617,488 A
Filed: 02/01/1995
Issued: 04/01/1997
Est. Priority Date: 02/01/1995
Status: Expired due to Fees

First Claim

Patent Images

1. A word recognition method comprising:

(a) dividing a document into a two dimensional matrix of pixel areas;

(b) scanning the pixel areas with a light source and detecting the intensity of light from each pixel area to generate a plurality of pixel data signals, each pixel data signal being proportional to the intensity of light detected from its corresponding pixel area;

(c) comparing the pixel data signals to data signals stored in a memory that represent alphanumeric characters and words in order to initially recognize word candidates from said pixel data signals;

(d) generating a ranking of candidates for each word based upon a likelihood of correctness of each candidate, each ranking having a top choice indicating the initial most likely correct candidate for the word;

(e) grouping ranked word candidates into neighborhoods, each neighborhood comprising two or more word candidates and each neighborhood being adjacent to at least one other neighborhood;

(f) reading from a read only memory word collocation data for each word candidate and one or more top choice adjacent word candidates;

(g) computing a collocation score for each word candidate and the top choice of at least two adjacent neighborhoods wherein the collocation score for each word candidate is computed by adding the probability of correctness of the word candidate to the products of (i) the collocation data of the candidate and its top choice neighbors and (ii) the probability of correctness for the top choice neighbor and dividing the result by the sums of the probabilities of correctness of all the words in the neighborhood and the sums of the products of (i) collocation data of each neighbor with the top choice of the adjacent neighbors and the (ii) the probability of correctness of the top choices of the adjacent neighbors;

(h) sorting word candidates by their collocation score;

(i) comparing changes between the word candidates of the initial neighborhoods and the word candidates in the sorted neighborhood;

(j) counting the number of changes for the neighborhoods; and

(k) repeating steps (e) through (i) until the number of changes is less than a predetermined threshold value.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A word recognizer system 10 has a probabilistic relaxation process that improves the performance of an image text recognition technique by propagating the influence of word collocation statistics. Word collocation refers to the likelihood that two words co-occur within a fixed distance of one another. The word recognizer 10 receives groups of visually similar decisions (called neighborhoods) for words in a running text. The position of decisions within the neighborhoods are modified based on how often they co-occur with decisions in the neighborhoods of other nearby words. This process is iterated a number of times effectively propagating the influence of the collocation statistics across an input text.

89 Citations

View as Search Results

8 Claims

1. A word recognition method comprising:
- (a) dividing a document into a two dimensional matrix of pixel areas;
  
  (b) scanning the pixel areas with a light source and detecting the intensity of light from each pixel area to generate a plurality of pixel data signals, each pixel data signal being proportional to the intensity of light detected from its corresponding pixel area;
  
  (c) comparing the pixel data signals to data signals stored in a memory that represent alphanumeric characters and words in order to initially recognize word candidates from said pixel data signals;
  
  (d) generating a ranking of candidates for each word based upon a likelihood of correctness of each candidate, each ranking having a top choice indicating the initial most likely correct candidate for the word;
  
  (e) grouping ranked word candidates into neighborhoods, each neighborhood comprising two or more word candidates and each neighborhood being adjacent to at least one other neighborhood;
  
  (f) reading from a read only memory word collocation data for each word candidate and one or more top choice adjacent word candidates;
  
  (g) computing a collocation score for each word candidate and the top choice of at least two adjacent neighborhoods wherein the collocation score for each word candidate is computed by adding the probability of correctness of the word candidate to the products of (i) the collocation data of the candidate and its top choice neighbors and (ii) the probability of correctness for the top choice neighbor and dividing the result by the sums of the probabilities of correctness of all the words in the neighborhood and the sums of the products of (i) collocation data of each neighbor with the top choice of the adjacent neighbors and the (ii) the probability of correctness of the top choices of the adjacent neighbors;
  
  (h) sorting word candidates by their collocation score;
  
  (i) comparing changes between the word candidates of the initial neighborhoods and the word candidates in the sorted neighborhood;
  
  (j) counting the number of changes for the neighborhoods; and
  
  (k) repeating steps (e) through (i) until the number of changes is less than a predetermined threshold value.
- View Dependent Claims (2, 3, 4)
- - 2. The method of claim 1 wherein the collocation score:
    - ##EQU4## is the probabilistic score of the word candidate ω
      
      _ij at time k+1 and n are the number of neighbors in a neighborhood.
  - 3. The method of claim 1 wherein the collocation score includes a plurality of products of collocations between the word candidates and the top choices of two or more neighborhoods.
  - 4. The method of claim 1 wherein the word collocation data comprises a mutual frequency of adjacent words in a corpus of words.

5. A word recognition apparatus comprising:
- a document scanner for scanning a document a document with a light source by dividing the scanned document into a two dimensional matrix of pixel areas and scanning each pixel areas with a light source to generate a plurality of pixel data signals, each pixel data signal being proportional to the intensity of light at its corresponding pixel area;
  
  a computer having a central processing unit including an arithmetic logic unit, a random access memory for storing input data and calculated data, and a read only memory for storing operating and application programs, and further memory for storing data corresponding to pixel data signals of alphanumeric characters and words in order to recognize word candidates from input pixel data signals and a word collocation database, said computer coupled to the scanner for receiving the pixel data signals;
  
  said computer comparing the light intensity pixel data signals to data signals stored in said memory holding alphanumeric characters and words in order to initially recognize word candidates from said pixel data signals;
  
  said computer generating a ranking of candidates for each word in proportion to the correspondence between the pixel data signals for a word candidate and the stored word data, each ranking having a top choice indicating the initial most likely correct candidate for the word;
  
  said computer grouping word candidates into neighborhoods, each neighborhood comprising two or more word candidates and each neighborhood being adjacent at least one other neighborhood;
  
  said computer reading from the word collocation database word collocation data for each word candidate;
  
  said arithmetic logic unit computing a collocation score for each word candidate and one or more top choices of two adjacent word candidates wherein the collocation score for each word candidate is computed by adding the probability of correctness of the word candidate to the products of (i) the collocation data of the candidate and its top choice neighbors and (ii) the probability of correctness for the top choice neighbor and dividing the result by the sums of the probabilities of correctness of all the words in the neighborhood and the sums of the products of (i) collocation data of each neighbor with the top choice of the adjacent neighbors and the (ii) the probability of correctness of the top choices of the adjacent neighbors;
  
  said arithmetic logic unit computing and storing a collocation score;
  
  said arithmetic logic unit sorting word candidates by their collocation score;
  
  said arithmetic logic unit comparing changes between the word candidates of the initial neighborhoods and the word candidates in the sorted neighborhood;
  
  said arithmetic logic unit counting the number of changes for the neighborhoods; and
  
  said computer outputting a list of likeliest word candidates when the number of changes is less than a predetermined threshold value.
- View Dependent Claims (6, 7, 8)
- - 6. The apparatus of claim 5 wherein the word collocation data comprises a mutual frequency of adjacent words in a corpus of words.
  - 7. The apparatus of claim 5 wherein the collocation score for each word candidate is computed by the following expression:
    - ##EQU5## is the probabilistic score of the word candidate ω
      
      _ij at time k+1 and n are the number of neighbors in a neighborhood.
  - 8. The apparatus of claim 5 wherein the collocation score includes a plurality of products of collocations between the word candidates and the top choices of two or more neighborhoods.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
The Research Foundation for The State University of New York (State University of New York)
Original Assignee
The Research Foundation for The State University of New York (State University of New York)
Inventors
Hull, Jonathan J., Hong, Tao
Primary Examiner(s)
Boudreau, Leo
Assistant Examiner(s)
Davis, Monica S.

Application Number

US08/381,767
Time in Patent Office

790 Days
Field of Search

382/229, 382/230, 382/209, 364/419, 364/900
US Class Current

382/229
CPC Class Codes

G06V 30/10 Character recognition

G06V 30/274 Syntactic or semantic conte...

Relaxation word recognizer

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

89 Citations

8 Claims

Specification

Solutions

Use Cases

Quick Links

Relaxation word recognizer

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

89 Citations

8 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links