Method of indexing words in handwritten document images using image hash tables
First Claim
1. A method of localizing handwritten words in an image of a document, comprising:
a) pre-processing a document containing handwriting where curves with features for word localization are extracted from handwritten words contained in the document, wherein each curve includes selected features with the selected features being less than all of the features for the curve;
b) grouping the curves to form curve groups;
c) computing basis triples from the selected features of the curves of the curve groups;
d) computing affine coordinates for the features of the curves in each curve group with respect to corresponding basis triples computed with said (c);
e) using the affine coordinates and basis triples to create an image hash table with entries for indexing the handwritten words of the document, wherein the image hash table includes a list of the basis triples;
f) indexing the handwritten words with the image hash table, wherein query word features are used to look up the handwritten words in the image hash table for candidate locations; and
g) verifying registration of underlying words within the image hash table by projecting a query word onto one of the underlying words at the candidate locations to find a match.
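Steps (c) through (f) can be illustrated with a small geometric-hashing sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `step` quantization width, the use of sets as hash table entries, and the choice of the first three query features as the query basis are all hypothetical. Each curve group is assumed to be given simply as a list of 2-D feature points.

```python
from collections import Counter, defaultdict
from itertools import permutations

import numpy as np


def affine_coords(points, basis):
    """Return (a, b) per point so that p = o + a*(p1 - o) + b*(p2 - o)."""
    o, p1, p2 = np.asarray(basis, dtype=float)
    m = np.column_stack([p1 - o, p2 - o])
    return np.linalg.solve(m, (np.asarray(points, dtype=float) - o).T).T


def is_degenerate(basis, eps=1e-9):
    """True when the three basis points are (nearly) collinear."""
    o, p1, p2 = np.asarray(basis, dtype=float)
    return abs(np.linalg.det(np.column_stack([p1 - o, p2 - o]))) < eps


def quantize(coord, step):
    """Map a pair of affine coordinates to an integer hash bin (assumed scheme)."""
    return tuple(int(v) for v in np.round(np.asarray(coord) / step))


def build_hash_table(curve_groups, step=0.25):
    """Steps (c)-(e): basis triples, affine coordinates, hash table entries."""
    table = defaultdict(set)   # bin -> ids of basis triples seen in that bin
    basis_list = []            # list of (group id, feature index triple)
    for group_id, features in curve_groups.items():
        pts = np.asarray(features, dtype=float)
        for triple in permutations(range(len(pts)), 3):
            basis = pts[list(triple)]
            if is_degenerate(basis):
                continue
            basis_id = len(basis_list)
            basis_list.append((group_id, triple))
            for coord in affine_coords(pts, basis):
                table[quantize(coord, step)].add(basis_id)
    return table, basis_list


def lookup_candidates(table, query_features, step=0.25):
    """Step (f): vote for stored basis triples using the query word features."""
    q = np.asarray(query_features, dtype=float)
    basis = q[:3]  # assumption: the first three query features form the basis
    if is_degenerate(basis):
        raise ValueError("query basis points are collinear")
    votes = Counter()
    for coord in affine_coords(q, basis):
        for basis_id in table.get(quantize(coord, step), ()):
            votes[basis_id] += 1
    return votes
```

Basis triples with the highest vote counts point, through `basis_list`, to the curve groups that serve as candidate locations. The verification of step (g) is not covered by this sketch.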
Abstract
A method of locating handwritten words in handwritten text images under a variety of transformations, including changes in document orientation, skew, noise, and the handwriting style of a single author. The method avoids a detailed search of the image for every word by pre-computing relevant information in a hash table and indexing the table for word localization. Both hash table construction and indexing are fast operations that take time quadratic in the number of basis points. Generally, the method involves four stages: (1) pre-processing, where features for word localization are extracted; (2) image hash table construction; (3) indexing, where query word features are used to look up the hash table for candidate locations; and (4) verification, where the query word is projected and registered with the underlying word at the candidate locations. The method has applications in digital libraries, handwriting tokenization, document management, and OCR systems.
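The verification stage (4) can be sketched as follows. This is a hedged illustration only: it assumes that a candidate location supplies a basis triple in the image corresponding to a basis triple in the query, and it scores the match as the fraction of projected query features that land near some image feature. The function names, the `tol` distance threshold, and the scoring rule are hypothetical and are not taken from the patent.

```python
import numpy as np


def affine_from_triples(src, dst):
    """Solve dst_i = A @ src_i + t for three non-collinear point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    x = np.hstack([src, np.ones((3, 1))])  # rows are [sx, sy, 1]
    params = np.linalg.solve(x, dst)       # 3x2: first two rows are A^T, last is t
    return params[:2].T, params[2]


def verification_score(query_pts, query_basis, image_pts, image_basis, tol=2.0):
    """Project the query onto the image and measure how well it registers."""
    a, t = affine_from_triples(query_basis, image_basis)
    projected = np.asarray(query_pts, dtype=float) @ a.T + t
    image_pts = np.asarray(image_pts, dtype=float)
    # Distance from each projected query feature to every image feature.
    dists = np.linalg.norm(projected[:, None, :] - image_pts[None, :, :], axis=2)
    # Fraction of query features that fall within tol of some image feature.
    return float(np.mean(dists.min(axis=1) <= tol))
```

A score close to 1 suggests that the query word registers with the underlying word at that candidate location, while a low score suggests a false candidate. Where to set the acceptance threshold is left open here.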
9 Claims
Claims 2 through 9 depend on claim 1.
Specification