Text line bounding system
Abstract
A text line bounding system for non-mechanically adjusting for skewed text in scanned text. The data from scanned page texts are divided into vertical swaths and all pixels at respective vertical levels within each swath are set black if any one pixel within the swath at the level is black. Intersections and terminations are found and the datum for each line is established after removing extraneous data. The skew angle of the text is then established, following which the text lines are statistically bounded. The actual text data is then rotated according to the orientation established for conventional processing.
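The swath fill described in the abstract can be sketched as follows. This is an illustration only, not the patented implementation: the function name `fill_swaths`, the list-of-lists bitmap layout (1 = black, 0 = white), and the handling of a narrower final swath are all assumptions made for the example.

```python
def fill_swaths(bitmap, swath_width):
    """Collapse each swath to one black/white flag per row.

    bitmap is a list of rows, each a list of 0/1 pixels (1 = black).
    Returns one list per swath holding one flag per row: the flag is 1
    if any pixel of that row inside the swath is black.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    # Assumption: a last swath narrower than swath_width is still kept.
    n_swaths = (width + swath_width - 1) // swath_width
    profile = []
    for s in range(n_swaths):
        left = s * swath_width
        right = min(left + swath_width, width)
        flags = [1 if any(bitmap[y][left:right]) else 0 for y in range(height)]
        profile.append(flags)
    return profile
```

Because every row of a swath is reduced to a single flag, the later edge search only has to walk one column of values per swath rather than every pixel.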
12 Claims
1. In an optical scanning system having an optical scanning head for sequentially scanning horizontally across a document to create pixel oriented data representing text of the document for subsequent comparison searching to determine the alpha-numeric contents of the document, the improvement for automatically compensating for text line skew comprising:
(a) means for dividing the data into equal swaths representing vertical columns on the documents;
(b) means for scanning the data by swath to determine the apparent text line initiations and termination contained therein and for creating a data matrix of initiation/termination pairs by swath;
said data matrix having at least left and right boundaries;
(c) means for sequentially searching the data in the data matrix left and right for each pair in a base vector swath, representative of a predetermined range of apparent text line initiations and terminations, to trace the data indicative of apparent text lines to determine a plurality of points apparently on base lines of respective ones of the apparent text lines of the document;
(d) means for determining the apparent skew angle of the text from the points on the apparent base lines; and
(e) means for bounding the data of the apparent text lines for subsequent comparison searching from the found intersection/termination pairs and apparent skew angle determined.
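Element (d) derives a skew angle from points on the apparent base lines, but the claim does not say how. One possible approach, sketched here under stated assumptions, is to fit a least-squares slope to each traced base line and take the median slope across lines. The function names, the (x, y) point layout, and the choice of least squares plus median are hypothetical and are not taken from the patent.

```python
import math
from statistics import median


def line_slope(points):
    """Least-squares slope of y against x for one traced base line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def skew_angle_degrees(baselines):
    """Median base line slope, converted to an angle in degrees.

    baselines is a list of point lists; each point is (x, y) with x the
    horizontal position of a swath and y the vertical address found there.
    Lines with fewer than two distinct x values are skipped.
    """
    slopes = [
        line_slope(pts)
        for pts in baselines
        if len({x for x, _ in pts}) >= 2
    ]
    if not slopes:
        raise ValueError("no base line has two distinct x positions")
    return math.degrees(math.atan(median(slopes)))
```

The sign of the result depends on whether vertical addresses grow upward or downward in the data matrix, so a caller would need to fix that convention before rotating or bounding the data.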
5. In an optical scanning system having an optical scanning head for sequentially scanning horizontally across a document to create pixel oriented data representing the text of the document which is subsequently comparison searched to determine the alpha-numeric contents of the document, the method for automatically compensating for text line skew comprising:
(a) dividing the data into equal swaths representing vertical columns on the document, said swaths also containing vertically addressable rows;
(b) scanning the data by swath to determine the apparent text line initiations and terminations contained therein and creating a data matrix of initiation/termination pairs by swath;
said data matrix having at least left and right boundaries;
(c) sequentially searching the data in the matrix left and right for each pair in a base vector swath, representative of a predetermined range of apparent text line initiations and terminations, to trace the data indicative of apparent text lines to determine a plurality of points apparently on base lines of respective ones of the apparent text lines of the document;
(d) determining the apparent skew angle of the text from the points on the apparent base lines;
(e) bounding the data of the apparent text lines from the found intersection/termination pairs and apparent skew angle determined; and
(f) using the bounded apparent text line data for the comparison searching.
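Step (c) traces each apparent text line left and right from a pair in the base vector swath. A minimal sketch of one way to do that follows. It assumes that the pair in a neighbouring swath belonging to the same line is the one whose initiation address is nearest the previous one, within a tolerance. The function name `trace_baseline`, the `max_step` tolerance, and the use of the initiation address as the base line point are all assumptions for illustration.

```python
def trace_baseline(pairs_by_swath, base_swath, base_pair, max_step=3):
    """Follow one text line left and right from the base vector swath.

    pairs_by_swath[s] is a list of (initiation, termination) addresses
    for swath s. Returns (swath index, initiation address) points sorted
    by swath index.
    """
    points = [(base_swath, base_pair[0])]
    for direction in (-1, 1):              # search left, then right
        last = base_pair[0]
        s = base_swath + direction
        while 0 <= s < len(pairs_by_swath):
            # Hypothetical matching rule: nearest initiation within max_step.
            candidates = [
                p for p in pairs_by_swath[s] if abs(p[0] - last) <= max_step
            ]
            if not candidates:
                break                       # the line ends in this direction
            best = min(candidates, key=lambda p: abs(p[0] - last))
            points.append((s, best[0]))
            last = best[0]
            s += direction
    return sorted(points)
```

Updating `last` at every swath lets the trace follow a steadily skewed line even when the total drift across the page exceeds `max_step`.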
7. An optical scanning system capable of automatically compensating for document text line skewing comprising:
(a) scanning head means for sequentially scanning across the document to develop a data matrix of pixel oriented digital data bits;
(b) logic means for searching said data bits by document column related swaths to determine a plurality of points apparently lying on the document text lines, for determining said text lines' apparent skew angle from said points, and for determining the left, right, top, and bottom boundaries in said data matrix for respective ones of said text lines from said points and said skew angle, said swaths also containing vertically addressable rows; and
(c) means for picking up the data from said data matrix within said boundaries for use in comparing said data bits to known bit configurations and determining the alpha-numeric contents of the document text;
said logic means including means for setting all said pixel oriented digital data bits for each addressable vertical row within each swath to indicate black if any one of said bits initially indicates black.
10. An optical scanning system capable of automatically compensating for document text line skewing comprising:
scanning head means for sequentially scanning across the document to develop a data matrix of pixel oriented digital data bits;
logic means for searching said data bits by document column related swaths to determine a plurality of points apparently lying on the document text lines, for determining said text lines' apparent skew angle from said points, and for determining the left, right, top, and bottom boundaries in said data matrix for respective ones of said text lines from said points and said skew angle, said swaths also containing vertically addressable rows; and
means for picking up the data from said data matrix within said boundaries for use in comparing said data bits to known bit configurations and determining the alpha-numeric contents of the document text;
said logic means including storage means and counter means and further including logic to accomplish the steps of:
(a) dividing the data into swaths;
(b) filling in partial swath strips such that all pixels for each addressable vertical row within each swath are treated as black if any one of the bits is initially black;
(c) initializing the storage means and counter means;
(d) starting at swath #1;
(e) starting at the bottom of the swath;
(f) setting the condition "looking for intersection";
(g) checking to see if an edge has been found and if it has, going to step (l);
(h) moving up the data in the swath;
(i) checking to see if the end of the swath has been reached and if it has not, going back to step (g);
(j) checking to see if the last swath has been done and if it has, terminating this portion of the logic;
(k) proceeding to the next swath and returning to step (e);
(l) checking to see if the edge is an intersection and if not, going to step (p);
(m) checking to see if the "looking for intersection" condition is set and if not, going back to step (h);
(n) saving the vertical address of the intersection by swath number;
(o) setting the "looking for termination" condition and going back to step (h);
(p) checking to see if the "looking for termination" condition is set and if not returning to step (h);
(q) checking to see if the termination edge found is within the allowed distance limitations of the matching intersection and if it is, proceeding to step (s);
(r) discarding both the intersection and termination addresses and proceeding back to step (h);
(s) bumping the pair count by one for this swath;
(t) saving the vertical address of the termination by swath number; and
(u) setting the "looking for intersection" condition and proceeding back to step (h).
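Steps (c) through (u) of claim 10 read as a two-state scan up each swath, and can be sketched as below. Several points are assumptions for the example rather than statements from the claim: an "intersection" is taken to be a white-to-black edge and a "termination" a black-to-white edge, the "allowed distance limitations" are modelled by hypothetical `min_height` and `max_height` parameters, and after a discard in step (r) the sketch resumes looking for an intersection, which the claim does not spell out.

```python
def find_pairs(swath_rows, min_height=2, max_height=40):
    """Find (intersection, termination) address pairs in every swath.

    swath_rows holds one list per swath of 0/1 row flags (1 = black),
    ordered from the bottom of the swath to the top.
    """
    pairs_by_swath = []                          # step (c): storage
    for rows in swath_rows:                      # steps (d), (j), (k)
        pairs = []                               # len(pairs) is the pair count
        looking_for_intersection = True          # step (f)
        intersection = None
        previous = 0                             # assume white below the swath
        for address, value in enumerate(rows):   # steps (e), (h), (i)
            if value != previous:                # step (g): an edge
                if value == 1:                   # step (l): intersection edge
                    if looking_for_intersection:           # step (m)
                        intersection = address             # step (n)
                        looking_for_intersection = False   # step (o)
                elif not looking_for_intersection:         # step (p)
                    height = address - intersection
                    if min_height <= height <= max_height:     # step (q)
                        pairs.append((intersection, address))  # steps (s), (t)
                    # step (r): an out-of-range pair is simply not stored
                    looking_for_intersection = True        # step (u)
            previous = value
        pairs_by_swath.append(pairs)
    return pairs_by_swath
```

In this sketch the termination address is the first white row above a black run, so `termination - intersection` is the run height in rows. A run that is still black at the top of the swath produces no pair.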
Specification