Method and system for document segmentation
Abstract
A method of document segmentation. Specifically, one embodiment of the present invention discloses a method of document segmentation that generates a plurality of projection profiles of pixel intensities over a range of angles for a document containing a plurality of text lines. A plurality of slope values of the plurality of projection profiles is calculated for a plurality of discrete distances perpendicular to the range of angles. A set of maximum absolute slope values is sorted out from the plurality of slope values. Text lines of a first type and a second type are identified by setting a threshold slope value: absolute slope values greater than the threshold slope value indicate text lines of the first type, and absolute slope values less than the threshold slope value indicate text lines of the second type.
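The profile and slope steps summarized in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names `projection_profile`, `profile_slopes`, and `steepest_angle` are hypothetical, the slope is assumed to be the first difference between adjacent profile bins, and the image is assumed to be a list of rows in which ink has the larger intensity.

```python
import math


def projection_profile(image, angle_deg):
    """Sum pixel intensities into bins indexed by perpendicular distance.

    `image` is a list of equal-length rows of intensities (ink = larger values).
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    height, width = len(image), len(image[0])
    # Perpendicular distance of pixel (x, y) from a line through the origin
    # tilted by theta: d = y*cos(theta) - x*sin(theta).
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    d_values = [y * c - x * s for x, y in corners]
    d_min = min(d_values)
    n_bins = int(round(max(d_values) - d_min)) + 1
    profile = [0.0] * n_bins
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            profile[int(round(y * c - x * s - d_min))] += value
    return profile


def profile_slopes(profile):
    """Assumed slope definition: difference between adjacent distance bins."""
    return [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]


def steepest_angle(image, angles_deg):
    """Return (angle, max absolute slope) for the angle with the sharpest profile."""
    best_angle, best_slope = None, -1.0
    for angle in angles_deg:
        slopes = profile_slopes(projection_profile(image, angle))
        peak = max(abs(v) for v in slopes)
        if peak > best_slope:
            best_angle, best_slope = angle, peak
    return best_angle, best_slope


# Toy page: one perfectly horizontal "text line" of 20 ink pixels.
page = [[0] * 20 for _ in range(5)]
page[2] = [1] * 20
angle, slope = steepest_angle(page, [-10, -5, 0, 5, 10])
```

On this toy page the profile is sharpest when the projection direction matches the line, which is the intuition behind using the maximum absolute slope as an indicator of local skew.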
29 Claims
1. A method of document segmentation comprising:
a) generating a plurality of projection profiles of pixel intensities for a document containing a plurality of text lines including a plurality of text lines of a first type and a plurality of text lines of a second type over a range of angles;
b) calculating a plurality of slope values of said plurality of projection profiles for a plurality of discrete distances perpendicular to said range of angles;
c) sorting out a set of maximum absolute slope values of said plurality of slope values; and
d) identifying text lines of said first and second type by setting a threshold slope value wherein absolute slope values greater than said threshold slope value indicate said plurality of text lines of said first type, and absolute slope values less than said threshold slope value indicate said plurality of text lines of said second type.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of document segmentation comprising:
a) generating a plurality of projection profiles of pixel intensities for a digitized document containing a plurality of text lines including a plurality of handwritten text lines and a plurality of machine printed text lines over a range of angles;
b) calculating a plurality of slope values of said plurality of projection profiles for a plurality of discrete distances perpendicular to said range of angles;
c) sorting out a set of maximum absolute slope values of said plurality of slope values, said set of maximum absolute slope values associated with a plurality of local skew angles for said plurality of handwritten text lines and said plurality of machine printed text lines; and
d) identifying said plurality of machine printed text lines and said plurality of handwritten text lines by setting a threshold absolute slope value, wherein a first maximum absolute slope value greater than said threshold slope value indicates one of said plurality of machine printed text lines and a second maximum absolute slope value less than said threshold absolute slope value indicates one of said plurality of handwritten text lines.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
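The thresholding in step d) of claim 11 can be sketched in a few lines of Python. This is a hedged illustration only: the function name `classify_lines`, the label strings, and the example threshold are hypothetical, and because the claim does not say how a value exactly equal to the threshold is handled, the sketch marks that case as undetermined.

```python
def classify_lines(max_abs_slopes, threshold):
    """Label each text line from its maximum absolute profile slope.

    Greater than the threshold -> machine printed; less -> handwritten.
    Equality is not addressed by the claim, so it is left undetermined here.
    """
    labels = []
    for value in max_abs_slopes:
        if value > threshold:
            labels.append("machine_printed")
        elif value < threshold:
            labels.append("handwritten")
        else:
            labels.append("undetermined")
    return labels


# Hypothetical per-line maximum absolute slopes and an arbitrary threshold.
labels = classify_lines([20.0, 6.0, 12.5, 10.0], threshold=10.0)
```

The reasoning behind the split is that evenly spaced, well-aligned machine print yields sharp profile edges, while handwriting tends to produce flatter profiles; a suitable threshold would have to be chosen for the documents at hand.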
20. A computer system comprising:
a processor; and
a computer readable memory coupled to said processor and containing program instructions that, when executed, implement a method of document segmentation comprising:
a) generating a plurality of projection profiles of pixel intensities for a document containing a plurality of text lines including a plurality of text lines of a first type and a plurality of text lines of a second type over a range of angles;
b) calculating a plurality of slope values of said plurality of projection profiles for a plurality of discrete distances perpendicular to said range of angles;
c) sorting out a set of maximum absolute slope values of said plurality of slope values; and
d) identifying text lines of said first and second type by setting a threshold slope value wherein absolute slope values greater than said threshold slope value indicate said plurality of text lines of said first type, and absolute slope values less than said threshold slope value indicate said plurality of text lines of said second type.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29
Specification