Document segmentation system
Abstract
A method and apparatus for segmenting a document which has both text and image regions. The method and apparatus implement a technique in which large text pixels and image pixels are identified in a document having a relatively low resolution. The method and apparatus then detect dark text pixels on a light background region of a document and assign segmentation labels to each pixel. The pixel labels are post-processed using a plurality of syntactic rules to correct mislabeling of pixels. This process does not change the visual perception of the image regions in the document. Pixels identified as being in the background region of the document are assigned a white label and pixels identified as being in the text region are assigned a black label. The resulting processed document contains sharp black text and white background, resulting in improved perceptual quality and efficient ink utilization during a printing process.
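The final rendering step described in the abstract can be illustrated with a minimal sketch. It assumes each pixel is an 8-bit RGB tuple and already carries a segmentation label. The label names `WHITE`, `BLACK`, `COLOR` and the function `render` are hypothetical and are not taken from the patent.

```python
# Hypothetical label names; the source only speaks of white, black and color labels.
WHITE, BLACK, COLOR = "W", "B", "C"

def render(pixels, labels):
    """Map labeled pixels to output pixels.

    White-labeled pixels become pure white, black-labeled pixels become
    pure black, and every other pixel is copied unchanged, so image
    regions keep their original appearance.
    """
    out = []
    for rgb, label in zip(pixels, labels):
        if label == WHITE:
            out.append((255, 255, 255))
        elif label == BLACK:
            out.append((0, 0, 0))
        else:
            out.append(rgb)
    return out

# One off-white background pixel, one dark text pixel, one image pixel.
row = [(240, 238, 235), (30, 28, 35), (200, 40, 40)]
print(render(row, [WHITE, BLACK, COLOR]))
```

Snapping background and text pixels to pure white and pure black is what yields the sharp text and the ink savings the abstract mentions; the image pixel passes through untouched.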
38 Claims
1. A method of processing a mixed document defining a plurality of pixels, the method comprising the steps of:
(a) detecting a first set of the pixels corresponding to image pixels;
(b) detecting a second set of the pixels corresponding to large text pixels;
(c) computing, in a first color space, a first value corresponding to a white point of a media on which the document is printed;
(d) computing, in the first color space, a second value corresponding to a black point of the media;
(e) generating, via the first value, a table of values that are compensated for the white point of the media;
(f) labeling, via the table of values, each of the pixels in the document with one of:
(1) a color label;
(2) a black label; and
(3) a white label; and
(g) applying a plurality of syntactic rules to pixel sequences having predetermined labels and predetermined lengths.
Dependent claims: 2-29.
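Steps (c) through (f) of claim 1 might be sketched as follows. Everything specific here is an assumption made for illustration: the claim does not name the color space, the estimators, or any thresholds. The sketch uses the integer mean of RGB as a luminance value, takes the brightest and darkest luminance as the white and black points, and uses hypothetical thresholds `chroma_max`, `white_min` and `black_margin`. Steps (a), (b) and (g) are not covered.

```python
WHITE, BLACK, COLOR = "W", "B", "C"  # hypothetical label names

def luminance(rgb):
    # Assumed stand-in for the "first color space": integer mean of R, G, B.
    return sum(rgb) // 3

def estimate_points(pixels):
    # Assumed estimator: brightest and darkest luminance found on the page.
    lums = [luminance(p) for p in pixels]
    return max(lums), min(lums)  # (white point, black point)

def build_table(white_point):
    # 256-entry table that rescales luminance so the media white maps to 255.
    wp = max(1, white_point)  # guard against division by zero
    return [min(255, v * 255 // wp) for v in range(256)]

def label_pixel(rgb, table, black_point,
                chroma_max=40, white_min=230, black_margin=20):
    # Strongly colored pixels keep a color label regardless of brightness.
    if max(rgb) - min(rgb) > chroma_max:
        return COLOR
    compensated = table[luminance(rgb)]
    if compensated >= white_min:
        return WHITE
    if compensated <= table[black_point] + black_margin:
        return BLACK
    return COLOR  # neutral mid-tones are left alone in this sketch

page = [(230, 230, 228), (25, 25, 25), (200, 40, 40), (120, 120, 120)]
white_point, black_point = estimate_points(page)
table = build_table(white_point)
labels = [label_pixel(p, table, black_point) for p in page]
print(labels)
```

Because the table is built from the measured white point, slightly gray or yellowed paper still lands at the top of the compensated range and receives a white label.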
30. An apparatus for segmenting a document comprising:
a large text region identifier to identify large text regions of the document;
an image region identifier to identify image regions of the document;
a document segmentor to label pixels in the large text region and the image region of the document with a first label and to label each of remaining plurality of pixels in the document with a first one of a second label and a third label;
a pixel sequence selector to select a sequence of pixels in the document; and
a pixel label validator to apply predetermined ones of a first plurality of syntactic rules to predetermined ones of pixels having a first predetermined label in a pixel sequence selected by said pixel sequence selector, to validate pixel labels of pixels which satisfy the predetermined ones of the first plurality of syntactic rules and to change pixel labels of pixels which do not satisfy the predetermined ones of the first plurality of syntactic rules.
Dependent claims: 31, 32.
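The pixel sequence selector and pixel label validator of claim 30 could behave roughly as in the sketch below. The actual syntactic rules are not given in this excerpt, so the single rule used here is hypothetical: a run of the target label is valid only if it is at least `min_run` pixels long, and shorter runs take the label of the run before them (or the run after them when they start the sequence). The function name `validate_labels` is also an assumption.

```python
from itertools import groupby

def validate_labels(sequence, target="C", min_run=3):
    """Apply one illustrative run-length rule to a sequence of pixel labels."""
    # Collapse the sequence into (label, run_length) pairs.
    runs = [(label, len(list(group))) for label, group in groupby(sequence)]
    out = []
    for i, (label, length) in enumerate(runs):
        if label == target and length < min_run:
            # Rule not satisfied: change the labels of this run.
            if i > 0:
                new_label = out[-1]          # label of the preceding run
            elif i + 1 < len(runs):
                new_label = runs[i + 1][0]   # label of the following run
            else:
                new_label = label            # lone run, nothing to borrow
            out.extend([new_label] * length)
        else:
            # Rule satisfied, or a different label: keep the labels as they are.
            out.extend([label] * length)
    return out

row = "BBBCBBBWWCCCCWW"  # B = black, W = white, C = color
print("".join(validate_labels(row)))
```

In this example the isolated color pixel inside the black run is relabeled as black, while the four-pixel color run is long enough to be validated and kept.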
33. A color printing system comprising:
a scanner having a scanner color space, said scanner having an input port to receive a mixed color document to be scanned and an output port to provide a stream of bits representing the mixed color document;
a document segmentation system having an input port and an output port, said document segmentor for receiving on the input port thereof the stream of digital bits from the output port of said scanner and to provide at the output port thereof a segmented document bit stream, said document segmentation comprising:
a large text region identifier to identify large text regions of the mixed color document;
an image region identifier to identify image regions of the mixed color document;
a document segmentor to label pixels in the large text region and the image region of the document with a first label and to label each of the remaining plurality of pixels in the document with a one of a second label and a third label;
a pixel sequence selector to select a sequence of pixels in the document; and
a pixel label validator to apply predetermined ones of a first plurality of syntactic rules to predetermined ones of pixels having a first predetermined label in a pixel sequence selected by said pixel sequence selector, to validate pixel labels of pixels which satisfy the predetermined ones of the first plurality of syntactic rules and to change pixel labels of pixels which do not satisfy the predetermined ones of the first plurality of syntactic rules; and
a printing system, having an input port coupled to the output port of said segmentation processor, said printing system to receive at the input port thereof a stream of bits which represent a segmented mixed color document and to print a mixed color document in response to a segmented document bit stream fed thereto from said segmentation system.
Dependent claims: 34.
35. A computer program product for use with a color reproduction system including a scanner, a processor and a printer, the computer program product comprising:
a computer useable medium having computer readable program code to identify large text regions and image regions of a mixed color document;
a computer useable medium having computer readable program code to label pixels in the large text region and the image region of the mixed color document with a first label;
a computer useable medium having computer readable program code to label each of remaining plurality of pixels in the mixed color document with a first one of a second label and a third label;
a computer useable medium having computer readable program code to select a sequence of pixels in the mixed color document; and
a computer useable medium having computer readable program code to apply a first plurality of syntactic rules to predetermined ones of pixels having a first predetermined label in the selected pixel sequence.
Dependent claims: 36, 37, 38.
Specification