Method and apparatus for detecting running text in an image
Abstract
The present invention is a method and apparatus for analyzing image data, and more particularly for analyzing image data representing images containing text to partition the image into running and non-running text regions therein. The present invention utilizes characteristics of running text regions to identify such regions and to subsequently group all non-running text regions into related groups.
14 Claims
1. A method comprising:
retrieving an input image, the image comprising an array of image signals and associated data defining a set of boundaries of a plurality of text-blocks represented therein, and storing the array of image signals in a bitmap array and the data defining the set of boundaries in a second array; partitioning the text-blocks defined by the set of boundaries stored in the second array into text groups, wherein the partitioning step further comprises the steps of applying a similarity grouping criterion to the text-blocks to identify a stable number of text groups in the input image, wherein the similarity grouping criterion is a sufficient stability criterion that is based upon the width of the text blocks along a raster, and dividing the input image into the stable number of text groups; and classifying the text-groups to determine those text-groups which represent running text regions of the image and those which represent non-running text regions of the image.
Dependent claims: 2, 3, 4
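Claim 1 does not spell out how the "sufficient stability criterion" is computed. One plausible reading, assumed here and not taken from the patent, is to sweep a width-similarity tolerance and keep the number of groups that stays unchanged over the widest range of tolerances. The sketch below does that with single-linkage grouping on block widths; the names `group_by_tolerance`, `stable_grouping`, and `max_tol`, and the `(x0, y0, x1, y1)` box layout, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical text-block boundary: (x0, y0, x1, y1), x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def width(box: Box) -> int:
    x0, _, x1, _ = box
    return x1 - x0


def group_by_tolerance(widths: List[int], tol: int) -> List[List[int]]:
    """Single-linkage grouping on sorted widths: start a new group whenever
    the jump from the previous width exceeds tol. Returns block indices."""
    order = sorted(range(len(widths)), key=lambda i: widths[i])
    groups: List[List[int]] = []
    prev: Optional[int] = None
    for i in order:
        if prev is None or widths[i] - widths[prev] > tol:
            groups.append([])
        groups[-1].append(i)
        prev = i
    return groups


def stable_grouping(boxes: List[Box], max_tol: int = 50) -> List[List[int]]:
    """Assumed stability test: sweep tol = 0..max_tol and keep the group
    count that persists over the longest run of consecutive tolerances."""
    widths = [width(b) for b in boxes]
    counts = [len(group_by_tolerance(widths, t)) for t in range(max_tol + 1)]
    best_start, best_len, start = 0, 0, 0
    for t in range(1, max_tol + 2):
        # t == max_tol + 1 acts as a sentinel that closes the final run.
        if t == max_tol + 1 or counts[t] != counts[start]:
            if t - start > best_len:
                best_start, best_len = start, t - start
            start = t
    # Divide the blocks using the smallest tolerance in the stable run.
    return group_by_tolerance(widths, best_start)
```

For example, three column blocks about 300 pixels wide and three margin notes about 80 pixels wide yield two groups, because the count of 2 holds for every tolerance from 3 to 50 pixels while finer tolerances split the blocks further.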
5. A method comprising:
retrieving an input image, the image comprising an array of image signals and associated data defining a set of boundaries of a plurality of text-blocks represented therein, and storing the array of image signals in a bitmap array and the data defining the set of boundaries in a second array; partitioning the text-blocks defined by the set of boundaries stored in the second array into text groups; and classifying the text-groups to determine those text-groups which represent running text regions of the image and those which represent non-running text regions of the image, wherein the step of classifying the text groups further comprises the steps of, (a) classifying those text groups having a group-width to page-width ratio greater than a first threshold as running text and all other groups as non-running text, (b) identifying the text groups according to the classification in step (a), (c) classifying the remaining running text groups having a gap-area to block-area ratio greater than a second threshold as running text and all other remaining running text groups as non-running text, and (d) identifying the groups according to the classifications of step (a) and step (c).
Dependent claims: 6
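The two-pass classification of steps (a) through (d) can be sketched as follows. The claim names the two ratios but not their values or how "gap area" is measured, so this sketch assumes that a group's gap area is the part of its bounding box not covered by its (non-overlapping) blocks, and uses hypothetical thresholds `t1 = 0.4` and `t2 = 0.1`; the function names are likewise illustrative.

```python
from typing import List, Tuple

# Hypothetical text-block boundary: (x0, y0, x1, y1), x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def area(b: Box) -> int:
    return (b[2] - b[0]) * (b[3] - b[1])


def bounding_box(blocks: List[Box]) -> Box:
    return (min(b[0] for b in blocks), min(b[1] for b in blocks),
            max(b[2] for b in blocks), max(b[3] for b in blocks))


def classify_groups(groups: List[List[Box]], page_width: int,
                    t1: float = 0.4, t2: float = 0.1) -> List[str]:
    """Label each group "running" or "non-running" (t1, t2 are assumptions)."""
    labels: List[str] = []
    for blocks in groups:
        gx0, gy0, gx1, gy1 = bounding_box(blocks)
        # Step (a): a group-width to page-width ratio not above t1 means
        # non-running text; wider groups remain running-text candidates.
        if (gx1 - gx0) / page_width <= t1:
            labels.append("non-running")
            continue
        # Step (c): assumed gap area = bounding-box area minus block area.
        block_area = sum(area(b) for b in blocks)
        gap_area = (gx1 - gx0) * (gy1 - gy0) - block_area
        ratio = gap_area / max(block_area, 1)
        labels.append("running" if ratio > t2 else "non-running")
    # Steps (b) and (d): the returned labels identify each group's class.
    return labels
```

With these thresholds, a three-line paragraph spanning most of a 600-pixel page is labelled running text, while a narrow caption (fails step (a)) and a wide block with no internal gaps (fails step (c)) are labelled non-running.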
7. A method operating on a programmable computer for partitioning an image containing text into regions of running text and non-running text, the image consisting essentially of an array of image signals and associated data defining a set of boundaries of a plurality of text-blocks represented therein, said method comprising the steps of:
retrieving an input image and storing image signals thereof in a first bitmap array memory location and the data defining the set of boundaries in a second memory location; partitioning the text-blocks defined by the boundaries stored in the second memory location into text groups, wherein the partitioning step further comprises the steps of applying a similarity grouping criterion to the text-blocks to identify a stable number of text groups, wherein the similarity grouping criterion is a sufficient stability criterion that is based upon a dimension of the text blocks, and dividing the input image into the stable number of text groups; and classifying the text groups to determine those text groups which represent running text regions of the image and those which represent non-running text regions of the image.
Dependent claims: 8, 9, 10
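Claim 7 keeps the image signals and the block boundaries in separate memory locations and bases the grouping criterion on "a dimension" of the blocks rather than only their width. A minimal sketch of that storage split, assuming a binary image given as nested lists (1 = ink) and boundaries given as `(x0, y0, x1, y1)` tuples; the function names `store_input` and `block_dimension` and the flat four-integers-per-block layout are illustrative choices, not taken from the patent.

```python
from array import array
from typing import List, Tuple

# Hypothetical text-block boundary: (x0, y0, x1, y1), x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def store_input(pixels: List[List[int]],
                boxes: List[Box]) -> Tuple[bytearray, Tuple[int, int], array]:
    """Place image signals and block boundaries in two separate arrays."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # First location: row-major bitmap, one byte per pixel.
    bitmap = bytearray(v for row in pixels for v in row)
    # Second location: flat signed-int array, four coordinates per block.
    bounds = array("i", [c for box in boxes for c in box])
    return bitmap, (width, height), bounds


def block_dimension(bounds: array, k: int, axis: str = "width") -> int:
    """Read the chosen dimension of block k from the boundary array."""
    x0, y0, x1, y1 = bounds[4 * k: 4 * k + 4]
    return x1 - x0 if axis == "width" else y1 - y0
```

Keeping boundaries apart from the bitmap lets the partitioning step work purely on the small boundary array, reading width or height as the chosen dimension, without touching pixel data.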
11. A method operating on a programmable computer for partitioning an image containing text into regions of running text and non-running text, the image consisting essentially of an array of image signals and associated data defining a set of boundaries of a plurality of text-blocks represented therein, said method comprising the steps of:
retrieving an input image and storing image signals thereof in a first bitmap array memory location and the data defining the set of boundaries in a second memory location; partitioning the text-blocks defined by the boundaries stored in the second memory location into text groups; and classifying the text groups to determine those text groups which represent running text regions of the image and those which represent non-running text regions of the image, wherein the step of classifying the text groups further comprises the substeps of (a) classifying those text groups having a group-width to page-width ratio greater than a first threshold as running text and all other groups as non-running text, (b) identifying the text groups according to the classification in step (a), (c) classifying the remaining running text groups having a gap-area to block-area ratio greater than a second threshold as running text and all other groups as non-running text, and (d) identifying the groups according to the classifications of substep (a) and substep (c).
Dependent claims: 12
13. An apparatus, comprising:
a first memory for storing image data; a second memory for storing data representing characteristics of an image, the bitmap data for said image being stored in said first memory array; instruction memory; a text processor, connected to said first and second memory and said instruction memory for accessing the data stored in the first and second memory in accordance with instructions stored in said instruction memory, the processor, in executing the instructions: accessing the image data stored in the first memory location to produce text block boundaries representing text blocks in the image, the data defining the text block boundaries being stored in the second memory as image characteristic data; partitioning the text-blocks defined by the boundaries stored in the second memory location into text groups; and classifying the text-groups to determine those text-groups which represent running text regions of the image and those which represent non-running text regions of the image, wherein the text processor, operating in accordance with a partitioning instruction, applies a similarity grouping criterion to the text blocks to identify a stable number of text groups in the input image stored in said first memory, wherein the similarity grouping criterion is a sufficient stability criterion that is based upon a dimension of the text blocks, and divides the input image into the stable number of text groups, wherein data characterizing boundaries of the text groups is stored in said second memory.
Dependent claims: 14
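Claim 13 has the processor derive text block boundaries from the bitmap but does not say how. One common technique, swapped in here purely for illustration, is connected-component labelling: flood-fill each 8-connected region of ink pixels and record its bounding box. The function name `text_block_boxes` and the nested-list bitmap (1 = ink) are assumptions. On real pages the components are individual characters, so a practical system would first merge nearby ink (for example by smearing) to obtain word- or line-sized blocks; that step is not shown.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical text-block boundary: (x0, y0, x1, y1), x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def text_block_boxes(bitmap: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected ink regions, in raster-scan order."""
    h = len(bitmap)
    w = len(bitmap[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not bitmap[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and bitmap[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Store exclusive right/bottom edges.
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes
```

The resulting boxes are the kind of boundary data the claim stores in the second memory, ready for the grouping and classification steps.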
Specification