Method and apparatus for document segmentation by background analysis
Abstract
A system for extracting document elements from a document using major white regions includes an input port for inputting a signal representing the document image, a connected component generator for generating connected components from the document image, a bounding box generator for generating a bounding box around each connected component, a major white region extractor for extracting major white regions from the document image, and a document element extractor for extracting the document elements from the document image. The method for extracting document elements comprises identifying primitive white areas lying between the bounding boxes, grouping the primitive white areas into groups, identifying the primitive white areas and groups which are major white regions, identifying closed loops of major white regions, segmenting the major white regions, locating closed loops of the segments, and identifying each portion of the document image enclosed by one of the closed loops of document segments as a document element.
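The first two stages named in the abstract, generating connected components and a bounding box around each, can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows (1 for foreground, 0 for background) and 8-connectivity; the function name `connected_component_boxes`, the image layout, and the connectivity rule are illustrative assumptions, not details taken from the patent.

```python
from collections import deque

def connected_component_boxes(image):
    """Return the minimum bounding box (top, left, bottom, right), inclusive,
    of every 8-connected group of foreground (1) pixels.
    Hypothetical helper: the connectivity rule is an assumption."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel,
            # growing the box to cover every pixel reached.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy "page": two small marks on the upper rows, one wide mark below.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
]
boxes = connected_component_boxes(page)
print(boxes)
```

On this toy page the sketch yields three boxes, one per mark. The white space left between those boxes is the raw material for the later white-region stages.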
18 Claims
1. A method for extracting document elements from a document image, comprising the steps of:
identifying at least one connected component in the document image;
generating a minimum bounding box for each at least one connected component in the document image;
identifying primitive background areas surrounding the at least one bounding box for the at least one connected component;
selectively eliminating the primitive background areas such that each remaining primitive background area exceeds a minimum threshold;
selectively combining the remaining primitive background areas in the document image which uniquely share a common edge to form major background regions;
determining at least one set of major background regions based on intersection points between the major background regions;
segmenting the document image into document elements based on the at least one set of determined major background regions; and
extracting the document elements segmented by the at least one set of determined major background regions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
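The third and fourth steps of claim 1, identifying primitive background areas around the bounding boxes and eliminating those that do not exceed a minimum threshold, might be sketched as follows. The claim does not say how the areas are found or what the threshold measures, so this sketch makes two loud assumptions: areas are built by scanning rows and merging consecutive rows whose background runs have identical left and right edges, and the threshold is a pixel count (`min_pixels`). Both function names are hypothetical, and the later combining and intersection steps are not attempted here.

```python
def primitive_background_areas(boxes, width, height):
    """Rectangles (top, left, bottom, right), inclusive, of background not
    covered by any bounding box. Assumed strategy: row scan plus merging of
    vertically adjacent runs that share the same left and right edges."""
    def gaps(y):
        # Horizontal runs on row y that no box covers.
        covered = sorted((l, r) for t, l, b, r in boxes if t <= y <= b)
        runs, x = [], 0
        for l, r in covered:
            if l > x:
                runs.append((x, l - 1))
            x = max(x, r + 1)
        if x < width:
            runs.append((x, width - 1))
        return runs

    open_areas = {}  # (left, right) -> top row of the rectangle being grown
    areas = []
    for y in range(height + 1):  # the extra pass at y == height closes all
        runs = set(gaps(y)) if y < height else set()
        for span in list(open_areas):
            if span not in runs:
                areas.append((open_areas.pop(span), span[0], y - 1, span[1]))
        for span in runs:
            open_areas.setdefault(span, y)
    return sorted(areas)

def eliminate_small(areas, min_pixels):
    """Keep only areas whose pixel count exceeds min_pixels (assumed measure)."""
    return [a for a in areas
            if (a[2] - a[0] + 1) * (a[3] - a[1] + 1) > min_pixels]

# Bounding boxes (top, left, bottom, right) on a 7 x 5 toy page.
boxes = [(1, 1, 2, 2), (1, 5, 2, 5), (4, 1, 4, 5)]
areas = primitive_background_areas(boxes, width=7, height=5)
kept = eliminate_small(areas, min_pixels=3)
print(kept)
```

With these toy boxes, the narrow slivers along the left and right page edges fall below the threshold. The top margin, the gutter between the two upper boxes, and the horizontal gap above the lower box survive as candidates for the combining step.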
16. An apparatus for segmenting a document image into at least one document element, the document image represented by a plurality of pixels, the plurality of pixels represented by portions of an image signal, the apparatus comprising:
input means for inputting the image signal;
memory means for storing the image signal;
connected component identifying means for identifying at least one connected component from the plurality of pixels forming the document image;
bounding box generating means for generating a minimum bounding box around each one of the at least one connected component;
background area identifying means for identifying background areas surrounding the at least one bounding box for the at least one connected component;
background area eliminating means for selectively eliminating background areas such that each remaining background area exceeds a minimum threshold;
major background region extraction means for selectively combining the remaining background areas of the document image which uniquely share a common edge to form major background regions;
determining means for determining at least one set of major background regions that segment the document image into at least one document element based on intersection points between the major background regions;
document element extraction means for extracting the at least one document element segmented by the at least one set of determined major background regions; and
control means for controlling the input means, the memory means, the connected component identifying means, the bounding box generating means, the background area identifying means, the background area eliminating means, the major background region extraction means, the determining means and the document element extraction means.
Dependent claims: 17, 18
Specification