Colour document layout analysis with multi-level decomposition
First Claim
1. A method of classifying regions of a scanned document image, said method comprising the steps of:
(a) partitioning the scanned image into a plurality of tiles;
(b) determining at least one dominant colour for each of the plurality of tiles;
(c) generating superpositioned regions based on the dominant colours, each said superpositioned region representing a group of tiles, wherein at least one tile is grouped into two superpositioned regions and each dominant colour is represented by at most one of the superpositioned regions, and wherein the generation of superposition regions comprises;
(ca) converting each of the tiles into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and
(cb) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating the superpositioned regions and a multi-layered document representation of the document;
(d) calculating statistics for each said superpositioned region using pixel level statistics from each of the tiles included in said superpositioned region; and
(e) determining a classification for each superpositioned region based on the calculated statistics.
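Steps (a) and (b) can be illustrated with a minimal sketch. It is not the patented implementation: the tile size, the coarse colour quantisation (`levels`) and the `min_share` threshold are all assumptions introduced here for illustration, as are the function names. The image is taken to be a list of rows of `(r, g, b)` tuples.

```python
from collections import Counter


def partition_into_tiles(image, tile_size):
    """Yield ((tile_row, tile_col), pixels) for each tile of a 2-D pixel grid."""
    height, width = len(image), len(image[0])
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            # Edge tiles are clipped to the image bounds.
            pixels = [image[y][x]
                      for y in range(top, min(top + tile_size, height))
                      for x in range(left, min(left + tile_size, width))]
            yield (top // tile_size, left // tile_size), pixels


def dominant_colours(pixels, levels=4, min_share=0.2):
    """Return quantised colours covering at least min_share of the tile.

    Quantisation and threshold are hypothetical choices, not from the claims.
    """
    step = 256 // levels
    counts = Counter(tuple(c // step for c in p) for p in pixels)
    keep = [colour for colour, n in counts.most_common()
            if n / len(pixels) >= min_share]
    # Always keep at least one colour ("at least one dominant colour").
    return keep or [counts.most_common(1)[0][0]]
```

A tile that is half black and half white would yield two dominant colours under these assumptions, which is the situation in which one tile later contributes to two superpositioned regions.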
Abstract
Disclosed is a method of classifying segmented contents of a scanned image of a document. The method comprises partitioning the scanned image into color segmented tiles at pixel level. The method then generates superpositioned segmented contents, each segmented content representing related color segments in at least one color segmented tile. Statistics are then calculated for each segmented content using pixel level statistics from each of the tile color segments included in the segmented content, and a classification is determined for each segmented content based on the calculated statistics. The segmented contents may be macroregions. The macroregions may form part of a multi-layered document representation of the document. Each of a plurality of tiles of predetermined size of the image is converted into a representation having a plurality of layers, the representation corresponding to at least one of said tiles comprising multiple colored layers, each tile comprising a superposition of the corresponding colored layers. For each of the colored layers, merging is performed with adjacent ones of the tiles, thereby generating a multi-layered document representation.
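The per-layer merging described above can be sketched with a union-find structure. This is one possible reading, not the disclosed implementation: it assumes that two adjacent tiles are merged on a layer whenever they share an identical dominant colour label, and that tiles are visited in raster order so only the left and upper neighbours need checking. The function name and the input layout (`tile_colours` mapping `(row, col)` to a list of distinct colours) are hypothetical.

```python
def merge_layers(tile_colours):
    """Group (tile, colour) layers into regions by merging adjacent tiles per colour.

    tile_colours maps (row, col) -> list of distinct dominant colours.
    Returns a dict mapping a region root to the set of tiles in that region.
    """
    parent = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Raster order guarantees the left and upper neighbours were already seen.
    for (row, col) in sorted(tile_colours):
        for colour in tile_colours[(row, col)]:
            node = ((row, col), colour)
            parent[node] = node
            for neighbour in ((row, col - 1), (row - 1, col)):
                if colour in tile_colours.get(neighbour, ()):
                    parent[find(node)] = find((neighbour, colour))

    regions = {}
    for node in parent:
        regions.setdefault(find(node), set()).add(node[0])
    return regions
```

Because each node is a (tile, colour) pair rather than a tile, a tile with two dominant colours ends up in two regions, while each of its colours belongs to exactly one region.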
16 Claims
1. A method of classifying regions of a scanned document image, said method comprising the steps of:
(a) partitioning the scanned image into a plurality of tiles;
(b) determining at least one dominant colour for each of the plurality of tiles;
(c) generating superpositioned regions based on the dominant colours, each said superpositioned region representing a group of tiles, wherein at least one tile is grouped into two superpositioned regions and each dominant colour is represented by at most one of the superpositioned regions, and wherein the generation of superposition regions comprises;
(ca) converting each of the tiles into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and
(cb) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating the superpositioned regions and a multi-layered document representation of the document;
(d) calculating statistics for each said superpositioned region using pixel level statistics from each of the tiles included in said superpositioned region; and
(e) determining a classification for each superpositioned region based on the calculated statistics.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16)
14. A non-transitory computer readable storage medium having a program recorded thereon, the program being executable by a computer to classify regions of a scanned document image, said program comprising:
code for partitioning the scanned image into a plurality of tiles;
code for determining at least one dominant colour for each of the plurality of tiles;
code for generating superpositioned regions based on the dominant colours, each said superpositioned region representing a group of tiles, wherein at least one tile is grouped into two superpositioned regions and each dominant colour is represented by at most one of the superposition regions, and wherein the code of generation of superpositioned regions comprises;
(a) code for converting each of the tiles into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said colour layers; and
(b) code for merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating the superpositioned regions and a multi-layered document representation of the document;
code for calculating statistics for each said superpositioned region using pixel level statistics from each of the tiles included in said superpositioned region; and
code for determining a classification for each superpositioned region based on calculated statistics.
15. A system for analysing and classification of a document, said system comprising:
a scanning device arranged to scan the document to form a raw pixel image of the document;
a computer apparatus configured to receive the image of the document and to partition the image into a plurality of tiles of predetermined size and to process the tiles in raster tile order, the processing comprising;
determining at least one dominant colour for each of the plurality of tiles;
generating superpositioned regions based on the dominant colours, each said superpositioned region representing a group of tiles wherein at least one tile is grouped into two superpositioned regions and each dominant colour is represented by at most one of the superpositioned regions, said generating comprising;
(a) converting each of the tiles into a representation having a plurality of layers, the representation corresponding to at least one said tiles comprising multiple coloured layers, each said tile comprising a superposition of the corresponding said coloured layers; and
(b) merging, for each of said coloured layers, adjacent ones of said tiles, thereby generating the superpositioned regions and a multi-layered document representation of the document;
calculating statistics for each said superpositioned region using pixel level statistics from each of the tiles included in said superpositioned region; and
determining a classification for each superpositioned region based on the calculated statistics.
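The final two steps, region statistics and classification, might look like the following sketch. The claims do not say which statistics are gathered or which classes exist, so everything specific here is an assumption made for illustration: the `pixels` and `edges` counts, the `edge_ratio_threshold` value, and the `"text"` and `"flat"` labels are hypothetical.

```python
def region_statistics(region_tiles, tile_stats, colour):
    """Sum per-tile pixel-level statistics over the tiles of one region.

    tile_stats maps (tile, colour) -> {"pixels": int, "edges": int}; both
    fields are hypothetical counts collected while each tile was processed.
    """
    totals = {"pixels": 0, "edges": 0, "tiles": 0}
    for tile in region_tiles:
        stats = tile_stats[(tile, colour)]
        totals["pixels"] += stats["pixels"]
        totals["edges"] += stats["edges"]
        totals["tiles"] += 1
    return totals


def classify(totals, edge_ratio_threshold=0.3):
    """Toy rule (an assumption): a high share of edge pixels suggests text."""
    ratio = totals["edges"] / totals["pixels"] if totals["pixels"] else 0.0
    return "text" if ratio >= edge_ratio_threshold else "flat"
```

The point of the sketch is the data flow: the region-level figures are accumulated from statistics already computed per tile, so the classifier never needs to revisit individual pixels.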
Specification