PAGE ANALYSIS SYSTEM
Abstract
A method for increasing the accuracy of image data classification in a page analysis system for analyzing image data of a document page. The method includes inputting image data of a document page as pixel data, analyzing the pixel data in order to locate all connected pixels, rectangularizing connected pixel data into blocks, analyzing each of the blocks of pixel data in order to determine the type of image data contained in the block, outputting an attribute corresponding to the type of image data determined in the analyzing step, and performing optical character recognition to attempt to recognize a character of the block of image data in the case that the analyzing step cannot determine the type of image data contained in the block.
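The first analyzing step (locating connected pixels) and the rectangularizing step can be illustrated with a minimal Python sketch. It assumes a binarized page given as rows of 0/1 values (1 = black), uses 8-connectivity, and takes each connected component's bounding rectangle as its block. The name `find_blocks` and these choices are illustrative assumptions, not the patent's method. A practical page analyzer would typically also merge nearby components into larger blocks, which this sketch does not do.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical block representation: (top, left, bottom, right), inclusive pixel coordinates.
Block = Tuple[int, int, int, int]


def find_blocks(pixels: List[List[int]]) -> List[Block]:
    """Locate 8-connected groups of black (1) pixels and return their bounding rectangles."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blocks: List[Block] = []
    for y in range(height):
        for x in range(width):
            if pixels[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                # Grow the bounding rectangle to include this pixel.
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit all 8 neighbours that are black and not yet seen.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and pixels[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blocks.append((top, left, bottom, right))
    return blocks
```

For example, `find_blocks([[1, 1, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 0, 1, 1]])` returns two blocks, `(0, 0, 1, 1)` and `(1, 3, 2, 4)`.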
24 Claims
1. In a page analysis system for analyzing image data of a document page, a method for increasing the accuracy of image data classification, the method comprising the steps of:
inputting image data of a document page as pixel data;
a first analyzing step for analyzing the pixel data in order to locate all connected pixels;
rectangularizing connected pixel data into blocks;
a second analyzing step for analyzing each block of pixel data in order to determine type of image data contained in each block;
outputting an attribute corresponding to the type of image data within the block determined in the second analyzing step; and
performing optical character recognition so as to recognize image data in a block in the case that the second analyzing step cannot determine a type of image data contained in the block.
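The fallback recited in claim 1, which runs OCR when the second analyzing step cannot determine a block's type, might look like the following sketch. The size thresholds in `classify_block`, the attribute strings, the `ocr` callback, and the rule that a successful recognition marks the block as text are all hypothetical. The claim does not specify how classification is performed or how the OCR result is used.

```python
from typing import Callable, Optional, Tuple

# Hypothetical block representation: (top, left, bottom, right), inclusive pixel coordinates.
Block = Tuple[int, int, int, int]


def classify_block(block: Block) -> Optional[str]:
    """Hypothetical size/aspect heuristic; returns None when the type is undetermined."""
    top, left, bottom, right = block
    height, width = bottom - top + 1, right - left + 1
    if width >= 20 * height:
        return "line"      # long and thin: treated as a horizontal rule (assumed threshold)
    if height <= 40 and width <= 40:
        return "text"      # character-sized component (assumed threshold)
    if height >= 200 and width >= 200:
        return "picture"   # large region (assumed threshold)
    return None            # ambiguous: the type cannot be determined


def attribute_for(block: Block, ocr: Callable[[Block], Optional[str]]) -> str:
    """Output an attribute for a block, falling back to OCR when classification fails."""
    kind = classify_block(block)
    if kind is not None:
        return kind
    # Classification failed: attempt character recognition on the block.
    recognized = ocr(block)
    # Assumption: any recognized characters mean the block holds text.
    return "text" if recognized else "unknown"
```

Passing `ocr` as a callback keeps the sketch independent of any particular OCR engine; a 100 by 60 pixel block, for instance, matches none of the heuristics, so its attribute depends entirely on whether the callback recognizes anything.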
3. In a page analysis system for analyzing image data of a document page, a method for accurately classifying image data, the method comprising the steps of:
inputting image data of a document page as pixel data;
combining and rectangularizing connected pixel data into blocks of image data; and
analyzing and classifying the data as a type of data;
wherein, in the case that a block of image data is classified as text data and the size of the text data is not equal to a preset size threshold, performing optical character recognition on the text data.
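Claim 3 instead triggers OCR from the size of blocks already classified as text. A literal reading of the claim's condition (classified as text data and size not equal to a preset size threshold) is sketched below. The `(type, size)` representation, the function name `blocks_needing_ocr`, and the default threshold of 12 are hypothetical.

```python
from typing import List, Tuple


def blocks_needing_ocr(classified: List[Tuple[str, int]], threshold: int = 12) -> List[int]:
    """Return indices of text blocks whose size differs from the preset threshold.

    Each entry is a hypothetical (block_type, text_size) pair produced by an
    earlier classification step; the default threshold is illustrative only.
    """
    return [
        index
        for index, (block_type, text_size) in enumerate(classified)
        # Follows the claim literally: OCR when the text size is "not equal to" the threshold.
        if block_type == "text" and text_size != threshold
    ]
```

For example, with the default threshold, `blocks_needing_ocr([("text", 12), ("text", 20), ("picture", 5), ("text", 8)])` selects the blocks at indices 1 and 3.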
9. Computer-executable process steps stored in a computer-readable medium, the process steps for use in a page analysis system for analyzing image data of a document page, the process steps to increase the accuracy of image data classification, the process steps comprising:
an inputting step to input image data of a document page as pixel data;
a first analyzing step to analyze the pixel data in order to locate all connected pixels;
a rectangularizing step to rectangularize connected pixel data into blocks;
a second analyzing step to analyze each block of pixel data in order to determine type of image data contained in each block;
an outputting step to output an attribute corresponding to the type of image data within the block determined in the second analyzing step; and
a performing step to perform optical character recognition so as to recognize image data of a block in the case that the second analyzing step cannot determine a type of image data contained in the block.
11. Computer-executable process steps for analyzing image data of a document page, the steps comprising:
an inputting step to input image data of a document page as pixel data;
a combining and rectangularizing step to combine and rectangularize connected pixel data into blocks of image data; and
an analyzing and classifying step to analyze and classify the data as a type of data;
wherein, in the case that a block of image data is classified as text data and the size of the text data is not equal to a preset size threshold, performing optical character recognition on the text data.
17. An apparatus for performing page analysis of a document page, the apparatus comprising:
a memory which stores page analysis process steps executable by a processor and an image of a document page; and
a processor which executes the page analysis process steps stored in the memory (1) to input image data of a document page as pixel data, (2) to analyze the pixel data in order to locate all connected pixels, (3) to rectangularize connected pixel data into blocks, (4) to analyze each block of pixel data in order to determine a type of image data contained in each block, (5) to output an attribute corresponding to the type of image data within a block analyzed by the processor, and (6) to perform optical character recognition to attempt to recognize a character of the block of image data in a case that the processor cannot determine a type of image data contained in the block.
19. An apparatus for analyzing image data of a document page, the apparatus comprising:
a memory which stores page analysis process steps executable by a processor and an image of a document page; and
a processor which executes the page analysis process steps stored in the memory (1) to input image data of a document page as pixel data, (2) to combine and rectangularize pixel data into blocks of image data, (3) to analyze and classify the data as a type of data, and (4) to perform optical character recognition on text data in a case that a block of image data is classified as text data and the size of the text data is not equal to a preset size threshold.
Specification