Compression/decompression algorithm for image documents having text, graphical and color content
Abstract
A computer program product for compressing data files representative of an image document. The document includes color information and/or graphical information. The product is on a computer readable medium and includes instructions for causing a computer to provide a first image file at a first resolution and a second image file at a second resolution of said document with said second resolution being lower than said first resolution. The product causes a computer to process the first image file to convert the first image file into a text file representation of the document and compress the text file representation of the document to provide a first compressed file. The computer processes the second file to extract information corresponding to color information and graphics information. It compresses the second file using a second, different compression technique to provide a second compressed file corresponding to the image and the color information from the low resolution image file. The product causes a computer to store said first and second compressed files to provide a composite file corresponding to the compressed file of the document.
7 Claims
1. A computer program product for compressing data files representative of an image document having color information and/or graphical information, said software product disposed on a computer readable medium comprising instructions for causing a computer to:
provide a first image file at a first resolution of said document and a second image file at a second resolution of said document with said second resolution being lower than said first resolution;
process the first image file to convert the first image file into a text file representation of the document;
compress said text file representation of the document to provide a first compressed file;
process said second file to extract information from the image representation of the document corresponding to color information and graphics information;
compress the second file using a second compression technique to provide a second compressed file containing information corresponding to the image;
store said first and second compressed files and color information to provide a composite compressed file corresponding to the document;
scan the document at a first resolution to provide said first image file at said first resolution and scan the document at a second resolution to provide said second image file at said second resolution;
determine the foreground color corresponding to colors associated with text positions of the document; and
determine the foreground colors by causing the computer to retrieve a plurality of samples of blocks of pixels from the low resolution image representation of the document and from each one of said samples of pixels find a pixel corresponding to the minimum and maximum intensity of the pixels in the block;
and for each one of said samples calculate a threshold value representative of the document by averaging the minimum and maximum intensities for each of the blocks;
determine a color associated with each one of the blocks and the width of intensity of each one of the blocks and provide a data structure having an entry for each one of said blocks corresponding to a foreground color.
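The block-sampling steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the block size, the use of the channel mean as "intensity", the choice of the darkest pixel as the block's foreground color, and the names `BlockEntry`, `analyze_block` and `foreground_table` are all assumptions made for illustration.

```python
from dataclasses import dataclass

Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def intensity(p: Pixel) -> float:
    # Assumption: intensity is the plain mean of the three channels.
    return sum(p) / 3


@dataclass
class BlockEntry:
    threshold: float   # average of the minimum and maximum intensity
    width: float       # spread between maximum and minimum intensity
    foreground: Pixel  # assumed here to be the darkest pixel in the block


def analyze_block(block: list[Pixel]) -> BlockEntry:
    darkest = min(block, key=intensity)
    lightest = max(block, key=intensity)
    lo, hi = intensity(darkest), intensity(lightest)
    return BlockEntry(threshold=(lo + hi) / 2, width=hi - lo, foreground=darkest)


def foreground_table(image: list[list[Pixel]],
                     block_size: int = 8) -> dict[tuple[int, int], BlockEntry]:
    """Build one entry per block of the low resolution image."""
    table = {}
    for top in range(0, len(image), block_size):
        for left in range(0, len(image[0]), block_size):
            block = [p
                     for row in image[top:top + block_size]
                     for p in row[left:left + block_size]]
            table[(top // block_size, left // block_size)] = analyze_block(block)
    return table
```

The returned dictionary plays the role of the data structure with one entry per block; a real system would presumably also use the threshold and width to decide whether a block contains text at all.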
2. A computer system including a computer software product for compressing data files representative of an image document, said document including color information and/or graphical information, said computer system including:
a processor to execute said software instructions;
a memory storing said software program;
a display which displays representations of said document;
said software product disposed on a computer readable medium comprising instructions for causing a computer to:
provide a first image file of said document at a first resolution and a second image file of said document at a second resolution with said second resolution being lower than said first resolution;
process the first image file to convert the first image file into a text file representation of the document;
compress said text file representation of the document to provide a first compressed file;
process said second file to extract information from the image representation of the document corresponding to color information and graphics information;
compress the second file using a second compression technique to provide a second compressed file containing information corresponding to the image;
store said first and second compressed files and said color information to provide a composite compressed file of the document;
scan the document at a first resolution to provide said first image file at said first resolution and scan the document at a second resolution to provide said second image file at said second resolution;
determine foreground color corresponding to colors associated with text positions of the document; and
determine the foreground colors by causing the computer to retrieve a plurality of samples of groups of pixels from the low resolution image representation of a document and from each one of said samples of pixels find a pixel corresponding to the minimum and maximum intensity of the pixels in the sample;
and for each one of said samples calculate a threshold value representative of the document by averaging the minimum and maximum intensities for each of the blocks;
determine a color associated with each one of the blocks and the width of intensity of each one of the blocks; and
provide a color data structure having an entry for each one of said blocks corresponding to a foreground color.
3. A computer program product for decompressing a file containing image information and text information, said program residing on a computer readable medium comprising instructions for causing a computer to:
decompress the file containing image information and text information into an image file and a text file;
allocate a target bit map to represent the decompressed file;
insert the decompressed image information into the target bit map at locations specified by information contained in said file containing image information and text information;
insert text information into said target bit map in accordance with positional information provided from the decompressed text file; and
fill the target output bit map with a color corresponding to a dominant background color provided from color information in the file.
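The decompression steps of claim 3 can be pictured with a short Python sketch. It assumes the composite file has already been decoded into a background color, a list of image patches with positions, and a list of text runs that arrive as pre-rasterized 1-bit masks with a color; those input shapes and the name `render` are hypothetical. The sketch fills the background first so that later insertions are not overwritten, which is an ordering choice of the sketch rather than something the claim specifies.

```python
def render(width, height, background, image_patches, text_runs):
    """Compose a target bit map from decoded image and text information.

    image_patches: iterable of (left, top, rows), rows being lists of pixel values
    text_runs:     iterable of (left, top, mask, color), mask being rows of 0/1
    """
    # Allocate the target bit map and fill it with the dominant background color.
    bitmap = [[background] * width for _ in range(height)]

    def put(x, y, value):
        # Ignore anything that falls outside the target bit map.
        if 0 <= y < height and 0 <= x < width:
            bitmap[y][x] = value

    # Insert decompressed image information at its recorded location.
    for left, top, rows in image_patches:
        for dy, row in enumerate(rows):
            for dx, value in enumerate(row):
                put(left + dx, top + dy, value)

    # Insert text using the positional information from the text file.
    for left, top, mask, color in text_runs:
        for dy, row in enumerate(mask):
            for dx, bit in enumerate(row):
                if bit:
                    put(left + dx, top + dy, color)

    return bitmap
```

For example, `render(4, 3, 9, [(1, 0, [[1, 2]])], [(0, 2, [[1, 0, 1]], 5)])` places a two-pixel patch on the first row and a two-dot text mask on the third row of a bit map otherwise filled with the value 9.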
5. A method of compressing an image representation of a document having color portions and text portions comprises the steps of:
scanning a document to provide a first file at a first resolution and a second file at a second, lower resolution;
converting the first file into a text file;
applying an auto-rotate filter to the first file to correct said file for errors;
converting said high resolution image file into an optical character recognition file having text information and positional information corresponding to the text information on the image document;
masking portions of said optical character recognition file corresponding to portions of said document representing graphical information associated with the document; and
compressing the unmasked portions of said optical character recognition file to provide a compressed text file;
applying a rotate filter to the second file to correct errors in said second file;
determining from said second file foreground colors associated with each of the sections of said document and background colors associated with each portion of said document;
determining from said background colors a dominant background color that best represents the background color of the document;
masking portions of said document not corresponding to the graphical portions of the document; and
compressing said unmasked portions to provide a second file corresponding to graphical portions of the document and storing said color information, and said first and second files as a composite file.
6. A computer system including a computer software product for compressing data files representative of an image document, said document including color information and/or graphical information, said computer system including:
a processor to execute said software instructions;
a memory storing said software program;
a display which displays representations of said document;
said software product disposed on a computer readable medium comprising instructions for causing a computer to:
provide a first image file of said document at a first resolution and a second image file of said document at a second resolution with said second resolution being lower than said first resolution;
process the first image file to convert the first image file into a text file representation of the document;
compress said text file representation of the document to provide a first compressed file;
process said second file to extract information from the image representation of the document corresponding to color information and graphics information;
compress the second file using a second compression technique to provide a second compressed file containing information corresponding to the image;
store said first and second compressed files and said color information to provide a composite compressed file of the document; and
determine a dominant background color in the file.
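The idea of storing two differently compressed files plus color information as one composite file can be sketched as follows. The claims do not name the compression techniques or the container layout, so everything here is an assumption for illustration: DEFLATE (via Python's `zlib`) stands in for the text compressor, a simple run-length encoding stands in for the second technique, and the length-prefixed layout and the names `pack_composite` and `unpack_composite` are hypothetical.

```python
import json
import struct
import zlib


def rle_encode(data: bytes) -> bytes:
    # Stand-in "second compression technique": (run length, value) byte pairs.
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)


HEADER = "<3I"  # three little-endian lengths: text, image, colors


def pack_composite(text: str, image_bytes: bytes, colors: dict) -> bytes:
    parts = [
        zlib.compress(text.encode("utf-8")),  # first compressed file (text)
        rle_encode(image_bytes),              # second compressed file (image)
        json.dumps(colors).encode("utf-8"),   # color information
    ]
    return struct.pack(HEADER, *(len(p) for p in parts)) + b"".join(parts)


def unpack_composite(blob: bytes):
    sizes = struct.unpack_from(HEADER, blob, 0)
    offset = struct.calcsize(HEADER)
    parts = []
    for size in sizes:
        parts.append(blob[offset:offset + size])
        offset += size
    return (zlib.decompress(parts[0]).decode("utf-8"),
            rle_decode(parts[1]),
            json.loads(parts[2]))
```

Keeping the three parts separately addressable means a reader could recover the text without touching the image data, which is one plausible reason to use a composite layout.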
7. A computer system including a computer software product for compressing data files representative of an image document, said document including color information and/or graphical information, said computer system including:
a processor to execute said software instructions;
a memory storing said software program;
a display which displays representations of said document;
said software product disposed on a computer readable medium comprising instructions for causing a computer to:
provide a first image file of said document at a first resolution and a second image file of said document at a second resolution with said second resolution being lower than said first resolution;
process the first image file to convert the first image file into a text file representation of the document;
compress said text file representation of the document to provide a first compressed file;
process said second file to extract information from the image representation of the document corresponding to color information and graphics information;
compress the second file using a second compression technique to provide a second compressed file containing information corresponding to the image;
store said first and second compressed files and said color information to provide a composite compressed file of the document; and
retrieve background color information associating a background color with each one of a plurality of samples of pixels representing the document;
filter said background colors to provide a target number of colors to represent the background colors;
apply a median cut analysis to the background color samples to filter said background samples into one of a plurality of boxes corresponding to said target number of colors;
sort said boxes by increasing volume;
sort a first portion of said boxes having the smallest amount of volume by decreasing intensity; and
determine the dominant background color as a color to represent the background of the document by the box having the lowest intensity.
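The background color selection recited in claim 7 can be approximated with the following Python sketch. It is one literal reading of the claim language, not the patented implementation: the median cut variant, the definition of box volume as the product of per-channel ranges, the use of the channel mean as intensity, the `portion` parameter that decides how many of the smallest boxes are kept, and all function names are assumptions.

```python
import math

Color = tuple[int, int, int]  # (R, G, B)


def extents(box: list[Color]) -> tuple[int, ...]:
    # Per-channel range (max - min) of the colors in a box.
    return tuple(max(c[ch] for c in box) - min(c[ch] for c in box)
                 for ch in range(3))


def volume(box: list[Color]) -> int:
    return math.prod(extents(box))


def mean_color(box: list[Color]) -> Color:
    return tuple(sum(c[ch] for c in box) // len(box) for ch in range(3))


def median_cut(colors: list[Color], target: int) -> list[list[Color]]:
    """Split the samples into at most `target` boxes."""
    boxes = [sorted(colors)]
    while len(boxes) < target:
        splittable = [i for i, b in enumerate(boxes) if max(extents(b)) > 0]
        if not splittable:
            break
        # Split the box with the widest channel range at its median sample.
        i = max(splittable, key=lambda k: max(extents(boxes[k])))
        box = boxes.pop(i)
        ext = extents(box)
        channel = ext.index(max(ext))
        box.sort(key=lambda c: c[channel])
        mid = len(box) // 2
        boxes += [box[:mid], box[mid:]]
    return boxes


def dominant_background(samples: list[Color], target: int = 4,
                        portion: float = 0.5) -> Color:
    boxes = median_cut(samples, target)
    boxes.sort(key=volume)                       # increasing volume
    keep = max(1, int(len(boxes) * portion))     # smallest-volume portion
    tight = sorted(boxes[:keep],
                   key=lambda b: sum(mean_color(b)) / 3,
                   reverse=True)                 # decreasing intensity
    return mean_color(tight[-1])                 # box with the lowest intensity
```

Small-volume boxes correspond to tightly clustered samples, which is presumably why the claim favors them: a uniform page background produces many near-identical colors.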
Specification