Storage management system for document image database
Abstract
A method of managing storage in a document image database uses document analysis to partition documents into logical regions and then reduces the storage size of those regions using different reduction means according to various storage preference rules. The storage preference rules are intended to maintain high-quality representations of important document information while reducing storage requirements at the expense of less important aspects of the document. In particular, the reduction means applied to stored document images include reducing sampling depth, reducing sampling resolution based on the minimum font size, applying lossy and lossless compression schemes, and discarding unimportant regions of the document image. Over time, document analysis and modification can be repeated to further reduce the storage size of previously stored data files.
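The storage preference rules and reduction means described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation. The `Region` type, the age thresholds (30, 180 and 365 days), the target resolutions and bit depths, the legibility constant `MIN_PIXELS_PER_EM`, and the compression ratios used for size estimates are all assumptions. In the sketch, text regions are kept at the lowest resolution that leaves the smallest font legible and are never compressed lossily. Non-text regions lose sampling depth and resolution as the file ages and are eventually discarded.

```python
import math
from dataclasses import dataclass, replace

# Hypothetical legibility constant (not from the patent): the smallest font
# must span at least this many pixels per em at the chosen resolution.
MIN_PIXELS_PER_EM = 12

# Hypothetical compression ratios, used only to estimate stored size.
COMPRESSION_RATIO = {"none": 1, "lossless": 2, "lossy": 10}


@dataclass(frozen=True)
class Region:
    kind: str            # "text" or "non-text", as labeled by document analysis
    width_in: float      # region size in inches
    height_in: float
    dpi: int             # sampling resolution
    bit_depth: int       # sampling depth, bits per pixel
    compression: str     # "none", "lossless" or "lossy"
    discarded: bool = False


def min_dpi_for_font(min_font_pt: float) -> int:
    """Lowest dpi at which a font of min_font_pt points (1 pt = 1/72 in)
    still covers MIN_PIXELS_PER_EM pixels."""
    return math.ceil(MIN_PIXELS_PER_EM * 72 / min_font_pt)


def apply_preference_rules(region: Region, age_days: int, min_font_pt: float) -> Region:
    """Return a reduced copy of region. Quality never increases, so the
    rules can be reapplied each time the file is reanalyzed."""
    if region.discarded:
        return region
    if region.kind == "text":
        # Text: lower resolution only down to the font-based floor, keep lossless.
        floor = min_dpi_for_font(min_font_pt)
        return replace(region, dpi=min(region.dpi, floor), compression="lossless")
    # Non-text: hypothetical age thresholds, checked from harshest to mildest.
    if age_days >= 365:
        return replace(region, discarded=True)
    if age_days >= 180:
        return replace(region, dpi=min(region.dpi, 75),
                       bit_depth=min(region.bit_depth, 4), compression="lossy")
    if age_days >= 30:
        return replace(region, dpi=min(region.dpi, 150),
                       bit_depth=min(region.bit_depth, 8), compression="lossy")
    if region.compression == "none":
        return replace(region, compression="lossless")
    return region


def estimated_bytes(region: Region) -> int:
    """Rough stored size: pixels * bits per pixel / 8 / compression ratio."""
    if region.discarded:
        return 0
    pixels = (region.dpi * region.width_in) * (region.dpi * region.height_in)
    return int(pixels * region.bit_depth / 8 / COMPRESSION_RATIO[region.compression])


if __name__ == "__main__":
    photo = Region("non-text", 4.0, 3.0, 300, 24, "none")
    for age in (0, 30, 180, 365):
        photo = apply_preference_rules(photo, age, min_font_pt=10)
        print(age, estimated_bytes(photo))
```

When the rules are applied repeatedly to the same 4 x 3 inch non-text region at ages 0, 30, 180 and 365 days, the estimated size falls from 1,620,000 to 27,000 to 3,375 to 0 bytes. This mirrors the gradual, time-driven reduction recited in claim 1. A text region reaches its font-based resolution floor on the first pass and stays unchanged after that.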
33 Claims
1. A method of managing the storage of documents in a document image database, said documents having been converted into a digital data file, comprising the steps of:
providing as input said data file that contains undifferentiated regions of both text and non-text;
analyzing said data file using identification rules to differentiate between regions containing text and regions containing non-text;
repeatedly modifying said differentiated regions containing non-text according to storage preference rules based on a time parameter associated with said data file to reduce storage size such that the storage size gradually reduces as a function of elapsed time;
compiling said modified regions into a reduced data file; and
storing said reduced data file in said database.
reanalyzing said reduced data file, whereby said reanalyzing is triggered by a condition within said database system; and
modifying said reduced data file according to said storage preference rules.
14. The method of claim 13 wherein said condition is further defined as an amount of time since said document was converted into a data file.
15. The method of claim 13 wherein said condition is further defined as an amount of time since a user last accessed said data file.
16. The method of claim 13 wherein said condition is further defined as exceeding a threshold amount of storage capacity as measured within said database system.
17. A method of managing the storage of documents in a document image database system, said documents having been converted into digital data files, comprising the steps of:
providing as input at least one of said data files, said data file containing undifferentiated regions of both text and non-text;
analyzing data within said data file to differentiate between at least two regions, said first region containing text and said second region containing non-text;
modifying said differentiated regions in order to reduce storage size of said regions;
compiling said reduced regions into a reduced data file;
storing said reduced data file in said database;
analyzing said reduced data file, whereby said analyzing is triggered by a condition within said database system to further reduce storage size of said reduced data file; and
modifying said reduced data file according to storage preference rules, wherein said reduced data file is modified by storage reduction means for reducing storage size of said second region of said reduced data file such that storage size of said first region remains unchanged.
21. A computer-implemented apparatus for supporting storage management system for documents in a document image database, said documents having been converted into digital data files, comprising:
a database for storing said data files;
an input to an analyzing module, said input providing as an input at least one of said data files, said data file containing undifferentiated regions of both text and non-text;
said analyzing module coupled to said database for identifying at least two regions within the data file, said first region substantially containing text and said second region substantially containing non-text, said analyzing module partitioning said document into said regions; and
a modification module coupled to at least one of said database and analyzing module for reducing storage size of said identified regions as a function of elapsed time.
30. A computer-implemented apparatus for supporting storage management system for documents in a document image database, said documents having been converted into digital data files, comprising:
a database for storing said data files;
an input to an analyzing module, said input providing as an input at least one of said data files, said data file containing undifferentiated regions of both text and non-text;
an analyzing module coupled to said database for identifying at least two regions within said data file and for partitioning said data file into said regions, said first region containing text and said second region containing non-text;
a modification module coupled to at least one of said database and analyzing module for reducing storage size of said identified regions of said data file into a reduced data file according to storage preference rules; and
a scheduler module coupled to said analyzing module for triggering reanalysis of said reduced data file based on a condition within said system such that the second region can be reduced in storage size without reducing the first region.
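The scheduler module of claim 30 and the trigger conditions recited in claims 14 to 16 (time since conversion, time since last access, and storage above a capacity threshold) can be sketched as a small selection function. This is a hypothetical illustration. The `StoredFile` record, the 90-day and 30-day limits, the 80% capacity threshold, and the least-recently-accessed ordering are assumptions, not details given in the claims. Each selected file would then go back to the analyzing and modification modules, which reduce the non-text region while leaving the text region unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy values: the claims name the conditions but not the limits.
MAX_AGE_SINCE_CONVERSION = timedelta(days=90)   # condition of claim 14
MAX_TIME_SINCE_ACCESS = timedelta(days=30)      # condition of claim 15
CAPACITY_THRESHOLD = 0.8                        # condition of claim 16 (fraction used)


@dataclass
class StoredFile:
    file_id: str
    converted_at: datetime    # when the document was converted to a data file
    last_accessed: datetime   # when a user last accessed the data file
    size_bytes: int


def files_to_reanalyze(files: list[StoredFile], now: datetime,
                       capacity_bytes: int) -> list[str]:
    """Scheduler step: return IDs of files whose reduced data should be
    reanalyzed, least recently accessed first."""
    used = sum(f.size_bytes for f in files)
    over_capacity = used > CAPACITY_THRESHOLD * capacity_bytes
    due = [
        f for f in files
        if over_capacity
        or now - f.converted_at >= MAX_AGE_SINCE_CONVERSION
        or now - f.last_accessed >= MAX_TIME_SINCE_ACCESS
    ]
    due.sort(key=lambda f: f.last_accessed)
    return [f.file_id for f in due]
```

Keeping the trigger logic separate from the reduction rules follows the claimed split between a scheduler module and a modification module. Either policy can then be tuned without changing the other.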
Specification