Method and system for intra-row, inter-row compression and decompression of data items in a database using a page-based structure where allocating a page-buffer based on a stored value indicating the page size
First Claim
1. In a computerized database system, a computer program product having computer-executable instructions stored thereon that, when executed, perform a method of compressing an ordered sequence of structurally similar data items in a collection into one or more fixed-size storage pages, the method comprising:
- distributing data in a manner that allows subsequent processing, based on parts of the data item that are unique and from which an order in the sequence can be determined, to find any particular data item without reading any of the storage pages other than the page where the data of the item is located, while also allowing the insertion of new data items in the collection and removal of data items from the collection by reading and rewriting a small subset of the storage pages and possibly adding one or more new pages to the set of pages used by the collection, the set of pages being stored in an external list;
establishing for each page a corresponding intermediate buffer where the data items stored on that page are represented in a decompressed form, the decompressed form being one of a fully decompressed form or a partially decompressed form, and when the data on a page is needed for reading, insertions, or deletions, the corresponding intermediate buffer is found in cache or created as the page data is being read;
if inserting an item into the collection, reading the intermediate buffer corresponding to the page where it would be located as described in the acts of locating pages and configuring each page, after which the item is inserted into the intermediate buffer in the appropriate sequence;
if removing an item from the collection, reading the intermediate buffer corresponding to the page where it would be located as described in the acts of locating pages and establishing intermediate buffers, after which the item is removed from the intermediate buffer;
if no data items are found in the intermediate buffer after processing the intermediate buffers that have been altered, removing the corresponding page from the list;
if the data can be compressed into a single page, recompressing the intermediate buffer into a single page; and
if the data cannot be compressed into a single page, splitting the intermediate buffer into a plurality of intermediate buffers, each corresponding to a page in the list.
0 Assignments
0 Petitions
Accused Products
Abstract
A database compression system includes a compression plug-in that allows a database to be compressed using multiple compression algorithms. As well, implementations of the present; invention allow inter-row compression to be used with fixed-page sizes in a page-based database. For example, the compression plug-in inter-row decompresses a requested page from sub-storage, and allocates a page buffer that corresponds at least to the size of the page data when inter-row decompressed. The compression plug-in then adds data to the page buffer using intra-row compression, such as gamma compression. When the page data is no longer needed, the compression plug-in compresses the page data using inter-row compression, and passes the compressed page data from the page buffer to the corresponding page, which is fixed in size, in sub-storage.
-
Citations
9 Claims
-
1. In a computerized database system, a computer program product having computer-executable instructions stored thereon that, when executed, perform a method of compressing an ordered sequence of structurally similar data items in a collection into one or more fixed-size storage pages, the method comprising:
-
distributing data in a manner that allows subsequent processing, based on parts of the data item that are unique and from which an order in the sequence can be determined, to find any particular data item without reading any of the storage pages other than the page where the data of the item is located, while also allowing the insertion of new data items in the collection and removal of data items from the collection by reading and rewriting a small subset of the storage pages and possibly adding one or more new pages to the set of pages used by the collection, the set of pages being stored in an external list; establishing for each page a corresponding intermediate buffer where the data items stored on that page are represented in a decompressed form, the decompressed form being one of a fully decompressed form or a partially decompressed form, and when the data on a page is needed for reading, insertions, or deletions, the corresponding intermediate buffer is found in cache or created as the page data is being read; if inserting an item into the collection, reading the intermediate buffer corresponding to the page where it would be located as described in the acts of locating pages and configuring each page, after which the item is inserted into the intermediate buffer in the appropriate sequence; if removing an item from the collection, reading the intermediate buffer corresponding to the page where it would be located as described in the acts of locating pages and establishing intermediate buffers, after which the item is removed from the intermediate buffer; if no data items are found in the intermediate buffer after processing the intermediate buffers that have been altered, removing the corresponding page from the list; if the data can be compressed into a single page, recompressing the intermediate buffer into a single page; and if the data cannot be compressed into a single page, splitting the intermediate buffer into a plurality of intermediate buffers, each corresponding to a page in the list. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computerized information retrieval and storage system for accessing data from a B-Tree or B+Tree database and storing new data in the B-tree database structure such that the ordered sequence of structurally similar items stored in the B-Tree are compressed such that the fixed-size pages that provide advantages to B-trees are retained, the computerized information retrieval and storage system comprising:
-
a storage containing fixed-size B-Tree pages, leaf pages of the fixed-size B-Tree pages each contain a compressed form of a number of adjacent items in the sequence, and node pages of the fixed-size B-Tree each contain a compressed form of one item from each of the corresponding leaf pages, and a root page which may be either a leaf page or a node page; a page buffer for each B-Tree page, the page buffer being large enough to hold the items represented in the corresponding page in a partially or fully decompressed form, such that each item can be accessed individually, without having to decompress data from other items in the buffer; the page buffer being used to store newly inserted items in addition to previously-existing items, with B-Tree page splitting being determined not at item insertion time, but rather when the page buffer is recompressed in preparation to flush the buffer to the B-Tree page storage; the page buffer being used to store the remaining previously-existing items, with B-Tree page merging being determined not at item removal time, but rather when the page buffer is recompressed in preparation to flush the buffer to the B-Tree page storage; if the page buffer cannot be compressed into a single B-Tree page, the corresponding B-Tree page is split into a plurality of pages, with the page buffer being split into a corresponding number of page buffers; if the page buffer is empty or nearly empty, the corresponding B-Tree page is merged with other B-Tree pages, with the page buffer also being merged with the page buffers corresponding to the B-Tree pages with which the original B-Tree page was merged; and a specialized compression plug-in configured to determine when the buffer cannot be compressed into a single B-Tree page. - View Dependent Claims (9)
-
Specification