ADAPTIVE AGGREGATION: IMPROVING THE PERFORMANCE OF GROUPING AND DUPLICATE ELIMINATION BY AVOIDING UNNECESSARY DISK ACCESS
First Claim
1. A computer-implemented method for use with information stored in blocks on a storage medium, the method comprising:
- reading blocks of the information from the storage medium into a memory of the computer using an aggregation method;
determining a number, k, of blocks to be written back to the storage medium from the memory;
selecting k blocks from the memory, sorting the selected blocks, and writing the sorted blocks as a new sublist to the storage medium;
iterating the steps of reading, determining, selecting, sorting, and writing sublists; and
merging the sublists to form an aggregation result.
1 Assignment
0 Petitions
Accused Products
Abstract
A method for use with an aggregation operation (e.g., on a relational database table) includes a sorting pass and a merging pass. The sorting pass includes: (a) reading blocks of the table from a storage medium into a memory using an aggregation method until the memory is substantially full or until all the data have been read into the memory; (b) determining a number k of blocks to write back to the storage medium from the memory; (c) selecting k blocks from memory, sorting the k blocks, and then writing the k blocks back to the storage medium as a new sublist; and (d) repeating steps (a), (b), and (c) for any unprocessed tuples in the database table. The merging pass includes: merging all the sublists to form an aggregation result using a merge-sort algorithm.
-
Citations
20 Claims
-
1. A computer-implemented method for use with information stored in blocks on a storage medium, the method comprising:
-
reading blocks of the information from the storage medium into a memory of the computer using an aggregation method; determining a number, k, of blocks to be written back to the storage medium from the memory; selecting k blocks from the memory, sorting the selected blocks, and writing the sorted blocks as a new sublist to the storage medium; iterating the steps of reading, determining, selecting, sorting, and writing sublists; and merging the sublists to form an aggregation result. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A database system, the database system including a table having tuples, the tuples stored in blocks on a computer-accessible storage medium, the database system executing steps for:
-
accessing the storage medium to read blocks of the table into a memory of the database system using an early-aggregation method; determining a number, k, of blocks to be written from the memory back to the storage medium; selecting k blocks from the memory, sorting the selected blocks, and accessing the storage medium to write the sorted blocks as a sublist; iterating the processes of reading, determining, selecting, sorting, and writing sublists, wherein the number of accesses to read blocks and write blocks is less than that required to read and write all blocks of the table; and merging the sublists to form an aggregation result. - View Dependent Claims (12, 13, 14, 15)
-
-
16. A computer program product for use with a relational database table having tuples stored in blocks on a computer-accessible storage medium, the computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
-
read blocks of the table from the storage medium into a memory of the computer using an early-aggregation before sort method until the memory is full or until all the blocks have been read into the memory; determine a number, k, of blocks to be written back to the storage medium from the memory; select k blocks from the memory, sort the selected blocks, and write back the sorted blocks as a sublist to the storage medium; iterate the steps of reading, determining, selecting, sorting, and writing sublists until each block has been read into the memory, wherein the total number of block reads and writes is less than that required to read and write all blocks of the table; and merge the sublists to form an aggregation result. - View Dependent Claims (17, 18, 19, 20)
-
Specification