System and method for adaptively loading input data into a multi-dimensional clustering table
Abstract
A system and associated method load an input data stream into a multi-dimensional clustering (MDC) table or other structure containing data clustered along one or more dimensions, by assembling blocks of data in a partial block cache in which each partial block is associated with a distinct logical cell. A minimum threshold number of partial blocks may be maintained. Partial blocks may be spilled from the partial block cache to make room for new logical cells. Last partial pages of spilled partial blocks may be stored in a partial page cache to limit I/O if the cell associated with a spilled block is encountered later in the input data stream. Buffers may be reassigned from the partial block cache to the partial page cache if the latter is filled. Parallelism may be employed for efficiency during sorting of input data subsets and during storage of blocks to secondary storage.
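The interaction between the two caches described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the class name `MdcLoader`, the page and block sizes, the least-recently-used victim choice, and the in-memory `disk` and `disk_tail` stand-ins for secondary storage are all hypothetical. The minimum-threshold, buffer-reassignment, and parallelism features are left out.

```python
from collections import OrderedDict, defaultdict

PAGE_ROWS = 2    # rows per page (illustrative value, not from the source)
BLOCK_ROWS = 4   # rows per block, i.e. two pages (illustrative value)


class MdcLoader:
    """Toy model of a partial block cache backed by a partial page cache."""

    def __init__(self, max_blocks, max_pages):
        self.max_blocks = max_blocks      # capacity of the block cache (must be >= 1)
        self.max_pages = max_pages        # capacity of the page cache (may be 0)
        self.block_cache = OrderedDict()  # cell -> rows not yet on disk, in LRU order
        self.page_cache = OrderedDict()   # cell -> last partial page of a spilled block
        self.disk = defaultdict(list)     # cell -> rows written to disk in full pages
        self.disk_tail = {}               # cell -> partial page that was written to disk
        self.page_reads = 0               # partial pages read back from disk

    def _spill_one_block(self):
        # Victimize the least recently used partial block (assumed policy).
        cell, rows = self.block_cache.popitem(last=False)
        full = len(rows) - len(rows) % PAGE_ROWS
        self.disk[cell].extend(rows[:full])   # complete pages go to disk
        tail = rows[full:]                    # the last partial page, if any
        if tail:
            self.page_cache[cell] = tail
            if len(self.page_cache) > self.max_pages:
                # Page cache overflow: the oldest partial page goes to disk.
                old_cell, old_tail = self.page_cache.popitem(last=False)
                self.disk_tail[old_cell] = old_tail

    def _open_block(self, cell):
        if len(self.block_cache) >= self.max_blocks:
            self._spill_one_block()
        if cell in self.page_cache:
            rows = self.page_cache.pop(cell)  # resume without any I/O
        elif cell in self.disk_tail:
            rows = self.disk_tail.pop(cell)   # must re-read the partial page
            self.page_reads += 1
        else:
            rows = []                         # first time this cell is seen
        self.block_cache[cell] = rows
        return rows

    def add(self, cell, row):
        rows = self.block_cache.get(cell)
        if rows is None:
            rows = self._open_block(cell)
        else:
            self.block_cache.move_to_end(cell)
        rows.append(row)
        if (len(self.disk[cell]) + len(rows)) % BLOCK_ROWS == 0:
            # The block is complete: write it out and free the cache slot.
            self.disk[cell].extend(rows)
            del self.block_cache[cell]

    def finish(self):
        # Flush everything that is still cached or pending.
        for cache in (self.block_cache, self.page_cache, self.disk_tail):
            for cell, rows in cache.items():
                self.disk[cell].extend(rows)
            cache.clear()
        return dict(self.disk)
```

Feeding the same interleaved stream of cells to two loaders that differ only in `max_pages` shows the intended effect: the loaded data is identical, but the loader with the larger partial page cache needs fewer re-reads of partial pages when a spilled cell reappears.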
46 Claims
1. A method of loading an input data stream into a data structure containing data that is clustered along one or more dimensions, comprising:
storing, in a partial block cache, a plurality of partial blocks that are assembled from the input data stream;
wherein each partial block is associated with a distinct logical cell; and
storing, in a partial page cache, a plurality of last partial pages of the partial blocks that have been victimized from the partial block cache.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer program product having instruction codes for loading an input data stream into a data structure containing data that is clustered along one or more dimensions, comprising:
a first set of instruction codes for storing, in a partial block cache, a plurality of partial blocks that are assembled from the input data stream;
wherein each partial block is associated with a distinct logical cell; and
a second set of instruction codes for storing, in a partial page cache, a plurality of last partial pages of the partial blocks that have been victimized from the partial block cache.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A system for loading an input data stream into a data structure containing data that is clustered along one or more dimensions, comprising:
means for storing, in a partial block cache, a plurality of partial blocks that are assembled from the input data stream;
wherein each partial block is associated with a distinct logical cell; and
means for storing, in a partial page cache, a plurality of last partial pages of the partial blocks that have been victimized from the partial block cache.
Dependent claims: 32, 33, 34, 35, 36
37. A method of loading an input data stream into a data structure containing data that is clustered along one or more dimensions, comprising:
storing, in a partial block cache, a plurality of partial blocks that are assembled from the input data stream;
wherein each partial block is associated with a distinct logical cell; and
storing, in a partial subblock cache, a plurality of last partial subblocks of the partial blocks that have been victimized from the partial block cache.
Dependent claims: 38, 39, 40, 41
42. A computer program product having instruction codes for loading an input data stream into a data structure containing data that is clustered along one or more dimensions, comprising:
a first set of instruction codes for storing, in a partial block cache, a plurality of partial blocks that are assembled from the input data stream;
wherein each partial block is associated with a distinct logical cell; and
a second set of instruction codes for storing, in a partial subblock cache, a plurality of last partial subblocks of the partial blocks that have been victimized from the partial block cache.
Dependent claims: 43, 44, 45, 46
Specification