External sorting using key value distribution and range formation
First Claim
1. A method for executing an external distribution sort, the data to be rearranged including N keyed records accessed by a CPU from associative secondary storage, the CPU having predetermined available internal memory of F records capacity, comprising the steps of:
- random sampling of S keys, internally sorting the sampled keys within the CPU, and representing the sorted order by a sequential pointer list;
forming equal sized partitions averaging N/S<
F records in a single pass by obtaining a histogram of keys and ranges of key values including for each key the step of performing a binary search on the pointer list to ascertain the range associated with the key, and forming S+1 ranges and denoting the sorted key values by X1, X2, . . . . , XS for each value i from one to S+1, the ith range including the set of records whose key value lies in the semi-closed interval Xi-1 to Xi, and combining contiguous lesser populated ranges to form larger ranges such that each partition fits within the CPU internal memory; and
associatively retrieving all of the records whose keys lie within a range of a partition and internally sorting said records.
1 Assignment
0 Petitions
Accused Products
Abstract
A method for executing an external distribution sort in which the data to be rearranged includes keyed stored records that can be accessed on associative secondary storage. The method steps include random sampling of a certain number of keys and internally sorting the sampled keys; forming equal sized partitions of records in a single pass, each partition of which can fit within internal CPU memory and constitute a range of key values; and associatively retrieving all of the records whose keys lie within a range and internally sorting said records.
87 Citations
7 Claims
-
1. A method for executing an external distribution sort, the data to be rearranged including N keyed records accessed by a CPU from associative secondary storage, the CPU having predetermined available internal memory of F records capacity, comprising the steps of:
-
random sampling of S keys, internally sorting the sampled keys within the CPU, and representing the sorted order by a sequential pointer list; forming equal sized partitions averaging N/S<
F records in a single pass by obtaining a histogram of keys and ranges of key values including for each key the step of performing a binary search on the pointer list to ascertain the range associated with the key, and forming S+1 ranges and denoting the sorted key values by X1, X2, . . . . , XS for each value i from one to S+1, the ith range including the set of records whose key value lies in the semi-closed interval Xi-1 to Xi, and combining contiguous lesser populated ranges to form larger ranges such that each partition fits within the CPU internal memory; andassociatively retrieving all of the records whose keys lie within a range of a partition and internally sorting said records. - View Dependent Claims (2)
-
-
3. A new use of a data processor having a central processing unit, an internal memory of capacity M, an associative storage device, and means coupling the memory and the device to the CPU, said new use comprising a method for executing a distribution sort on data including keyed records accessible on the associative device, the method steps include:
-
random sampling S record keys and sorting the sampled keys; counting how many records belong to each range defined by the sorted sample, and combining lesser populated contiguous ranges to form larger ranges, each range of which containing roughly the number of records that can fit within the internal memory, in order to estimate the distribution of key values of the records constituting a file in a single pass; and associatively retrieving all of the records whose keys lie within a range and internally sorting the retrieved records. - View Dependent Claims (4)
-
-
5. A method for executing an external distribution sort, the data to be rearranged including keyed records accessed by a CPU from associative secondary storage, the CPU having predetermined available internal memory M, the method steps include:
-
random sampling of S keys and internal sorting of the sampled keys including the step of representing the sorted order by a sequential pointer list; forming equal sized partitions of records in a single pass by obtaining a histogram of keys and ranges of key values and combining contiguous lesser populated ranges to form larger ranges such that each partition fits within the CPU internal memory, said partition forming step includes the step of performing a binary search on the pointer list in order to ascertain the range associated with the key for each key in the histogram; and associatively retrieving all of the records whose keys lie within a range of a partition and internally sorting said records. - View Dependent Claims (6, 7)
-
Specification