System and method for sorting data in a computer system using edge buckets for key values having a maximum and minimum range within a subinterval
First Claim
1. A method for operating a computer having a processor and memory to sort a set of data records stored in the memory each record having an associated key for governing the sort process, the method comprising the steps performed by the computer of:
- (a) determining a calculated minimum and maximum for the key values by sampling the key values;
(b) defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals between the calculated minimum and maximum, and two edge buckets for key values outside the range, each subinterval having a respective index;
(c) distributing the keys among the buckets by determining directly from each key value, the index of the subinterval into which the key value falls; and
(d) processing the buckets in sequence in order to sort the records, sorting the keys in each bucket when the bucket contains more than one key.
1 Assignment
0 Petitions
Accused Products
Abstract
A method is described for operating a computer to sort a set of data records each having an associated key for governing the sort process, the method comprising determining a range for the key values by sampling the key values; defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals in the range, and two edge buckets for key values outside the range, each subinterval having a respective index; distributing the keys among the buckets by determining directly from each key value the index of the subinterval into which the key value falls; and processing the buckets in sequence in order to sort the records, sorting the keys in each bucket if the bucket contains more than one key.
-
Citations
14 Claims
-
1. A method for operating a computer having a processor and memory to sort a set of data records stored in the memory each record having an associated key for governing the sort process, the method comprising the steps performed by the computer of:
-
(a) determining a calculated minimum and maximum for the key values by sampling the key values; (b) defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals between the calculated minimum and maximum, and two edge buckets for key values outside the range, each subinterval having a respective index; (c) distributing the keys among the buckets by determining directly from each key value, the index of the subinterval into which the key value falls; and (d) processing the buckets in sequence in order to sort the records, sorting the keys in each bucket when the bucket contains more than one key. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A data processing apparatus for sorting a set of data records each having an associated key for governing the sort process, the apparatus comprising:
-
means for determining a range of the key values by sampling the key values; means for defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals in the range, and two edge buckets for key values outside the range, each subinterval having a respective index; means for selecting the size and spacing of the subintervals so that the probability of more than one key value falling into a subinterval is low whereby after the distribution most of the buckets will contain either zero or one element; and means for distributing the keys among the buckets by determining directly from each key value the index of the subinterval into which the key value falls, the apparatus being arranged to processing the bucket in sequence in order to sort the records, sorting the keys in each bucket if the bucket contains more than one key.
-
-
8. A method for operating a computer having a processor and memory to sort a set of data records stored in the memory each record having an associated key for governing the sorting of the set of data records, the method comprising the steps performed by the computer of:
-
(a) determining a range for the key values by sampling the key values; (b) defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals in the range, and two edge buckets for key values outside the range, each subinterval having a respective index; (c) distributing the keys among the buckets by determining directly from each key value the index of the subinterval into which the key value falls; (d) processing the buckets in sequence in order to sort the records, sorting the keys in each bucket when the bucket contains more than one key; and (e) determining from the key samples whether a substantial number of the keys share a common prefix, and, if so, dividing the keys into subsets, one of which comprises all keys sharing the common prefix and sorting the key subsets separately.
-
-
9. An article of manufacture for use in a computer system for sorting a set of data records, the computer system having a processor and memory wherein the set of data records are stored in the memory, each record having an associated key for governing the sorting of the set,
said article of manufacture comprising a computer readable storage medium having a computer readable programming code embodied in said medium which causes the computer to: -
determine a range for the key values by sampling the key values; define a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals in the range, and two edge buckets for key values outside the range, each subinterval having a respective index, wherein the size and spacing of the subintervals are selected so that the probability of more than one key value falling into a subinterval is low, whereby after the distribution most of the buckets will contain either zero or one element; distribute the keys among the buckets by determining directly from each key value the index of the subinterval into which the key value falls; and process the buckets in sequence in order to sort the records, sorting the keys in each bucket if the bucket contains more than one key.
-
-
10. A method for operating a computer having a processor and memory to sort a set of data records stored in the memory each record having an associated key for governing the sorting of the set of data records, the method comprising the steps performed by the computer of:
-
(a) determining a range for the key values by sampling the key values; (b) defining a plurality of buckets, each bucket corresponding to a respective one of a plurality M of subintervals in the range, and two edge buckets for key values outside the range, each subinterval having a respective index, wherein the size and spacing of the subintervals are selected so that the probability of more than one key value falling into a subinterval is low, whereby after the distribution most of the buckets will contain either zero or one element; (c) distributing the keys among the buckets by determining directly from each key value the index of the subinterval into which the key value falls; and (d) processing the buckets in sequence in order to sort the records, sorting the keys in each bucket when the bucket contains more than one key. - View Dependent Claims (11, 12, 13, 14)
-
Specification