Run formation in large scale sorting using batched replacement selection
First Claim
1. A method for forming an output run of data records using a selection tree, the method comprising:
- reading multiple data records into a memory from an input stream;
forming mini-runs from the multiple data records;
adding the mini-runs as leaf nodes of the selection tree;
selecting multiple records for output from the mini-runs;
collecting the selected records into a batch of selected records, the batch having a predefined size; and
outputting the batch of selected records, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records, further wherein the selection tree has a dynamically varying size.
3 Assignments
0 Petitions
Accused Products
Abstract
A large-scale sorting process utilizes a batched replacement selection method to form runs of sorted data records. The batched replacement selection method involves reading multiple records from a persistent data storage into main memory and sorting the multiple records to form a mini-run of multiple sorted data records. After formation, the mini-run is added to a selection tree by inserting a pointer to a first record in the mini-run into the array of pointers. The first record is linked to remaining records in the mini-run. As records are selected for output from the selection tree, the methodology replaces the selected record with a next record in the associated mini-run (if not empty) or alternatively deletes the node if the mini-run is empty. The selected records are collected into an output buffer. When the number of records reaches a pre-determined number, the selected records are written in batch back to the persistent data storage.
-
Citations
44 Claims
-
1. A method for forming an output run of data records using a selection tree, the method comprising:
-
reading multiple data records into a memory from an input stream;
forming mini-runs from the multiple data records;
adding the mini-runs as leaf nodes of the selection tree;
selecting multiple records for output from the mini-runs;
collecting the selected records into a batch of selected records, the batch having a predefined size; and
outputting the batch of selected records, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records, further wherein the selection tree has a dynamically varying size. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
selecting a particular record for output from a particular mini-run;
in an event that the particular mini-run is not empty, replacing the selected particular record with a next record from the particular mini-run; and
in an event that the particular mini-run is empty, deleting the leaf node associated with the particular mini-run.
-
-
5. One or more storage media storing computer-executable instructions for performing the method as recited in claim 1.
-
6. A method as recited in claim 1, further comprising the step of determining whether memory is full, wherein the batch output routine is initiated in response to a determination that the memory is full.
-
7. A method as recited in claim 1, wherein the batch of predefined size comprises a batch having a predetermined number of records.
-
8. A method as recited in claim 1, wherein the input routine includes the steps of reading, forming, and adding.
-
9. A method, comprising:
-
forming a selection tree in memory as an array of pointers to records to be sorted, the selection tree having external leaf nodes and internal nodes;
associating a set of multiple sorted data records with each of the external leaf nodes;
using key values of first records in the sets of sorted data records to define the internal nodes of the selection tree;
selecting a record for output from a particular external leaf node;
in an event that the set of multiple data records associated with the particular external leaf node is not empty, replacing the selected record with a next record from the set of multiple data records;
in an event that the particular set of multiple data records associated with the particular external leaf node is empty, deleting the particular external leaf node from the selection tree; and
collecting multiple selected records in a batch of predefined size, and outputting the selected records in batch, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (10, 11, 12, 13)
-
-
14. A method for constructing a selection tree used in replacement selection processes to form runs of data records, comprising:
-
associating a set of multiple sorted data records with each external leaf node of the selection tree stored in memory, each data record having a corresponding run number and key value;
using the key values of first records in the sets of sorted data records to define the internal nodes of the selection tree;
selecting multiple records for output from the sets of multiple data;
collecting the selected records into a batch of predefined size; and
outputting the batch of selected records, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (15, 16, 17, 18, 19)
-
-
20. In a database system having cache memory, main memory, and persistent data storage, a method for forming a run of sorted data records using a selection tree, comprising:
-
reading multiple records in batch from the persistent data storage into the main memory;
sorting the multiple records in the main memory to form a mini-run of sorted data records;
adding the mini-run as an external node on the selection tree, wherein a first record in the mini-run is represented by the external node;
selecting a record for output from a particular external node;
in an event that the mini-run at the particular external node is not empty, replacing the selected record with a next record from the mini-run;
in an event that the particular mini-run at the particular external node is empty, deleting the external node;
collecting the selected records in a batch of predefined size; and
outputting the selected records in batch, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (21, 22, 23, 24, 25)
-
-
26. A replacement selection method used in large scale sorting for forming runs of sorted data records, an improvement comprising forming external leaf nodes of a selection tree as mini-runs of multiple sorted data records,
further improved by batch processing of input and output records, wherein the batch processing of input records is performed in a batch input routine, and the batch processing of output records is performed in a batch output routine, wherein the batch input routine alternates with the batch output routine.
-
27. A system comprising:
-
a memory subsystem having a persistent data storage and a main memory;
a processing unit having a processor and cache memory;
a batched replacement selection module executed by the processing unit for forming runs of sorted data records, the batched replacement selection module forming a selection tree as an array of pointers to records to be sorted, the array being stored in the main memory; and
the batched replacement selection module reading multiple records in batch from the persistent data storage into the main memory and sorting the multiple records in the main memory to form a mini-run of sorted data records, the batched replacement selection module adding the mini-run as an external node on the selection tree, wherein the batched replacement selection module selects multiple records for output, collects the selected records in a batch of predefined size, and writes the selected records in batch back to the persistent data storage, wherein the selecting, collecting, and writing are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (28, 29, 30, 31, 32)
-
-
33. A system comprising:
-
a memory subsystem having a persistent data storage and a main memory;
a processing unit having a processor and cache memory;
an array of pointers to records to be sorted arranged to form a selection tree having internal and external nodes, the array being stored in the main memory;
a mini-run assembly data structure stored in the main memory; and
a batched replacement selection module executable on the processing unit to read multiple records from the persistent data storage into the mini-run assembly data structure and sort the multiple records to form a mini-run of sorted data records, the batched replacement selection module inserting a pointer to a first record in the mini-run into the array of pointers to add a node to the selection tree, the first record being linked to remaining records in the mini-run, wherein the batched replacement selection module selects a record for output, collects the selected records in a batch of predefined size, and writes the selected records in batch back to the persistent data storage, wherein the selecting, collecting, and writing are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (34, 35, 36, 37, 38)
-
-
39. A program module stored on a computer readable medium that, when executed by a processor, performs:
-
reading multiple records into a data structure in memory;
sorting the multiple records to form a mini-run of sorted data records;
adding the mini-run as an external node on a selection tree;
selecting multiple records for output from the mini-runs;
collecting the selected records in a batch of predefined size; and
outputting the selected records in batch, wherein the selecting, collecting, and outputting steps are performed together as a batch output routine, and wherein the method further includes switching to an input routine following the outputting of the batch of selected records. - View Dependent Claims (40, 41, 42, 43, 44)
selecting a particular record for output from a particular external node of the selection tree; and
in an event that the mini-run at the particular external node is not empty, replacing the selected particular record with a next record from the mini-run.
-
-
41. A program module as recited in claim 39, further performing:
-
selecting a particular record for output from a particular external node of the selection tree; and
in an event that the particular mini-run at the particular external node is empty, deleting the external node.
-
-
42. A program module as recited in claim 39, wherein the batch output routine is initiated in response to a determination that the memory is full.
-
43. A program module as recited in claim 39, wherein the batch of predefined size comprises a batch having a predetermined number of records.
-
44. A program module as recited in claim 39, further wherein the selection tree has a dynamically varying size.
Specification