Compression-aware partial sort of streaming columnar data
First Claim
1. A computer-implemented method of sorting data records comprising:
- generating a plurality of data structures, each of which is associated with a different corresponding record field used to sort the data records, and inserting values of the record fields into the corresponding data structures, the values of the record fields being received in a plurality of streams, each of the plurality of streams corresponding to a different respective record field and including a sequence of chunks, each of the chunks including values of a record field corresponding to the stream including the chunk;
inserting values of the record fields into corresponding data structures, the values being inserted such that the values are in an order in the corresponding data structures, the inserting comprising;
for each respective value of each chunk of each stream of the plurality of streams, performing;
receiving a respective value and an instruction for a respective entry of a respective stream,updating the data structure for the respective stream according to the respective value and the instruction, andgenerating an instruction for a corresponding entry of a next stream; and
emitting a top predetermined number of the sorted data records, the emitting including reading out the inserted values stored in the plurality of data structures, wherein;
each of the data structures comprises one or more ordered parts;
each inserted value is inserted into a corresponding ordered part of the corresponding data structure, the corresponding ordered part further including a count of occurrences of the value; and
each ordered part of a data structure corresponding to a record field having a sort priority immediately below another record field corresponds to a distinct value inserted into an ordered part of the data structure corresponding to the another record field.
1 Assignment
0 Petitions
Accused Products
Abstract
According to one embodiment of the present invention, a system for sorting data records generates a plurality of data structures associated with corresponding record fields used to sort the data records, and inserts values of the record fields into the corresponding data structures. Each of the data structures comprises one or more ordered parts, and each inserted value is inserted into a part of the corresponding data structure. Each part of a data structure corresponding to a record field having a sort priority immediately below another record field corresponds to a distinct value inserted into a part of the data structure corresponding to the other record field. The system processes the generated data structures to determine sorted data records. Embodiments of the present invention further include a method and computer program product for sorting data records in substantially the same manners described above.
-
Citations
7 Claims
-
1. A computer-implemented method of sorting data records comprising:
-
generating a plurality of data structures, each of which is associated with a different corresponding record field used to sort the data records, and inserting values of the record fields into the corresponding data structures, the values of the record fields being received in a plurality of streams, each of the plurality of streams corresponding to a different respective record field and including a sequence of chunks, each of the chunks including values of a record field corresponding to the stream including the chunk; inserting values of the record fields into corresponding data structures, the values being inserted such that the values are in an order in the corresponding data structures, the inserting comprising; for each respective value of each chunk of each stream of the plurality of streams, performing; receiving a respective value and an instruction for a respective entry of a respective stream, updating the data structure for the respective stream according to the respective value and the instruction, and generating an instruction for a corresponding entry of a next stream; and emitting a top predetermined number of the sorted data records, the emitting including reading out the inserted values stored in the plurality of data structures, wherein; each of the data structures comprises one or more ordered parts; each inserted value is inserted into a corresponding ordered part of the corresponding data structure, the corresponding ordered part further including a count of occurrences of the value; and each ordered part of a data structure corresponding to a record field having a sort priority immediately below another record field corresponds to a distinct value inserted into an ordered part of the data structure corresponding to the another record field. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
Specification