Method of performing a high-performance sort which gains efficiency by reading input file blocks sequentially
Abstract
An improved method of performing a sort-merge operation on a digital computer is disclosed, which gains efficiency by reading input file blocks sequentially. The method takes into consideration the fact that records can be read in any order if they are subsequently to be sorted. Input from disk is processed by reading the working disk directory maintained by the operating system to determine all of the blocks associated with the input data to be sorted. The data block identities so determined are sorted in accordance with their physical location on the disk, thereby providing a sequential order for reading. The input data is read in this sequential order, and then, using largely conventional methods, sorted into one or more strings and merged as necessary to form the fully sorted output. Since the original record order in the file is known from the working directory that has been read, that order can be utilized if and as necessary, for example to preserve the original order of records with equal keys.
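The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Extent`, `physical_read_order`, `read_records`, and `sort_merge` are hypothetical names, the "disk" is a plain dictionary mapping a block number to a single record, and the directory data is supplied as an in-memory list rather than read from an operating system.

```python
import heapq
from typing import NamedTuple


class Extent(NamedTuple):
    """One physically contiguous run of the input file (hypothetical layout)."""
    logical_index: int   # position of this run in the file's logical order
    physical_start: int  # assumed starting block number on the disk
    length: int          # number of blocks in the run


def physical_read_order(extents):
    """Sort the location information so runs are visited in disk order."""
    return sorted(extents, key=lambda e: e.physical_start)


def read_records(disk, extents):
    """Yield records by walking the runs in ascending physical position."""
    for ext in physical_read_order(extents):
        for block in range(ext.physical_start, ext.physical_start + ext.length):
            yield disk[block]  # assumption: one record per block


def sort_merge(disk, extents, block_size):
    """Sort each block of records into a string, then merge the strings."""
    strings, block = [], []
    for record in read_records(disk, extents):
        block.append(record)
        if len(block) == block_size:
            strings.append(sorted(block))
            block = []
    if block:  # final, partially filled block
        strings.append(sorted(block))
    return list(heapq.merge(*strings))
```

Because the records are sorted afterwards anyway, reading them out of logical order costs nothing in correctness; the only change from a conventional sort-merge in this sketch is the `physical_read_order` step ahead of the reads.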
4 Claims
1. An improvement in a sort-merge process carried out upon an input file stored on an input disk in the form of a direct access storage device which is made available to a digital computer, said input file to be sorted and stored in sorted form in an output file, said process comprising a pre-string generation phase, a string generation phase and a merge phase, wherein the improvement comprises the following steps conducted within said string generation phase:
a) reading the directory data prior to reading the data of said input file, for said input file from said input disk and determining therefrom the physically contiguous runs of data on said input disk associated with said input file and information about the location of said runs on said input disk;
b) sorting said location information with regard to all of said runs and thereby determining the physical order of said runs on said input disk;
c) sequentially reading into random access memory of said computer in accordance with said determined physical order, blocks of data comprised of input from said runs; and
d) sorting each such block of data and writing it to sort work if necessary, or if not necessary, to the output file.
a) saving said directory data;
b) determining therefrom the original order of records in the input file;
c) using said original order to order records in the output file having equal sort keys.
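One way to picture steps a) through c) above: if the saved directory data gives the logical sequence of the runs, each record read in physical order can be tagged with its original position, and that position can break ties between equal keys. The sketch below is illustrative only and rests on assumptions: `Extent`, `original_positions`, and `stable_sort_in_physical_order` are hypothetical names, the disk is a dictionary of block number to record, and each block holds exactly one record.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """One physically contiguous run of the input file (hypothetical layout)."""
    logical_index: int   # position of this run in the file's logical order
    physical_start: int  # assumed starting block number on the disk
    length: int          # number of blocks in the run


def original_positions(extents):
    """Map each physical block number to its record's position in the logical file."""
    positions = {}
    next_position = 0
    for ext in sorted(extents, key=lambda e: e.logical_index):
        for offset in range(ext.length):
            positions[ext.physical_start + offset] = next_position
            next_position += 1
    return positions


def stable_sort_in_physical_order(disk, extents, key):
    """Read blocks in physical order, then sort by (key, original position)."""
    positions = original_positions(extents)
    tagged = [
        (key(disk[block]), positions[block], disk[block])
        for block in sorted(positions)  # ascending physical block number
    ]
    tagged.sort(key=lambda t: (t[0], t[1]))  # original position breaks key ties
    return [record for _, _, record in tagged]
```

Carrying the original position alongside the key means the output order of equal-key records does not depend on the order in which the blocks happened to be read.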
3. An article of manufacture comprising a data storage medium on which there has been recorded a computer program which when executed on a suitable computer system performs the methods of claims 1 or 2.
4. A system comprising a general purpose digital computer in which there has been loaded into the appropriate memory and instruction storage areas a computer program which when executed performs the methods of claims 1 or 2.
Specification