Method for rapidly and efficiently hashing records of large databases
First Claim
1. A method for rapidly hashing records in a database stored on a secondary storage device, the size of said database exceeding that of a primary storage in which operations on said database are to be performed, comprising the steps of:
- A. providing in primary storage a set of memory-blocks for receiving hash records therein, each said memory-block being associated with a sub-range of hash values that collectively span a range of hash values encompassing said database;
B. repeatedly retrieving from the secondary storage device groups of records and generating a hash value for each record;
C. associating each hash value with at least secondary storage address information for the respective record so as to form retrieval information comprising intermediate hash records that characterize the retrieved records;
D. distributing said retrieval information among said memory-blocks in accordance with the range of hash values associated with the respective records;
E. as each memory-block fills, writing the corresponding intermediate hash records to an intermediate file associated with the memory block in secondary storage to enable further intermediate hash records to be distributed to said memory-block;
F. retrieving said intermediate files from secondary storage and ordering the intermediate hash records therein so as to form hashed files; and
G. writing the hashed files to secondary storage as a single composite file to form a hash table spanning the entire database.
5 Assignments
0 Petitions
Accused Products
Abstract
A method for rapidly hashing records in a large database stored on a secondary storage device in which a set of memory-blocks are preferably established in main memory for receiving information. Each memory block is associated with a sub-range of hash values that collectively span a range of hash values derived from one or more fields of the database records. The hash values together with other information are distributed among the memory-blocks in accordance with the range of hash values. As each memory block fills, its contents are written to an intermediate file associated with the memory-block in secondary storage. The intermediate files are subsequently retrieved and the hash values stored therein are ordered. The ordered intermediate files are then written to secondary storage as a single hash table spanning the entire database.
101 Citations
9 Claims
-
1. A method for rapidly hashing records in a database stored on a secondary storage device, the size of said database exceeding that of a primary storage in which operations on said database are to be performed, comprising the steps of:
-
A. providing in primary storage a set of memory-blocks for receiving hash records therein, each said memory-block being associated with a sub-range of hash values that collectively span a range of hash values encompassing said database; B. repeatedly retrieving from the secondary storage device groups of records and generating a hash value for each record; C. associating each hash value with at least secondary storage address information for the respective record so as to form retrieval information comprising intermediate hash records that characterize the retrieved records; D. distributing said retrieval information among said memory-blocks in accordance with the range of hash values associated with the respective records; E. as each memory-block fills, writing the corresponding intermediate hash records to an intermediate file associated with the memory block in secondary storage to enable further intermediate hash records to be distributed to said memory-block; F. retrieving said intermediate files from secondary storage and ordering the intermediate hash records therein so as to form hashed files; and G. writing the hashed files to secondary storage as a single composite file to form a hash table spanning the entire database. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
Specification