Method for rapidly and efficiently hashing records of large databases

US 5,809,494 A
Filed: 11/16/1995
Issued: 09/15/1998
Est. Priority Date: 11/16/1995
Status: Expired due to Term

First Claim

1. A method for rapidly hashing records in a database stored on a secondary storage device, the size of said database exceeding that of a primary storage in which operations on said database are to be performed, comprising the steps of:

A. providing in primary storage a set of memory-blocks for receiving hash records therein, each said memory-block being associated with a sub-range of hash values that collectively span a range of hash values encompassing said database;

B. repeatedly retrieving from the secondary storage device groups of records and generating a hash value for each record;

C. associating each hash value with at least secondary storage address information for the respective record so as to form retrieval information comprising intermediate hash records that characterize the retrieved records;

D. distributing said retrieval information among said memory-blocks in accordance with the range of hash values associated with the respective records;

E. as each memory-block fills, writing the corresponding intermediate hash records to an intermediate file associated with the memory block in secondary storage to enable further intermediate hash records to be distributed to said memory-block;

F. retrieving said intermediate files from secondary storage and ordering the intermediate hash records therein so as to form hashed files; and

G. writing the hashed files to secondary storage as a single composite file to form a hash table spanning the entire database.
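Steps A through G can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_hash_table`, the parameters `num_blocks`, `block_capacity`, and `group_size`, the use of `zlib.crc32` as the hash function, the text file format, and the reading of "ordering" as sorting by hash value are all hypothetical choices made for the example. An in-memory list of `(address, key)` pairs stands in for the database on secondary storage.

```python
import os
import tempfile
import zlib

HASH_RANGE = 2 ** 32  # zlib.crc32 yields values in [0, 2**32)


def build_hash_table(records, out_path, num_blocks=4, block_capacity=3, group_size=5):
    """records: list of (address, key_bytes) pairs standing in for the database."""
    workdir = tempfile.mkdtemp()
    span = HASH_RANGE // num_blocks
    # Step A: one in-memory block per hash sub-range, each with its own intermediate file.
    blocks = [[] for _ in range(num_blocks)]
    paths = [os.path.join(workdir, f"intermediate_{i}.txt") for i in range(num_blocks)]

    def flush(i):
        # Step E: append the block's contents to its intermediate file, then empty the block.
        with open(paths[i], "a") as f:
            for h, addr in blocks[i]:
                f.write(f"{h} {addr}\n")
        blocks[i].clear()

    # Step B: retrieve the records group by group and hash each one.
    for start in range(0, len(records), group_size):
        for addr, key in records[start:start + group_size]:
            h = zlib.crc32(key)
            # Step D: pick the block whose sub-range contains h.
            i = min(h // span, num_blocks - 1)
            # Step C: the intermediate hash record pairs the hash with the record address.
            blocks[i].append((h, addr))
            if len(blocks[i]) >= block_capacity:
                flush(i)
    for i in range(num_blocks):
        if blocks[i]:
            flush(i)

    # Steps F and G: order each intermediate file and write all of them,
    # in sub-range order, to a single composite file.
    with open(out_path, "w") as out:
        for p in paths:
            if not os.path.exists(p):
                continue  # no record fell into this sub-range
            with open(p) as f:
                entries = [tuple(map(int, line.split())) for line in f]
            entries.sort()
            for h, addr in entries:
                out.write(f"{h} {addr}\n")
    return out_path
```

Because the sub-ranges are visited in ascending order and each intermediate file is sorted on its own, the composite file in this sketch ends up ordered by hash value across the whole database, while no single sort ever has to hold more than one sub-range's records in memory.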

5 Assignments

0 Petitions

Abstract

A method for rapidly hashing records in a large database stored on a secondary storage device in which a set of memory-blocks are preferably established in main memory for receiving information. Each memory block is associated with a sub-range of hash values that collectively span a range of hash values derived from one or more fields of the database records. The hash values together with other information are distributed among the memory-blocks in accordance with the range of hash values. As each memory block fills, its contents are written to an intermediate file associated with the memory-block in secondary storage. The intermediate files are subsequently retrieved and the hash values stored therein are ordered. The ordered intermediate files are then written to secondary storage as a single hash table spanning the entire database.
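The abstract notes that hash values are derived from one or more fields of the database records and that each memory block covers a sub-range of those values. A small sketch of both ideas follows. The helper names `hash_from_fields` and `block_index`, the choice of `zlib.crc32`, and the equal-width split of the hash range are assumptions made for illustration; the source does not specify a hash function or how the sub-ranges are sized.

```python
import zlib


def hash_from_fields(record, fields):
    """Hash a record (a dict) using only the named fields."""
    # Join the chosen field values with a separator that is unlikely to occur in the data.
    material = "\x1f".join(str(record[name]) for name in fields).encode("utf-8")
    return zlib.crc32(material)  # unsigned 32-bit value


def block_index(hash_value, num_blocks, hash_range=2 ** 32):
    """Map a hash value to the index of the equal-width sub-range that contains it."""
    return hash_value * num_blocks // hash_range
```

Fields that are left out of `fields` do not affect the hash, so two records that agree on the selected fields always land in the same memory block.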

101 Citations

9 Claims

1. A method for rapidly hashing records in a database stored on a secondary storage device, the size of said database exceeding that of a primary storage in which operations on said database are to be performed, comprising the steps of:
  A. providing in primary storage a set of memory-blocks for receiving hash records therein, each said memory-block being associated with a sub-range of hash values that collectively span a range of hash values encompassing said database;
  
  B. repeatedly retrieving from the secondary storage device groups of records and generating a hash value for each record;
  
  C. associating each hash value with at least secondary storage address information for the respective record so as to form retrieval information comprising intermediate hash records that characterize the retrieved records;
  
  D. distributing said retrieval information among said memory-blocks in accordance with the range of hash values associated with the respective records;
  
  E. as each memory-block fills, writing the corresponding intermediate hash records to an intermediate file associated with the memory block in secondary storage to enable further intermediate hash records to be distributed to said memory-block;
  
  F. retrieving said intermediate files from secondary storage and ordering the intermediate hash records therein so as to form hashed files; and
  
  G. writing the hashed files to secondary storage as a single composite file to form a hash table spanning the entire database.

- 2. The method of claim 1 in which each said memory-block is associated with a distinct sub-range of locations in which the records of said database are to be stored.
- 3. The method of claim 2 in which each said memory-block is associated with a subrange of less than 10% of the locations in which the records of said database are to be stored.
- 4. The method of claim 2 in which each said memory-block is associated with a subrange of less than 2% of the locations in which the records of said database are to be stored.
- 5. The method of claim 2 in which each memory block occupies less than 10% of the available primary memory user space.
- 6. The method of claim 2 in which each memory block occupies less than 2% of the available primary memory user space.
- 7. The method of claim 1 in which each said memory block, on filling, is appended to an intermediate file corresponding to said block to thereby form an extended file for amalgamating the successive contents of the corresponding memory-block.
- 8. The method of claim 7 in which said records are hashed by hash number.
- 9. The method of claim 8 which includes the step of generating at least a second hash value for each record, said second hash value being used to define the location of the hash record within the memory-block to thereby specify the location of said hash records within said block.
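Two of the dependent claims lend themselves to a short illustration. Claims 3 and 4 bound each sub-range to less than 10% or less than 2% of the locations, which, if the sub-ranges are assumed to be of equal size, implies at least 11 or 51 memory-blocks respectively. Claim 9 uses a second hash value to place a hash record within its memory-block. The sketch below is hypothetical throughout: the names `min_equal_blocks`, `slot_in_block`, and `place`, the use of `zlib.adler32` as the second hash, and linear probing on collisions are assumptions, since the claims do not say how collisions inside a block are resolved.

```python
import zlib


def min_equal_blocks(max_percent):
    """Smallest number of equal sub-ranges such that each is strictly below max_percent."""
    return 100 // max_percent + 1


def slot_in_block(key, block_slots):
    """Second hash value: picks a slot inside the block (adler32 is an assumed choice)."""
    return zlib.adler32(key) % block_slots


def place(block, key, entry):
    """Store entry in block (a list where None marks a free slot).

    Starts at the slot given by the second hash and probes linearly on collision.
    Returns the slot used, or -1 if the block is full and should be written out.
    """
    n = len(block)
    start = slot_in_block(key, n)
    for step in range(n):
        j = (start + step) % n
        if block[j] is None:
            block[j] = entry
            return j
    return -1
```

In this sketch a return value of -1 is the signal that the block has filled, which is the point at which claim 7 has the block appended to its intermediate file.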

Current Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Original Assignee: Applied Language Technologies, Inc.
Inventors: Nguyen, John N.
Primary Examiner(s): Black, Thomas G.
Assistant Examiner(s): Corrielus, Jean M.
Application Number: US08/559,532
Time in Patent Office: 1,034 Days
Field of Search: 395/600, 395/601
US Class Current: 1/1
CPC Class Codes: G06F 16/9014 (hash tables); Y10S 707/99931 (Database or file accessing)
