Hash-based database grouping system and method

US 5,511,190 A
Filed: 01/20/1995
Issued: 04/23/1996
Est. Priority Date: 01/20/1995
Status: Expired due to Term

First Claim

Patent Images

1. A method for performing a hash grouping operation on a relational database table, said table comprising rows and columns, said columns including at least group column and zero or more data columns, said method comprising the steps of:

(1) reading a row of said relational database table, said row comprising a identifier from said at least one group column and zero or more data fields to said zero or more data columns;

(2) applying a hash function to said group identifier to generate a hashed group value, said hashed group value serving as an index into a memory-resident hash table, said hash table mapping hashed group values into corresponding memory-resident group table entries, each group table entry including group data fields that store for a single group aggregated raw data from said data fields, a group identifier and housekeeping data;

(3) determining from contents of said hash table whether a matching group table entry exists for a group defined by said group identifier;

(4) when said matching group table entry exists, aggregating contents of said data fields into corresponding group data fields of said group table entry and updating said housekeeping data;

(5) when said matching group table entry does not exist and available memory meets predefined memory availability criteria, allocating an additional group table entry, writing into said additional group table entry said group identifier and said data fields, and initializing said housekeeping data.(6) when said matching group table entry does not exist and available memory does not meet said predefined memory availability criteria, applying an overflow procedure to said row wherein contents of said row are used to update an overflow file.(7) repeating steps (1) through (6) until all rows of said table have been read;

(8) reporting said group table entries;

(9) when at least one of said rows were written to said overflow file, repeating steps (1) through (8), except in step (1) reading said overflow file rather than said relational database table.

View all claims

4 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A structured query language (SQL) grouping and aggregation system and method that incorporates hash-based techniques, several overflow handling strategies and statistics-based process-selection criteria. The method can execute SQL group-by queries on distributed database tables or tables stored locally to the database management system (DBMS) processor executing the grouping method. Hash-based techniques allow groupings and aggregates to be generated on the fly through the use of partial aggregates maintained in primary memory. Where primary memory is limited, groups and aggregates are still generated for as many groups as can be maintained in primary memory, while various overflow procedures are provided for buffering ungrouped data and writing that data to an overflow disk file for later processing. In one overflow procedure, raw data from groups that cannot be aggregated in primary memory are buffered then written to the overflow disk file. In a second overflow procedure, ungroupable raw data is formatted the same as data being aggregated in the group table, buffered, and then written to the overflow file. In a third overflow procedure, ungroupable raw data is partially aggregated in an output buffer maintained in primary memory before being written to the overflow file maintained in secondary memory. Database table statistics maintained by a cataloger are consulted to determine whether hash-based grouping or conventional sort based grouping should be used to execute a group-by query. The system is adaptable to running a grouping query against a partitioned database on distributed processors.

Citations

8 Claims

1. A method for performing a hash grouping operation on a relational database table, said table comprising rows and columns, said columns including at least group column and zero or more data columns, said method comprising the steps of:
- (1) reading a row of said relational database table, said row comprising a identifier from said at least one group column and zero or more data fields to said zero or more data columns;
  
  (2) applying a hash function to said group identifier to generate a hashed group value, said hashed group value serving as an index into a memory-resident hash table, said hash table mapping hashed group values into corresponding memory-resident group table entries, each group table entry including group data fields that store for a single group aggregated raw data from said data fields, a group identifier and housekeeping data;
  
  (3) determining from contents of said hash table whether a matching group table entry exists for a group defined by said group identifier;
  
  (4) when said matching group table entry exists, aggregating contents of said data fields into corresponding group data fields of said group table entry and updating said housekeeping data;
  
  (5) when said matching group table entry does not exist and available memory meets predefined memory availability criteria, allocating an additional group table entry, writing into said additional group table entry said group identifier and said data fields, and initializing said housekeeping data.(6) when said matching group table entry does not exist and available memory does not meet said predefined memory availability criteria, applying an overflow procedure to said row wherein contents of said row are used to update an overflow file.(7) repeating steps (1) through (6) until all rows of said table have been read;
  
  (8) reporting said group table entries;
  
  (9) when at least one of said rows were written to said overflow file, repeating steps (1) through (8), except in step (1) reading said overflow file rather than said relational database table.
- View Dependent Claims (2, 3, 4)
- - 2. The method of claim 1, wherein said overflow procedure includes the step of writing said row to said overflow file.
  - 3. The method of claim 1, wherein said overflow procedure includes the steps of reformatting said row to conform to said group table row entry and writing said reformatted row to an overflow file.
  - 4. The method of claim 1, wherein said overflow procedure includes the steps of:
    - (1) determining from contents of a second hash table whether a matching buffer entry corresponding to said hashed group identifier exists in a group output buffer;
      
      said buffer entries including buffer data fields, a buffer identifier and buffer housekeeping data corresponding to said group data fields, group identifier and group housekeeping data;
      
      (2) when a matching buffer entry exists, aggregating contents of said data fields into said buffer data fields and updating said buffer housekeeping data of said matching buffer entry;
      
      (3) when a matching buffer entry does not exist and said buffer is not yet full, writing according to said group table format a new buffer entry comprising said data fields and said group identifier, initializing said new entry'"'"'s buffer housekeeping data and providing a pointer to said new buffer entry in said second hash table at a position indexed by said hashed group identifier.(4) when a buffer entry corresponding to said hashed group value does not exist in said buffer and said buffer is full, appending contents of said buffer to a disk file, resetting said second hash table, writing according to said group table format a new buffer entry comprising said data fields and said group identifier, initializing said new entry'"'"'s buffer housekeeping data and providing a pointer to said new buffer entry in said second hash table.

5. A database management system (DBMS) including:
- (a) a secondary memory including a relational database table comprising rows and columns, said columns including at least one group column and zero or more data columns, an overflow file and compiled database routines, including a grouping function and a hash function;
  
  (b) a primary memory .including a group table and a hash table, said group table comprising a plurality of group table entries each storing for a single group aggregated member data, a group name and housekeeping data, and said hash table comprising a plurality of hash table entries each storing the address of a corresponding group table entry, said address being retrievable from said hash table at an index generated by applying said hash function to said group name of said corresponding group entry;
  
  (c) a communications interface through which user database queries are relayed to said DBMS from a user workstation, and through which query results from said DBMS are made available to said user workstation;
  
  (d) a processor that controls interactions between said primary memory, said secondary memory and said communications interface, and executes said compiled database routines, including said grouping function in response to a group by query received from said communications interface;
  
  wherein said grouping function comprises;
  
  (i) an input procedure that reads said database table row by row and outputs for each row read a member row comprising a group identifier from said at least one group column and zero or more data fields corresponding to said zero or more data columns;
  
  (ii) a matching procedure that calls said hash function, said hash function generating a hashed group value from said group identifier; and
  
  determines from contents of a hash table entry indexed by said hashed group value whether a matching group table entry exists with said group name identical to said group identifier;
  
  (iii) an aggregation procedure that executes an aggregation function specified by said group by query, said aggregation procedure comprising a set of instructions for updating said matching group table entry with data from said member row;
  
  (iv) an initialization procedure that allocates a new group table entry from said primary memory, initializes said new group table entry based on data from said member row and sets a pointer to said new group table entry in said hash table whenever said matching group table entry does not exist and predetermined memory availability criteria indicate there is sufficient primary memory for a new group table entry;
  
  (v) an overflow procedure that generates an overflow row from said member row whenever said matching group table entry does not exist and said memory availability criteria indicate there is not sufficient primary memory for a new group table entry, said overflow procedure also updating said overflow file based on contents of said overflow row; and
  
  (vi) an overflow input procedure that reads data from said overflow file when all rows of said input table have been read and said overflow procedure has been executed, said overflow file reflecting data from all member rows not aggregated in said group table.
- View Dependent Claims (6, 7, 8)
- - 6. The DBMS of claim 5, said primary memory further comprising an overflow buffer maintained in said primary memory, wherein said overflow row generated by said overflow procedure is formatted identically to said member row, and said updating of said overflow file consists in said overflow procedure flushing said output buffer to said overflow file when said buffer is full and writing said overflow row into said output buffer.
  - 7. The DBMS of claim 5, said primary memory further comprising an output buffer, wherein said overflow row generated by said overflow procedure is formatted identically to said group table entry, and said updating of said overflow file consists in said overflow procedure flushing said output buffer to said overflow file when said buffer is full and writing said overflow row into said output buffer.
  - 8. The DBMS of claim 5, said primary memory further comprising an output buffer and a second hash table, said output buffer comprising a plurality of buffer entries each storing for a single group aggregated member data, a group name and housekeeping data, and said second hash table comprising a plurality of hash table entries each storing the address of a corresponding buffer entry, said address being retrievable from said second hash table at an index generated by applying said hash function to said group name of said corresponding buffer entry;
    - wherein said updating of said overflow file consists in said overflow procedure(a) determining from contents of said second hash table whether a matching buffer entry with said group name identical to said group identifier exists in said output buffer;
      
      (b) when a matching buffer entry exists, aggregating data from said overflow row into said matching buffer entry;
      
      (c) when a matching buffer entry does not exist(i) when said buffer is full, appending contents of said buffer to said overflow file and resetting said second hash table; and
      
      (ii) writing a new buffer entry comprising said data from said overflow row and housekeeping data, initializing said new entry'"'"'s buffer housekeeping data and providing a pointer to said new buffer entry in said second hash table.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee
Tandem Computers Incorporated (HP Inc.)
Inventors
Sharma, Anoop, Zeller, Hansjorg
Primary Examiner(s)
Kulik, Paul V.

Application Number

US08/376,026
Time in Patent Office

459 Days
Field of Search

395/600
US Class Current

1/1
CPC Class Codes

G06F 16/24556 Aggregation; Duplicate elim...

Y10S 707/99931 Database or file accessing

Hash-based database grouping system and method

First Claim

4 Assignments

0 Petitions

Accused Products

Abstract

Citations

8 Claims

Specification

Solutions

Use Cases

Quick Links

Hash-based database grouping system and method

First Claim

4 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

8 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links