FAST BULK LOADING AND INCREMENTAL LOADING OF DATA INTO A DATABASE
First Claim
1. A method for loading data into a relational database system, said method comprising:
receiving a request to load data from a load file into a database;
sampling the load file;
determining a first profile of the load file based on the samples;
determining at least one compression scheme for the data in the load file based on its profile;
compressing the data in the load file based on the at least one compression scheme as the data is loaded into the database;
writing, into a hardware accelerator memory, compressed data that is to be indexed;
determining a second profile of the compressed data that is to be indexed based on a machine code database instruction;
dividing the compressed data that is to be indexed into a set of balanced partitions based on the second profile;
determining a program of machine code database instructions based on the second profile;
distributing the compressed data that is to be indexed in the hardware accelerator memory into partitions in the hardware accelerator memory based on the program of machine code database instructions;
building, in parallel, a sub-index for each partition of the compressed data that is to be indexed; and
determining an index based on a combination of the sub-indexes.
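The sampling, profiling, and compression-selection steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the distinct-ratio profile, the 0.5 threshold, and the choice between a dictionary scheme and raw storage are all assumptions made for illustration.

```python
import random

def sample_column(values, k, seed=0):
    """Sample up to k values from one column of the load file (assumed helper)."""
    rng = random.Random(seed)
    if len(values) <= k:
        return list(values)
    return rng.sample(values, k)

def profile(sample):
    """First profile: here just the share of distinct values in the sample."""
    distinct = len(set(sample))
    ratio = distinct / len(sample) if sample else 1.0
    return {"count": len(sample), "distinct": distinct, "distinct_ratio": ratio}

def choose_scheme(prof, threshold=0.5):
    """Pick a compression scheme from the profile (threshold is an assumption)."""
    return "dictionary" if prof["distinct_ratio"] <= threshold else "raw"

def compress(values, scheme):
    """Compress the full column with the chosen scheme as it is loaded."""
    if scheme == "dictionary":
        dictionary = sorted(set(values))
        code_of = {v: i for i, v in enumerate(dictionary)}
        return {"scheme": scheme, "dictionary": dictionary,
                "codes": [code_of[v] for v in values]}
    return {"scheme": "raw", "values": list(values)}

def decompress(col):
    """Inverse of compress, used here only to check the round trip."""
    if col["scheme"] == "dictionary":
        return [col["dictionary"][c] for c in col["codes"]]
    return list(col["values"])
```

A caller would run `choose_scheme(profile(sample_column(column, 100)))` once per column and pass the result to `compress`. Only the sample is profiled, so the decision costs far less than a full scan of the load file.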
Abstract
Embodiments of the present invention provide for batch and incremental loading of data into a database. The loader infrastructure uses machine code database instructions and hardware acceleration to run load operations in parallel with I/O operations. A large hardware accelerator memory serves as a staging cache for the load process. The load process also includes an index profiling phase that enables balanced partitioning of the created indexes, which allows the load to be pipelined. Online incremental loading may also be performed while queries are being served.
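The index profiling and balanced partitioning described in the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the profile is a set of quantile split keys taken from a strided sample, a thread pool stands in for the hardware accelerator, and all function names are hypothetical.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def partition_bounds(sample, n_parts):
    """Profile step: choose n_parts - 1 split keys at evenly spaced quantiles."""
    s = sorted(sample)
    return [s[(i * len(s)) // n_parts] for i in range(1, n_parts)]

def distribute(keys, bounds):
    """Send each (key, row_id) entry to the partition whose key range holds it."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for row_id, key in enumerate(keys):
        parts[bisect.bisect_right(bounds, key)].append((key, row_id))
    return parts

def build_sub_index(part):
    """Sub-index for one partition: its entries in key order."""
    return sorted(part)

def build_index(keys, n_parts=4, sample_step=10):
    """Build sub-indexes in parallel, then combine them into one index."""
    if not keys:
        return [], []
    bounds = partition_bounds(keys[::sample_step], n_parts)
    parts = distribute(keys, bounds)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        subs = list(pool.map(build_sub_index, parts))
    # Partitions cover disjoint, ascending key ranges, so concatenating
    # the sub-indexes yields a globally sorted index.
    index = [entry for sub in subs for entry in sub]
    return index, [len(p) for p in parts]
```

`build_index` returns the combined index together with the partition sizes, which lets a caller check how well the sample-based split keys balanced the work. In CPython the threads mainly illustrate the structure; real speedup would come from the accelerator or from separate processes.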
55 Citations
24 Claims
1. A method for loading data into a relational database system, as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6.
7. A method of loading data in existing tables of a database, said method comprising:
retrieving a portion of the data to be loaded;
identifying a compression scheme used to store data in the database;
determining whether the identified compression scheme is optimum based on the data to be loaded;
loading the data into the database based on the identified compression scheme when the compression is optimum;
determining a new compression scheme when the identified scheme is not optimum;
reorganizing a portion of the database based on the new compression scheme; and
loading the data into the database based on the new compression scheme when the identified scheme was not optimum.
Dependent claims: 8, 9, 10.
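The decision flow of claim 7 can be sketched in Python as shown here. The sketch assumes a dictionary-encoded column with a fixed code width and treats the existing scheme as "optimum" when the incoming batch still fits in that width. That criterion, the `DictColumn` class, and every method name are hypothetical and not drawn from the specification.

```python
def bits_needed(n_values):
    """Smallest code width (in bits) that can number n_values distinct values."""
    return max(1, (n_values - 1).bit_length())

class DictColumn:
    """Dictionary-encoded column with a fixed code width (assumed layout)."""

    def __init__(self, code_bits):
        self.code_bits = code_bits
        self.dictionary = {}        # value -> integer code
        self.codes = []             # one code per stored row
        self.reorganizations = 0

    def scheme_fits(self, batch):
        """Is the current scheme still adequate for the data to be loaded?"""
        new_values = set(batch).difference(self.dictionary)
        return len(self.dictionary) + len(new_values) <= 2 ** self.code_bits

    def reorganize(self, code_bits):
        """Switch to a wider code. A real store would rewrite packed codes here."""
        self.code_bits = code_bits
        self.reorganizations += 1

    def append(self, batch):
        """Load the batch, assigning the next free code to each unseen value."""
        for v in batch:
            code = self.dictionary.setdefault(v, len(self.dictionary))
            self.codes.append(code)

def incremental_load(column, batch):
    """Load with the existing scheme if it fits, else reorganize first."""
    if not column.scheme_fits(batch):
        total = len(set(column.dictionary).union(batch))
        column.reorganize(bits_needed(total))
    column.append(batch)
```

The check inspects only the batch and the existing dictionary, so a load that fits the current scheme pays no reorganization cost.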
11. A method of deleting data from existing tables of a database, said method comprising:
determining a portion of the data to be deleted from the database;
deleting the portion of the data from the database using a hardware accelerator and based on a program of machine code database instructions;
determining columns of data that are indexed and affected by the deletion of the portion of the data; and
updating, in parallel, indexes for the indexed columns of data using the hardware accelerator and based on another program of machine code database instructions.
Dependent claims: 12, 13, 14, 15.
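The delete path of claim 11 can be illustrated with a small Python sketch. It makes several assumptions that are not in the source: rows are marked dead with a liveness flag rather than physically removed, each index maps a value to a set of row ids, and a thread pool stands in for the hardware accelerator. The `Table` class and its methods are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

class Table:
    """Column store with one value -> row-id-set index per indexed column."""

    def __init__(self, columns, indexed):
        self.columns = columns                      # name -> list of values
        n_rows = len(next(iter(columns.values())))
        self.live = [True] * n_rows                 # False marks a deleted row
        self.indexes = {name: self._build(name) for name in indexed}

    def _build(self, name):
        idx = {}
        for row_id, value in enumerate(self.columns[name]):
            idx.setdefault(value, set()).add(row_id)
        return idx

    def delete_where(self, name, predicate):
        """Delete matching rows, then update every affected index in parallel."""
        doomed = [r for r, v in enumerate(self.columns[name])
                  if self.live[r] and predicate(v)]
        for r in doomed:
            self.live[r] = False
        # Every index holds the deleted row ids, so all indexed columns are affected.
        affected = list(self.indexes)

        def prune(index_name):
            idx = self.indexes[index_name]
            for r in doomed:
                key = self.columns[index_name][r]
                idx[key].discard(r)
                if not idx[key]:
                    del idx[key]

        # Each worker touches only its own index, so no locking is needed.
        with ThreadPoolExecutor() as pool:
            list(pool.map(prune, affected))
        return doomed
```

The update path of claim 16 would follow the same shape, with each worker removing the old key and inserting the new one for the rows that changed.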
16. A method of updating data in existing tables of a database, said method comprising:
determining a portion of the data to be updated in the database;
updating the portion of the data from the database using a hardware accelerator and based on a program of machine code database instructions;
determining columns of data that are indexed and affected by the update of the portion of the data; and
updating, in parallel, indexes for the indexed columns of data using the hardware accelerator and based on another program of machine code database instructions.
Dependent claims: 17, 18, 19, 20.
21. A method of writing data to a memory coupled to a database hardware accelerator based on scattered writes/reads, said method comprising:
gathering scattered index data into the memory;
passing collected index data back to a host processor;
receiving an updated index based on the collected index data;
distributing the collected index data into the memory based on the updated index data using a machine code database instruction; and
updating storage with the updated index data.
Dependent claims: 22, 23, 24.
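The gather and scatter pattern behind claim 21 can be sketched in Python. The sketch is only one possible reading of the claim: accelerator memory is modeled as a list, the host-side update is assumed to be a re-sort of the collected entries, and `gather`, `scatter`, `host_update`, and `round_trip` are hypothetical names.

```python
def gather(memory, addresses):
    """Collect index entries from scattered addresses into one contiguous list."""
    return [memory[a] for a in addresses]

def scatter(memory, addresses, values):
    """Write values back to the given addresses, one value per address."""
    if len(addresses) != len(values):
        raise ValueError("addresses and values must have the same length")
    for a, v in zip(addresses, values):
        memory[a] = v

def host_update(entries):
    """Stand-in for the host processor's work: return the entries re-sorted."""
    return sorted(entries)

def round_trip(memory, addresses):
    """Gather, update on the host, scatter back, then copy to storage."""
    collected = gather(memory, addresses)
    updated = host_update(collected)
    # Assumption: updated entries are placed in ascending address order.
    scatter(memory, sorted(addresses), updated)
    return list(memory)          # the copy models 'updating storage'
```

Gathering first means the host receives one contiguous transfer instead of many small reads, which is the usual motivation for pairing gather with scatter.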
Specification