OLAP query processing method oriented to a database and Hadoop hybrid platform
First Claim
1. An on-line analytical processing (OLAP) query processing method oriented to a database and Hadoop hybrid platform, based on an inverse star-schema storage structure with dimension tables stored in a central node in a centralized way, and fact tables distributed on working nodes according to a data distribution policy of a Hadoop distributed file system, wherein:
storing a fact table in a database cluster based on a multi-copy fault-tolerance mechanism of the Hadoop distributed file system;
setting a main working copy and at least one fault-tolerant copy of the fact table;
importing the main working copy into a local database of a working node;
naming a table corresponding to the main working copy according to a unified naming rule;
deleting the main working copy in the Hadoop distributed file system;
updating the meta-information of the main working copy in a namenode to record a JDBC connection of the local database and the name of the table corresponding to the main working copy;
executing OLAP query processing first on the main working copy through a DDTA-JOIN, in which a predicate bitmap vector is used as a query filter to complete the multi-table join: a foreign key value of a fact table record is mapped to a subscript of the corresponding dimension table predicate bitmap vector, the flag bit of each dimension table predicate bitmap vector is extracted to perform a bit operation, and, when a result of the bit operation is true, a group-by attribute data item is extracted according to the dimensional attribute array subscript mapped to by the fact table foreign key value to perform hash group-by aggregation; and recording a query processing result in an aggregate result table of the local database;
when a working node becomes faulty during the OLAP query processing, completing, by the database cluster, the OLAP query processing on the datasets of the non-faulty nodes, searching the namenode, according to the faulty nodes, for the storage node of the at least one fault-tolerant copy corresponding to the main working copy on each faulty node, and invoking a MapReduce task to complete the OLAP query processing on the at least one fault-tolerant copy;
merging an OLAP query processing result of the database cluster and an OLAP query processing result of the MapReduce task; and
returning a merged OLAP query processing result.
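The DDTA-JOIN step of the claim can be sketched as follows. This is a minimal illustrative model, not the patented implementation: each dimension table is assumed to contribute a predicate bitmap vector (one flag bit per dimension row, set when the row satisfies the query predicates) and an array of its group-by attribute values; a fact record's foreign key values serve as subscripts into both. All names and parameters below are hypothetical.

```python
from collections import defaultdict

def ddta_join(fact_rows, dim_filters, dim_groupby, measure_idx, fk_idx):
    """Illustrative DDTA-JOIN over one fact-table copy.

    fact_rows:   list of fact-record tuples
    dim_filters: per-dimension predicate bitmap vector (list of 0/1 flags)
    dim_groupby: per-dimension array of group-by attribute values
    measure_idx: column index of the measure to aggregate (SUM here)
    fk_idx:      per-dimension column index of the foreign key in a fact row
    """
    agg = defaultdict(float)  # aggregate result table: group key -> SUM
    for row in fact_rows:
        # map each foreign key value to the subscript of the corresponding
        # dimension table predicate bitmap vector
        subscripts = [row[i] for i in fk_idx]
        # extract the flag bits and combine them (the bit operation; AND here)
        if all(dim_filters[d][s] for d, s in enumerate(subscripts)):
            # bit operation is true: extract group-by attribute data items by
            # the same subscripts and perform hash group-by aggregation
            key = tuple(dim_groupby[d][s] for d, s in enumerate(subscripts))
            agg[key] += row[measure_idx]
    return dict(agg)
```

The single sequential scan over the fact table with constant-time bitmap and array lookups is what lets the join, filter, and group-by complete in one pass.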
Abstract
An OLAP query processing method oriented to a database and Hadoop hybrid platform is described. OLAP query processing is executed first on a main working copy, and the query processing result is recorded in an aggregate result table of the local database; when a working node is faulty, the node information of the fault-tolerant copy corresponding to the main working copy is looked up through the namenode, and a MapReduce task is invoked to complete the OLAP query processing on that fault-tolerant copy. The method combines database technology and Hadoop technology, pairing the storage and query performance of the database with the high scalability and high availability of Hadoop; database query processing and MapReduce query processing are integrated in a loosely coupled mode, ensuring both high query processing performance and high fault tolerance.
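The final merge step described above can be sketched as below, assuming a distributive aggregate such as SUM so that partial results from the database cluster and from the MapReduce fallback combine by group key. Names are illustrative, not from the patent.

```python
def merge_results(db_result, mr_result):
    """Merge two partial aggregate result tables (group key -> partial SUM):
    one produced by the database cluster on the main working copies, one
    produced by the MapReduce task on the fault-tolerant copies."""
    merged = dict(db_result)
    for key, value in mr_result.items():
        # groups present in both partial results are combined additively
        merged[key] = merged.get(key, 0) + value
    return merged
```

Non-distributive aggregates (e.g. AVG) would instead require merging (sum, count) pairs; the sketch covers only the SUM case.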
7 Claims (independent claim 1 above; dependent claims 2–7 not shown)