OLAP query processing method oriented to a database and Hadoop hybrid platform
First Claim
1. An on-line analytical processing (OLAP) query processing method oriented to a database and Hadoop hybrid platform, based on an inverse star-schema storage structure with dimension tables stored in a central node in a centralized way, and fact tables distributed on working nodes according to a data distribution policy of a Hadoop distributed file system, wherein:
storing a fact table in a database cluster based on a multi-copy fault-tolerance mechanism of the Hadoop distributed file system;
setting a main working copy and at least one fault-tolerant copy of the fact table;
importing the main working copy into a local database of a working node;
naming a table corresponding to the main working copy according to a unified naming rule;
deleting the main working copy in the Hadoop distributed file system;
updating the meta-information of the main working copy in a namenode to record a JDBC connection of the local database and the name of the table corresponding to the main working copy;
executing OLAP query processing first on the main working copy through a DDTA-JOIN, in which a predicate bitmap vector is used as a query filter to complete the multi-table join: a foreign key value of a fact table record is mapped to a subscript of the corresponding dimension table predicate bitmap vector, the flag bit of each dimension table predicate bitmap vector is extracted to perform a bit operation, and, when a result of the bit operation is true, a group-by attribute data item is extracted according to the dimensional attribute array subscript mapped to by the fact table foreign key value to perform hash group-by aggregation; and recording a query processing result in an aggregate result table of the local database;
when a working node becomes faulty during the OLAP query processing, completing, by the database cluster, the OLAP query processing on the datasets of the non-faulty nodes, searching the namenode, according to the faulty nodes, for the storage node of the at least one fault-tolerant copy corresponding to the main working copy on each faulty node, and invoking a MapReduce task to complete the OLAP query processing on the at least one fault-tolerant copy;
merging an OLAP query processing result of the database cluster and an OLAP query processing result of the MapReduce task; and
returning a merged OLAP query processing result.
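The DDTA-JOIN step of the claim can be sketched as follows. This is a minimal illustrative model, not the patented implementation: each dimension table is assumed to contribute a predicate bitmap vector (one flag bit per dimension row, set when the row satisfies the query predicates) and an array of its group-by attribute values; a fact record's foreign key values serve as subscripts into both. All names and parameters below are hypothetical.

```python
from collections import defaultdict

def ddta_join(fact_rows, dim_filters, dim_groupby, measure_idx, fk_idx):
    """Illustrative DDTA-JOIN over one fact-table copy.

    fact_rows:   list of fact-record tuples
    dim_filters: per-dimension predicate bitmap vector (list of 0/1 flags)
    dim_groupby: per-dimension array of group-by attribute values
    measure_idx: column index of the measure to aggregate (SUM here)
    fk_idx:      per-dimension column index of the foreign key in a fact row
    """
    agg = defaultdict(float)  # aggregate result table: group key -> SUM
    for row in fact_rows:
        # map each foreign key value to the subscript of the corresponding
        # dimension table predicate bitmap vector
        subscripts = [row[i] for i in fk_idx]
        # extract the flag bits and combine them (the bit operation; AND here)
        if all(dim_filters[d][s] for d, s in enumerate(subscripts)):
            # bit operation is true: extract group-by attribute data items by
            # the same subscripts and perform hash group-by aggregation
            key = tuple(dim_groupby[d][s] for d, s in enumerate(subscripts))
            agg[key] += row[measure_idx]
    return dict(agg)
```

The single sequential scan over the fact table with constant-time bitmap and array lookups is what lets the join, filter, and group-by complete in one pass.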
Abstract
An OLAP query processing method oriented to a database and Hadoop hybrid platform is described. OLAP query processing is executed first on a main working copy, and the query processing result is recorded in an aggregate result table of the local database; when a working node is faulty, the node information of the fault-tolerant copy corresponding to the main working copy is looked up through the namenode, and a MapReduce task is invoked to complete the OLAP query processing on that fault-tolerant copy. The method combines database technology and Hadoop technology, pairing the storage and query performance of the database with the high scalability and high availability of Hadoop; database query processing and MapReduce query processing are integrated in a loosely coupled mode, ensuring both high query processing performance and high fault tolerance.
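The final merge step described above can be sketched as below, assuming a distributive aggregate such as SUM so that partial results from the database cluster and from the MapReduce fallback combine by group key. Names are illustrative, not from the patent.

```python
def merge_results(db_result, mr_result):
    """Merge two partial aggregate result tables (group key -> partial SUM):
    one produced by the database cluster on the main working copies, one
    produced by the MapReduce task on the fault-tolerant copies."""
    merged = dict(db_result)
    for key, value in mr_result.items():
        # groups present in both partial results are combined additively
        merged[key] = merged.get(key, 0) + value
    return merged
```

Non-distributive aggregates (e.g. AVG) would instead require merging (sum, count) pairs; the sketch covers only the SUM case.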
7 Claims (independent claim 1 above; dependent claims 2–7 not shown)