OLAP query processing method oriented to database and HADOOP hybrid platform

  • US 9,501,550 B2
  • Filed: 05/16/2012
  • Issued: 11/22/2016
  • Est. Priority Date: 04/18/2012
  • Status: Active Grant
First Claim

1. An on-line analytical processing (OLAP) query processing method oriented to a database and Hadoop hybrid platform, based on an inverse star-schema storage structure with dimension tables stored in a central node in a centralized way, and fact tables distributed on working nodes according to a data distribution policy of a Hadoop distributed file system, wherein:

  • storing a fact table in a database cluster based on a multi-copy fault-tolerance mechanism of the Hadoop distributed file system;

    setting a main working copy and at least one fault-tolerant copy of the fact table;

    importing the main working copy into a local database of a working node;

    naming a table corresponding to the main working copy according to a unified naming rule;

    deleting the main working copy in the Hadoop distributed file system;

    updating meta-information of the main working copy in a namenode into a JDBC connection of the local database and a name of the table corresponding to the main working copy;

    executing OLAP query processing first on the main working copy through a DDTA-JOIN in which a predicate bitmap vector is used as a query filter to complete multi-table join, a fact table record foreign key value is mapped to a subscript of a corresponding dimension table predicate bitmap vector, a flag bit of each dimension table predicate bitmap vector is extracted to perform a bit operation, and a group-by attribute data item is extracted according to a dimensional attribute array subscript mapped to by the fact table foreign key value to perform a hash group-by aggregate processing when a result of the bit operation is true, and recording a query processing result in an aggregate result table of the local database;

    completing the OLAP query processing on some datasets, searching the namenode for a storage node of the at least one fault-tolerant copy corresponding to the main working copy in the working node according to a number of the faulty nodes, and invoking a MapReduce task to complete the OLAP query processing on the at least one fault-tolerant copy by the database cluster, when the working node is faulty during a procedure of the OLAP query processing;

    merging an OLAP query processing result of the database cluster and an OLAP query processing result of the MapReduce task; and

    returning a merged OLAP query processing result.
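The DDTA-JOIN step recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that foreign key values are already zero-based array subscripts, that each predicate bitmap is a list of 0/1 flags, and that the aggregate is a SUM. The names `ddta_join`, `predicate_bitmaps`, `group_by_arrays`, and the sample data are hypothetical.

```python
from collections import defaultdict


def ddta_join(fact_rows, predicate_bitmaps, group_by_arrays):
    """Filter fact rows with per-dimension bitmaps, then hash-aggregate.

    fact_rows:         iterable of (foreign_keys, measure), where
                       foreign_keys maps a dimension name to an array subscript
    predicate_bitmaps: dimension name -> list of 0/1 flags (assumed layout)
    group_by_arrays:   dimension name -> list of group-by attribute values
    """
    aggregates = defaultdict(int)
    for foreign_keys, measure in fact_rows:
        # Map each foreign key to a bitmap subscript and AND the flag bits.
        flag = 1
        for dim, bitmap in predicate_bitmaps.items():
            flag &= bitmap[foreign_keys[dim]]
        if flag:
            # The same subscript addresses the dimensional attribute array.
            group = tuple(
                group_by_arrays[dim][foreign_keys[dim]] for dim in group_by_arrays
            )
            aggregates[group] += measure  # hash group-by aggregation (SUM)
    return dict(aggregates)


# Hypothetical data: predicate year = 1998 AND region = 'ASIA'.
predicate_bitmaps = {
    "date": [0, 1, 1],      # years 1997, 1998, 1998
    "customer": [1, 0, 1],  # regions ASIA, EUROPE, ASIA
}
group_by_arrays = {
    "date": [1997, 1998, 1998],
    "customer": ["CHINA", "FRANCE", "JAPAN"],
}
fact_rows = [
    ({"date": 0, "customer": 0}, 10),
    ({"date": 1, "customer": 0}, 20),
    ({"date": 1, "customer": 1}, 30),
    ({"date": 2, "customer": 2}, 40),
    ({"date": 2, "customer": 0}, 5),
]

result = ddta_join(fact_rows, predicate_bitmaps, group_by_arrays)
print(result)  # {(1998, 'CHINA'): 25, (1998, 'JAPAN'): 40}
```

Because every dimension lookup is an array access by subscript, the fact table is scanned once and no conventional join is built. That single scan is the property the claim relies on when it uses the bitmap as the query filter.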

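The final merging step can be sketched in the same way. The sketch assumes that both the database cluster and the MapReduce task return partial results as mappings from a group-by key to a partial SUM, so merging reduces to adding values per key. The function name `merge_partial_aggregates` and the sample values are hypothetical, and non-additive aggregates such as AVG would need extra state (for example a sum and a count).

```python
from collections import Counter


def merge_partial_aggregates(*partials):
    """Merge partial SUM aggregates keyed by group-by tuple."""
    merged = Counter()
    for partial in partials:
        for group, value in partial.items():
            merged[group] += value  # additive aggregates combine per key
    return dict(merged)


# Hypothetical partial results from the two execution paths.
database_cluster_result = {(1998, "CHINA"): 25, (1998, "JAPAN"): 40}
mapreduce_result = {(1998, "CHINA"): 5, (1998, "INDIA"): 12}

merged = merge_partial_aggregates(database_cluster_result, mapreduce_result)
print(merged)  # {(1998, 'CHINA'): 30, (1998, 'JAPAN'): 40, (1998, 'INDIA'): 12}
```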