System and method for querying a distributed dwarf cube

  • US 10,019,472 B2
  • Filed: 08/14/2014
  • Issued: 07/10/2018
  • Est. Priority Date: 08/14/2014
  • Status: Active Grant
First Claim

1. A method for querying a distributed dwarf cube comprising a plurality of dwarf cuboids, wherein the distributed dwarf cube is built using a mapreduce technique, the method comprising:

  • receiving, by a processor, a query for retrieving data from a distributed dwarf cube, wherein the distributed dwarf cube is built of the data, wherein the data comprises cube values, wherein the distributed dwarf cube is built by:

    processing the data, at a first mapreduce job of a series of mapreduce jobs, to generate indexes for the data, wherein the indexes are generated for each dimension in the data, wherein the cube values are replaced with a corresponding index for each dimension of the data;

    sorting the cube values in one or more dimensions based on a cardinality of the cube values and index associated with each cube value, wherein the cube values are sorted in an order of highest cardinality to lowest cardinality at a second mapreduce job of the series of mapreduce jobs, wherein the cardinality indicates distinctiveness of the cube values in the one or more dimensions;

    partitioning the sorted data into data blocks based on a predefined size, wherein each data block is associated with a range, wherein the range corresponds to a start cube value and an end cube value of a highest cardinality dimension in the data block;

    building a distributed dwarf cube, comprising dwarf cuboids, at a third mapreduce job of the series of mapreduce jobs, wherein each dwarf cuboid is generated, from a data block, based on the range associated with the data block by:

    processing the data block using a dwarf algorithm;

    eliminating the dimensions with the highest cardinality from the data;

    processing the data recursively based on the series of mapreduce jobs till all the dimensions in the data block are eliminated; and

    storing the generated cuboid on a Distributed File System;

    querying, by the processor, the distributed dwarf cube, wherein a cluster of query engines is utilized for querying by:

    checking, by the processor, the one or more ranges of the cube values based upon the query, wherein the one or more ranges comprise complete cube values and non-complete cube values, wherein the non-complete cube values indicate the cube values present at a start or an end of a range of the one or more ranges;

    creating, by the processor, a list of the cube values comprising the complete cube values and/or the non-complete cube values; and

    transmitting, by the processor, the list of the cube values from the distributed dwarf cube corresponding to the query.
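
The data flow recited in the claim (indexing each dimension, ordering dimensions by cardinality, partitioning the sorted rows into range-tagged blocks, and checking ranges at query time) can be illustrated with a small single-process sketch. This is not the patented implementation: it runs in memory rather than as a series of mapreduce jobs, it omits the dwarf algorithm and cuboid storage entirely, and every function name and the sample data are hypothetical. It also assumes one reading of "non-complete", namely that a value is non-complete when it sits at the start or end of a block's range and may therefore continue in a neighbouring block.

```python
def build_indexes(rows):
    """Assign an integer index to every distinct value of every dimension
    and return the rows with values replaced by those indexes."""
    n_dims = len(rows[0])
    indexes = []
    for d in range(n_dims):
        distinct = sorted({row[d] for row in rows})
        indexes.append({value: i for i, value in enumerate(distinct)})
    encoded = [tuple(indexes[d][row[d]] for d in range(n_dims)) for row in rows]
    return indexes, encoded


def sort_by_cardinality(encoded, indexes):
    """Reorder the dimensions from highest to lowest cardinality,
    then sort the rows on the reordered dimensions."""
    order = sorted(range(len(indexes)), key=lambda d: len(indexes[d]), reverse=True)
    reordered = sorted(tuple(row[d] for d in order) for row in encoded)
    return order, reordered


def partition(sorted_rows, block_size):
    """Cut the sorted rows into blocks of a predefined size. Each block is
    tagged with the (start, end) values of its leading dimension, which is
    the highest-cardinality dimension after reordering."""
    blocks = []
    for i in range(0, len(sorted_rows), block_size):
        chunk = sorted_rows[i:i + block_size]
        blocks.append({"range": (chunk[0][0], chunk[-1][0]), "rows": chunk})
    return blocks


def classify_values(blocks, lo, hi):
    """For a query on leading-dimension indexes lo..hi, visit only blocks
    whose range overlaps the query and split the matching values into
    complete values and non-complete (block-boundary) values."""
    complete, non_complete = set(), set()
    for block in blocks:
        start, end = block["range"]
        if end < lo or start > hi:
            continue  # range check: this block cannot contain matches
        for row in block["rows"]:
            value = row[0]
            if not lo <= value <= hi:
                continue
            if value in (start, end):
                non_complete.add(value)
            else:
                complete.add(value)
    return sorted(complete), sorted(non_complete)
```

With rows of the hypothetical form (product, store), where store has more distinct values than product, `sort_by_cardinality` moves store to the front, `partition` tags each block with its first and last store index, and `classify_values` returns the two lists that together make up the list of cube values described in the final steps of the claim.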
