System and method for querying a distributed dwarf cube

US 10,019,472 B2
Filed: 08/14/2014
Issued: 07/10/2018
Est. Priority Date: 08/14/2014
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Systems and methods for querying a distributed dwarf cube are disclosed. A query for retrieving data from a distributed dwarf cube is received. The distributed dwarf cube is built from the data. The data comprises cube values. The distributed dwarf cube is built by processing the data to generate indexes for the data. The cube values in one or more dimensions are sorted based on a cardinality of the cube values. The data is partitioned into data blocks to build a distributed dwarf cube from each data block based upon the cardinality of the cube values. The distributed dwarf cube comprises one or more ranges defined for the cube values. The one or more ranges of the cube values are checked based upon the query. Using the cube values, a list is created. The list of the cube values is transmitted from the distributed dwarf cube corresponding to the query.
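The build stage summarized in the abstract (index generation, cardinality-based sorting, partitioning into ranged blocks) can be illustrated with a minimal single-process Python sketch. It is not the patented implementation: ordinary function calls stand in for the first and second mapreduce jobs, "sorting by cardinality" is interpreted as ordering dimensions from most to fewest distinct values before sorting rows, and each input row is assumed to hold its dimension values followed by one numeric measure. The names `build_indexes`, `order_by_cardinality`, `partition` and `Block` are hypothetical.

```python
from dataclasses import dataclass


def build_indexes(rows, num_dims):
    """First-job stand-in: replace each dimension's values with indexes in sorted order."""
    indexes = []
    for d in range(num_dims):
        distinct = sorted({row[d] for row in rows})
        indexes.append({value: i for i, value in enumerate(distinct)})
    encoded = [
        tuple(indexes[d][row[d]] for d in range(num_dims)) + (row[num_dims],)
        for row in rows
    ]
    return indexes, encoded


def order_by_cardinality(indexes, encoded, num_dims):
    """Second-job stand-in: order dimensions from highest to lowest cardinality, then sort rows."""
    order = sorted(range(num_dims), key=lambda d: len(indexes[d]), reverse=True)
    reordered = sorted(tuple(row[d] for d in order) + (row[num_dims],) for row in encoded)
    return order, reordered


@dataclass
class Block:
    start: int  # first index of the leading (highest-cardinality) dimension in the block
    end: int    # last index of the leading dimension in the block
    rows: list


def partition(sorted_rows, block_size):
    """Cut sorted rows into blocks of block_size rows and record each block's range."""
    blocks = []
    for i in range(0, len(sorted_rows), block_size):
        chunk = sorted_rows[i:i + block_size]
        blocks.append(Block(start=chunk[0][0], end=chunk[-1][0], rows=chunk))
    return blocks


# Toy fact rows: (year, city, measure).
rows = [
    ("2014", "Delhi", 10),
    ("2014", "Mumbai", 20),
    ("2015", "Pune", 30),
    ("2015", "Delhi", 40),
    ("2014", "Agra", 50),
]
indexes, encoded = build_indexes(rows, num_dims=2)
order, sorted_rows = order_by_cardinality(indexes, encoded, num_dims=2)
blocks = partition(sorted_rows, block_size=2)
print(order)                               # [1, 0]
print([(b.start, b.end) for b in blocks])  # [(0, 1), (1, 2), (3, 3)]
```

In this toy data the city dimension (four distinct values) moves ahead of the year dimension (two), and the Delhi rows (city index 1) end up split across the first two blocks, which is the situation the range checks at query time have to handle.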

9 Citations

17 Claims

1. A method for querying a distributed dwarf cube comprising a plurality of dwarf cuboids, wherein the distributed dwarf cube is built using a mapreduce technique, the method comprising:
- receiving, by a processor, a query for retrieving data from a distributed dwarf cube, wherein the distributed dwarf cube is built of the data, wherein the data comprises cube values, wherein the distributed dwarf cube is built by:
  
  processing the data, at a first mapreduce job of a series of mapreduce jobs, to generate indexes for the data, wherein the indexes are generated for each dimension in the data, wherein the cube values are replaced with a corresponding index for each dimension of the data;
  
  sorting the cube values in one or more dimensions based on a cardinality of the cube values and index associated with each cube value, wherein the cube values are sorted in an order of highest cardinality to lowest cardinality at a second mapreduce job of the series of mapreduce jobs, wherein the cardinality indicates distinctiveness of the cube values in the one or more dimensions;
  
  partitioning the sorted data into data blocks based on a predefined size, wherein each data block is associated with a range, wherein the range corresponds to a start cube value and an end cube value of a highest cardinality dimension in the data block;
  
  building a distributed dwarf cube, comprising dwarf cuboids, at a third mapreduce job of the series of mapreduce jobs, wherein each dwarf cuboid is generated, from a data block, based on the range associated with the data block by:
  
  processing the data block using a dwarf algorithm;
  
  eliminating the dimensions with the highest cardinality from the data;
  
  processing the data recursively based on the series of mapreduce jobs till all the dimensions in the data block are eliminated; and
  
  storing the generated cuboid on a Distributed File System;
  
  querying, by the processor, the distributed dwarf cube, wherein a cluster of query engines is utilized for querying by:
  
  checking, by the processor, the one or more ranges of the cube values based upon the query, wherein the one or more ranges comprise complete cube values and non-complete cube values, wherein the non-complete cube values indicate the cube values present at a start or an end of a range of the one or more ranges;
  
  creating, by the processor, a list of the cube values comprising the complete cube values and/or the non-complete cube values; and
  
  transmitting, by the processor, the list of the cube values from the distributed dwarf cube corresponding to the query.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein the building further comprises replicating the distributed dwarf cube on a plurality of nodes.
  - 3. The method of claim 2, wherein the distributed dwarf cube is queried on the plurality of nodes.
  - 4. The method of claim 1, wherein the one or more dimensions are in the form of star schema files or a single fact file.
  - 5. The method of claim 4, wherein generation of the indexes, for the star schema, comprises replacing the cube values comprising primary keys with the indexes in a sorted order.
  - 6. The method of claim 4, wherein the generation of the indexes, for the single fact file comprises:
    - creating a tree map for the cube values, wherein the tree map comprises a disk-based tree map and a tree-like data structure;
      
      collecting the cube values that are distinct in the tree map;
      
      sorting the cube values that are distinct for the dimension; and
      
      replacing the cube values with the indexes for the dimension.
  - 7. The method of claim 1, wherein the querying further comprises:
    - launching a series of threads; and
      
      retrieving the cube values using the series of threads from the distributed dwarf cube, the retrieving comprising:
      
      adding the cube values to the list when the cube values are complete cube values or merging the cube values to the list when the cube values are non-complete cube values.
  - 8. The method of claim 1, wherein the querying further comprises:
    - receiving the query to retrieve multiple cube values from two or more dimensions;
      
      combining the two or more dimensions based on the query, wherein the two or more dimensions comprise at least one root dimension comprising one or more non-root dimensions;
      
      searching the two or more dimensions to identify an intersection of the cube values based on the combination; and
      
      retrieving the cube values based upon the searching.
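The query steps of claims 1 and 7 (checking ranges, then adding complete cube values and merging non-complete ones, using a series of threads) can be illustrated with the Python sketch below. It rests on assumptions the claims do not state: each cuboid is reduced to a hypothetical `totals` map from leading-dimension index to a SUM, rows were sorted on the leading dimension before partitioning, and a value is treated as non-complete when it sits at the start or end of a cuboid's range, because it may continue in a neighbouring block. `Cuboid`, `scan` and `query_range` are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Cuboid:
    start: int    # range start: first leading-dimension index in the block
    end: int      # range end: last leading-dimension index in the block
    totals: dict  # leading-dimension index -> SUM of the measure within this block


def scan(cuboid, lo, hi):
    """Split one cuboid's matching values into complete and non-complete ones."""
    complete, non_complete = {}, {}
    for value, total in cuboid.totals.items():
        if lo <= value <= hi:
            # A value at the start or end of the range may continue in a neighbouring block.
            bucket = complete if cuboid.start < value < cuboid.end else non_complete
            bucket[value] = total
    return complete, non_complete


def query_range(cuboids, lo, hi, workers=4):
    """Answer a SUM query for leading-dimension indexes lo..hi across all cuboids."""
    # Range check: only cuboids whose [start, end] overlaps [lo, hi] are scanned.
    candidates = [c for c in cuboids if c.start <= hi and c.end >= lo]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for complete, non_complete in pool.map(lambda c: scan(c, lo, hi), candidates):
            result.update(complete)  # complete values occur in exactly one cuboid
            for value, total in non_complete.items():
                result[value] = result.get(value, 0) + total  # merge partial totals
    return dict(sorted(result.items()))


cuboids = [
    Cuboid(0, 1, {0: 50, 1: 10}),
    Cuboid(1, 2, {1: 40, 2: 20}),
    Cuboid(3, 3, {3: 30}),
]
print(query_range(cuboids, 1, 2))  # {1: 50, 2: 20}
```

Because partitioning follows the sort order, a value strictly inside a range cannot occur in any other block, so it can be added as-is; only boundary values need their partial totals merged, which is why index 1 above combines 10 and 40.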

9. A system for querying a distributed dwarf cube comprising a plurality of dwarf cuboids, wherein the distributed dwarf cube is built using a mapreduce technique, the system comprising:
- a processor;
  
  a memory coupled to the processor, wherein the processor executes program instructions stored in the memory, to:
  
  receive a query for retrieving data from a distributed dwarf cube, wherein the distributed dwarf cube is built of the data, wherein the data comprises cube values, wherein the distributed dwarf cube is built by:
  
  processing the data, at a first mapreduce job of a series of mapreduce jobs, to generate indexes for the data, wherein the indexes are generated for each dimension in the data, wherein the cube values are replaced with a corresponding index for each dimension of the data;
  
  sorting the cube values in one or more dimensions based on a cardinality of the cube values and index associated with each cube value, wherein the cube values are sorted in an order of highest cardinality to lowest cardinality at a second mapreduce job of the series of mapreduce jobs, wherein the cardinality indicates distinctiveness of the cube values in the one or more dimensions;
  
  partitioning the sorted data into data blocks based on a predefined size, wherein each data block is associated with a range, wherein the range corresponds to a start cube value and an end cube value of a highest cardinality dimension in the data block;
  
  building a distributed dwarf cube, comprising dwarf cuboids, at a third mapreduce job of the series of mapreduce jobs, wherein each dwarf cuboid is generated, from a data block, based on the range associated with the data block by:
  
  processing the data block using a dwarf algorithm;
  
  eliminating the dimensions with the highest cardinality from the data;
  
  processing the data recursively based on the series of mapreduce jobs till all the dimensions in the data block are eliminated; and
  
  storing the generated cuboid on a Distributed File System;
  
  query the distributed dwarf cube, wherein a cluster of query engines is utilized to query the distributed dwarf cube to:
  
  check the one or more ranges of the cube values based upon the query, wherein the one or more ranges comprise complete cube values and non-complete cube values, wherein the non-complete cube values indicate the cube values present at a start or an end of a range of the one or more ranges;
  
  create a list of the cube values comprising the complete cube values and/or the non-complete cube values; and
  
  transmit the list of the cube values from the distributed dwarf cube corresponding to the query.
- Dependent Claims (10, 11, 12, 13, 14, 15, 16)
- - 10. The system of claim 9, wherein the building further comprises replicating the distributed dwarf cube on a plurality of nodes.
  - 11. The system of claim 10, wherein the distributed dwarf cube is queried on the plurality of nodes.
  - 12. The system of claim 9, wherein the one or more dimensions are in the form of star schema files or a single fact file.
  - 13. The system of claim 12, wherein generation of the indexes, for the star schema, comprises replacing the cube values comprising primary keys with the indexes in a sorted order.
  - 14. The system of claim 12, wherein the generation of the indexes, for the single fact file comprises:
    - creating a tree map for the cube values, wherein the tree map comprises a disk-based tree map and a tree-like data structure;
      
      collecting the cube values that are distinct in the tree map;
      
      sorting the cube values that are distinct for the dimension; and
      
      replacing the cube values with the indexes for the dimension.
  - 15. The system of claim 9, wherein the processor further executes the program instructions to:
    - launch a series of threads; and
      
      retrieve the cube values using the series of threads from the distributed dwarf cube, the retrieving comprising:
      
      adding the cube values to the list when the cube values are complete cube values or merging the cube values to the list when the cube values are non-complete cube values.
  - 16. The system of claim 9, wherein the processor further executes the program instructions to:
    - receive the query to retrieve multiple cube values from two or more dimensions;
      
      combine the two or more dimensions based on the query, wherein the two or more dimensions comprise at least one root dimension comprising one or more leaf dimensions;
      
      search the two or more dimensions to identify an intersection of the cube values based on the combination; and
      
      retrieve the cube values based upon the searching.
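The recursive cuboid construction in claim 9 (process the highest-cardinality dimension, eliminate it, and recurse until no dimensions remain) and the multi-dimension search of claim 16 can be approximated in Python as follows. This sketch builds a prefix tree with an `ALL` cell at every level, which corresponds to the Dwarf layout with prefix sharing only and no suffix coalescing, so it shows the recursion but not the storage savings of a full dwarf algorithm. The SUM measure, the nested-dict layout and the names `build`, `lookup` and `ALL` are assumptions.

```python
ALL = "ALL"  # aggregate cell standing for "any value" of an eliminated dimension


def build(rows, depth, num_dims):
    """Recursively build a prefix-tree cube from index-encoded rows.

    rows: tuples of num_dims dimension indexes followed by a numeric measure,
    with dimensions already ordered from highest to lowest cardinality.
    """
    if depth == num_dims:
        return sum(row[num_dims] for row in rows)  # every dimension eliminated: SUM
    groups = {}
    for row in rows:
        groups.setdefault(row[depth], []).append(row)
    node = {value: build(group, depth + 1, num_dims) for value, group in sorted(groups.items())}
    # ALL cell: eliminate the current dimension and recurse over all rows of the block.
    node[ALL] = build(rows, depth + 1, num_dims)
    return node


def lookup(node, selection):
    """Sum the measure over cells that satisfy every per-dimension selection.

    selection[d] is ALL or a set of indexes for dimension d.
    """
    if not selection:
        return node
    head, rest = selection[0], selection[1:]
    if head == ALL:
        return lookup(node[ALL], rest)
    return sum(lookup(node[value], rest) for value in head if value in node)


# Toy rows for one data block: (city index, year index, measure), already sorted.
rows = [(0, 0, 50), (1, 0, 10), (1, 1, 40), (2, 0, 20), (3, 1, 30)]
cube = build(rows, 0, num_dims=2)
print(lookup(cube, [{1, 2}, {0}]))  # 30: (city 1, year 0) + (city 2, year 0)
```

A selection of `ALL` follows the aggregate cell, while a set of indexes sums the matching children, so the result covers the intersection of the per-dimension selections; this is one possible reading of the intersection search in claim 16.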

17. A non-transitory computer readable medium embodying a program executable in a computing device for querying a distributed dwarf cube comprising a plurality of dwarf cuboids, wherein the distributed dwarf cube is built using a mapreduce technique, the program comprising:
- a program code for receiving a query for retrieving data from a distributed dwarf cube, wherein the distributed dwarf cube is built of the data, wherein the data comprises cube values, wherein the distributed dwarf cube is built by:
  
  processing the data at a first mapreduce job of a series of mapreduce jobs, to generate indexes for the data, wherein the indexes are generated for each dimension in the data, wherein the cube values are replaced with a corresponding index for each dimension of the data;
  
  sorting the cube values in one or more dimensions based on a cardinality of the cube values and index associated with each cube value, wherein the cube values are sorted in an order of highest cardinality to lowest cardinality at a second mapreduce job of the series of mapreduce jobs, wherein the cardinality indicates distinctiveness of the cube values in the one or more dimensions;
  
  partitioning the sorted data into data blocks based on a predefined size, wherein each data block is associated with a range, wherein the range corresponds to a start cube value and an end cube value of a highest cardinality dimension in the data block;
  
  building a distributed dwarf cube, comprising dwarf cuboids, at a third mapreduce job of the series of mapreduce jobs, wherein each dwarf cuboid is generated, from a data block, based on the range associated with the data block by:
  
  processing the data block using a dwarf algorithm;
  
  eliminating the dimensions with the highest cardinality from the data;
  
  processing the data recursively based on the series of mapreduce jobs till all the dimensions in the data block are eliminated; and
  
  storing the generated cuboid on a Distributed File System;
  
  a program code for querying the distributed dwarf cube, wherein a cluster of query engines is utilized for querying by:
  
  checking the one or more ranges of the cube values based upon the query, wherein the one or more ranges comprise complete cube values and non-complete cube values, wherein the non-complete cube values indicate the cube values present at a start or an end of a range of the one or more ranges;
  
  creating a list of the cube values comprising the complete cube values and/or the non-complete cube values; and
  
  transmitting the list of the cube values from the distributed dwarf cube corresponding to the query.
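For star-schema input (claims 5 and 13), index generation replaces primary keys with indexes assigned in sorted order. The Python sketch below assumes a hypothetical layout in which each dimension table is a dict from primary key to attributes and each fact row lists one foreign key per dimension followed by a measure; `index_star_schema` is an illustrative name, not taken from the patent.

```python
def index_star_schema(dimension_tables, fact_rows):
    """Replace each fact row's foreign keys with indexes assigned in sorted primary-key order.

    dimension_tables: one dict per dimension mapping primary key -> attributes.
    fact_rows: tuples holding one foreign key per dimension, then a measure.
    """
    key_to_index = [
        {key: i for i, key in enumerate(sorted(table))} for table in dimension_tables
    ]
    encoded = [
        tuple(mapping[key] for mapping, key in zip(key_to_index, row[:-1])) + (row[-1],)
        for row in fact_rows
    ]
    return key_to_index, encoded


# Toy star schema: a store dimension, a product dimension and three fact rows.
stores = {"S7": "Delhi", "S2": "Mumbai", "S9": "Pune"}
products = {"P20": "Pen", "P10": "Ink"}
facts = [("S9", "P10", 3), ("S2", "P20", 5), ("S7", "P10", 1)]
mappings, encoded = index_star_schema([stores, products], facts)
print(encoded)  # [(2, 0, 3), (0, 1, 5), (1, 0, 1)]
```

Assigning indexes in sorted key order keeps index ranges aligned with key ranges. The keys here are compared as strings, so a key such as `"S10"` would sort before `"S2"`; numeric keys would need numeric comparison.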

Current Assignee: Intellicus Technologies Pvt. Ltd. (Intellicus Technologies, Inc.)
Original Assignee: Intellicus Technologies Pvt. Ltd. (Intellicus Technologies, Inc.)
Inventors: Khandelwal, Ankit; Ghodawat, Kapil; Rastogi, Sajal; Gupta, Saurabh
Primary Examiner(s): Raab, Christopher J
Application Number: US14/459,803
Publication Number: US 20160048560A1
Time in Patent Office: 1,426 days
Field of Search: None
CPC Class Codes:
G06F 16/2264 Multidimensional index stru...
G06F 16/283 Multi-dimensional databases...
