Hadoop OLAP engine

US 10,353,923 B2
Filed: 06/30/2014
Issued: 07/16/2019
Est. Priority Date: 04/24/2014
Status: Active Grant

Abstract

In various example embodiments, systems and methods for building data cubes to be stored in a cube store are presented. In some embodiments, a metadata engine generates the cube metadata. In further embodiments, cube data is generated by a cube build engine based on the cube metadata and source data. The cube build engine performs a multi-stage MapReduce job on the source data to produce a multi-dimensional cube lattice having multiple cuboids. In further embodiments, the cube data is provided to the cube store.
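The "multi-dimensional cube lattice having multiple cuboids" can be pictured as every subset of the cube's dimensions, grouped by how many dimensions each subset keeps. The following is a minimal sketch of that enumeration only; the function name and the dimension names are hypothetical and are not taken from the patent.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Group every cuboid (a subset of the dimensions) by its dimension count."""
    n = len(dimensions)
    return {
        k: [frozenset(combo) for combo in combinations(dimensions, k)]
        for k in range(n, -1, -1)  # level N (base cuboid) down to level 0 (grand total)
    }


# Hypothetical dimension names, chosen only for illustration.
lattice = cuboid_lattice(["time", "location", "item"])
for level, cuboids in lattice.items():
    print(level, [sorted(c) for c in cuboids])

# A full lattice over N dimensions holds 2**N cuboids (here 2**3 == 8).
total = sum(len(cuboids) for cuboids in lattice.values())
```

Each level of this dictionary corresponds to one stage of a multi-stage build, in which a stage produces all cuboids with the same number of dimensions.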

19 Claims

1. A system comprising:
- at least one processor of a machine;
  
  a metadata engine to generate cube metadata using a mapping from a cube to Hbase table schema, the cube metadata comprising dimension and measure information for the cube;
  
  a cube build engine to generate cube data for the cube based on the cube metadata received from the metadata engine and source data, executing on the at least one processor of the machine, by performing at least a first MapReduce job and a second MapReduce job on the source data to produce a multi-dimensional cube having multiple cuboids, the first MapReduce job and the second MapReduce job having differently configured sets of mappers such that the first MapReduce job generates a first cuboid having a first quantity of dimensions and the second MapReduce job generates a second cuboid having a second quantity of dimensions that is less than the first quantity of dimensions, the cube build engine further configured to store the cube data to a cube store; and
  
  a query engine to receive a query and retrieve query results by accessing at least one of the first cuboid or the second cuboid.
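To make the two-job build in claim 1 concrete, here is a small in-memory sketch in which a first job produces the N-dimensional cuboid and a second job, with a differently configured mapper, produces a cuboid with N−1 dimensions. It is an illustration under stated assumptions, not the patented implementation: `map_reduce` is a toy stand-in for a real MapReduce job, the rows and dimension names are invented, and feeding the second job from the first job's output is one possible arrangement that the claim does not specify.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy stand-in for one MapReduce job: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical source rows: (time, location, item, sales).
source = [
    ("2014-04", "US", "phone", 10),
    ("2014-04", "US", "phone", 5),
    ("2014-04", "DE", "phone", 7),
    ("2014-05", "US", "book", 3),
]


def base_mapper(row):
    """Job 1 mapper: key each source row by all N = 3 dimensions."""
    yield row[:3], row[3]


def make_drop_mapper(drop_index):
    """Job 2 mapper factory: re-key a cuboid entry with one dimension removed."""
    def mapper(entry):
        key, measure = entry
        yield key[:drop_index] + key[drop_index + 1:], measure
    return mapper


# Job 1: the first cuboid, with N dimensions (time, location, item).
base_cuboid = map_reduce(source, base_mapper, sum)

# Job 2 (assumed here to read job 1's output): drop "location" to get
# a second cuboid with N-1 dimensions (time, item).
time_item_cuboid = map_reduce(base_cuboid.items(), make_drop_mapper(1), sum)
```

The only difference between the two jobs in this sketch is the mapper configuration, which is what determines the dimension count of the cuboid each job emits.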

2. A method comprising:
- receiving source data from a database;
  
  receiving cube metadata generated from a metadata engine using a mapping from a cube to Hbase table schema, the cube metadata including dimension and measure information for the cube;
  
  building the cube based on the cube metadata and the source data, executing on at least one processor of a machine, by performing at least a first MapReduce job and a second MapReduce job on the source data to produce a multi-dimensional cube having multiple cuboids representing cube data, the first MapReduce job and the second MapReduce job having differently configured sets of mappers such that the first MapReduce job generates a first cuboid having a first quantity of dimensions and the second MapReduce job generates a second cuboid having a second quantity of dimensions that is less than the first quantity of dimensions;
  
  storing the cube data to a cube store;
  
  receiving a query; and
  
  retrieving query results by accessing at least one of the first cuboid or the second cuboid.
  - 3. The method of claim 2, wherein the cube has N dimensions, and the first MapReduce job generates the first cuboid with N dimensions as the first quantity of dimensions, and the second MapReduce job generates the second cuboid with N−1 dimensions as the second quantity of dimensions.
  - 4. The method of claim 3, further comprising:
    - generating, using a third MapReduce job, a third set of one or more cuboids having N−2 dimensions; and
      
      generating, using a fourth MapReduce job, a fourth set of one or more cuboids having N−3 dimensions.
  - 5. The method of claim 2, wherein building the cube based on cube metadata and the source data further comprises:
    - transforming source Hadoop Distributed File System (HDFS) files to HFiles for storage in the cube store.
  - 6. The method of claim 5, further comprising:
    - creating one or more HBase tables based on the cube metadata.
  - 7. The method of claim 6, wherein creating the one or more HBase tables based on the cube metadata further comprises:
    - creating the one or more HBase tables having split regions.
  - 8. The method of claim 7, wherein building the cube based on cube metadata and the source data further comprises:
    - performing at least one MapReduce job at each dimension in the multi-dimensional cube to produce temporary HDFS sequence files; and
      
      performing at least one MapReduce job to transform each of the temporary HDFS sequence files into at least one HFile to create a plurality of HFiles representing the cube data of the multi-dimensional cube.
  - 9. The method of claim 8, wherein the cube is an Online Analytical Processing (OLAP) cube, and each dimension stores data of a same type.
  - 10. The method of claim 9, wherein data types include:
    - time data, location data, item data, user data.
  - 11. The method of claim 9, bulk uploading the plurality of HFiles into one or more HBase tables, wherein bulk uploading the plurality of HFiles into one or more HBase tables further comprises bulk uploading each of the plurality of HFiles corresponding to one of the HDFS temporary sequence files into a separate split region in the one or more HBase tables.
  - 12. The method of claim 2, wherein building the cube based on cube metadata and the source data further comprises:
    - building the cube representing a full materialization of the multi-dimensional cube.
  - 13. The method of claim 2, wherein building the cube based on cube metadata and the source data further comprises:
    - building a cube representing partial data of the multi-dimensional cube, the partial data including a subset of the cuboids in the multi-dimensional cube.
  - 14. The method of claim 13, wherein building the cube representing partial data of the multi-dimensional cube further comprises:
    - building cuboids based on high cardinality dimensions.
  - 15. The method of claim 14, wherein building cuboids based on high cardinality dimensions further comprises:
    - building cuboids which do not aggregate more than one high cardinality dimension.
  - 16. The method of claim 13, wherein building the cube representing partial data of the multi-dimensional cube further comprises:
    - building the cube representing partial data of the multi-dimensional cube during build time.
  - 17. The method of claim 2, wherein retrieving query results further comprises:
    - selecting the first cuboid or the second cuboid based on which has a quantity of dimensions closer to a quantity of dimensions of data requested in the query.
  - 18. The method of claim 17, wherein the second quantity of dimensions is different than the first quantity of dimensions in that the first quantity of dimensions comprises at least one dimension of data in a same type that is not in the second quantity of dimensions.
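Claim 17 describes choosing the cuboid whose dimension count is closer to the number of dimensions the query requests. A minimal sketch of one way to do that follows. The function name, the cuboid contents, and the extra requirement that the chosen cuboid contain every queried dimension are assumptions made for illustration; the claim itself only states the "closer quantity" criterion.

```python
def select_cuboid(cuboids, query_dims):
    """Pick the covering cuboid with the fewest dimensions beyond the query's."""
    query = frozenset(query_dims)
    # Assumption: a cuboid can answer the query only if it keeps every queried dimension.
    candidates = [c for c in cuboids if query <= c]
    if not candidates:
        raise LookupError("no materialized cuboid covers the query dimensions")
    return min(candidates, key=lambda c: len(c) - len(query))


# Hypothetical first (3-dimension) and second (2-dimension) cuboids.
cuboids = [
    frozenset({"time", "location", "item"}),
    frozenset({"time", "item"}),
]

chosen = select_cuboid(cuboids, ["time"])
```

With these inputs a query on `time` alone is served from the two-dimension cuboid, while a query that touches `location` can only be served from the three-dimension cuboid.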

19. A machine readable storage device storing instructions that, when executed by at least one processor of a machine, cause the machine to perform operations comprising:
- receiving source data from a database;
  
  receiving cube metadata generated from a metadata engine using a mapping from a cube to table schema, the cube metadata including dimension and measure information for the cube;
  
  building the cube based on the cube metadata and the source data by performing at least a first mapping and reducing job and a second mapping and reducing job on the source data to produce a multi-dimensional cube having multiple cuboids representing cube data, the first mapping and reducing job and the second mapping and reducing job having differently configured sets of mappers such that the first mapping and reducing job generates a first cuboid having a first quantity of dimensions and the second mapping and reducing job generates a second cuboid having a second quantity of dimensions that is less than the first quantity of dimensions;
  
  storing the cube data to a cube store;
  
  receiving a query; and
  
  retrieving query results by accessing at least one of the first cuboid or the second cuboid.

Current Assignee: eBay Inc.
Original Assignee: eBay Inc.
Inventors: Han, Luke Qing; Jiang, Xu; Song, Yi; Li, Chauncey
Primary Examiner(s): Burke, Jeff A
Assistant Examiner(s): Conyers, Dawaune A
Application Number: US14/320,607
Publication Number: US 20150310082A1
Time in Patent Office: 1,842 Days
CPC Class Codes:
G06F 16/185   Hierarchical storage manage...
G06F 16/2471   Distributed queries
G06F 16/254   Extract, transform and load...
G06F 16/283   Multi-dimensional databases...
