PROCESSING DATASETS WITH A DBMS ENGINE
Abstract
A system and method to process a dataset with a database management system (DBMS) engine. The method includes splitting bulk data into a plurality of chunks. The method also includes converting the chunks to a plurality of row groups. The row groups are a dataset external to a DBMS comprising the DBMS engine. The method further includes creating an empty DBMS table within the DBMS. Additionally, the method includes attaching the dataset external to the DBMS to the empty DBMS table. The method also includes executing a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
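The first two steps of the abstract, splitting bulk data into chunks and converting each chunk to a row group, can be illustrated with a minimal Python sketch. The function names `split_into_chunks` and `to_row_group`, the sample records, and the chunk size are hypothetical, and the plain dictionary of column lists is only a stand-in for the DBMS-specific columnar format, whose actual layout is not given here.

```python
from itertools import islice


def split_into_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def to_row_group(chunk, columns):
    """Pivot a chunk of row tuples into a column-oriented row group.

    The dict of lists is an assumed stand-in for a real columnar format.
    """
    return {name: [row[i] for row in chunk] for i, name in enumerate(columns)}


# Hypothetical bulk data: five (id, label) rows.
bulk_data = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
columns = ("id", "label")

# One row group per chunk; the last chunk may be shorter than the rest.
row_groups = [to_row_group(c, columns) for c in split_into_chunks(bulk_data, 2)]
```

Each row group holds the values of one column contiguously, which is the property that lets an engine scan a single column without reading whole rows.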
20 Claims
1. A method for processing a dataset with a database management system (DBMS) engine, the method comprising:
splitting bulk data into a plurality of chunks;
converting the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format;
creating an empty DBMS table within the DBMS;
attaching the external dataset to the empty DBMS table; and
executing a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
Dependent claims: 2, 3, 4, 5, 6, 7
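The last three steps of claim 1 (create an empty table, attach the external dataset, run a MapReduce job over that dataset) might be sketched as follows. This is a toy illustration under stated assumptions, not the claimed implementation: `ToyDBMS`, its methods, the table name `events`, and the sample row groups are all hypothetical, and the map and reduce phases run serially in one process instead of on a cluster.

```python
from collections import defaultdict
from functools import reduce


class ToyDBMS:
    """Hypothetical in-memory catalog standing in for a real DBMS."""

    def __init__(self):
        self.tables = {}

    def create_empty_table(self, name, columns):
        self.tables[name] = {"columns": tuple(columns), "row_groups": []}

    def attach(self, name, external_row_groups):
        table = self.tables[name]
        for group in external_row_groups:
            if tuple(group) != table["columns"]:
                raise ValueError("row group schema does not match table")
        # Attach by reference: no rows are parsed or re-inserted.
        table["row_groups"] = external_row_groups


def map_phase(row_group):
    """Count label occurrences within a single row group."""
    counts = defaultdict(int)
    for label in row_group["label"]:
        counts[label] += 1
    return counts


def reduce_phase(left, right):
    """Merge two partial count dictionaries."""
    merged = defaultdict(int, left)
    for key, value in right.items():
        merged[key] += value
    return merged


# Hypothetical external dataset, already in row-group form.
external = [
    {"id": [1, 2], "label": ["a", "b"]},
    {"id": [3, 4], "label": ["a", "a"]},
]

db = ToyDBMS()
db.create_empty_table("events", ("id", "label"))
db.attach("events", external)

# The job reads the external dataset directly, one map call per row group.
totals = dict(reduce(reduce_phase, map(map_phase, external), {}))
```

Because the table only references the row groups, the same data is visible both to queries against the table and to the job that reads the external dataset.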
8. A method for loading data to a DBMS, comprising:
splitting bulk data into a plurality of chunks;
converting the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format; and
performing a binary copy of the row groups to an instance of the DBMS executing on each compute node of a cluster, the row groups being a dataset external to a DBMS comprising the DBMS engine.
Dependent claims: 9, 10, 11, 12
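The binary copy step of claim 8 can be pictured as a byte-for-byte file copy that skips any parsing or re-encoding of rows. The sketch below is a local simulation under stated assumptions: the helper `binary_copy_to_nodes`, the file names, and the payload bytes are hypothetical, temporary directories stand in for compute nodes, and every node is assumed to receive every row group, since the claim text does not say how row groups are distributed.

```python
import shutil
import tempfile
from pathlib import Path


def binary_copy_to_nodes(row_group_files, node_dirs):
    """Copy each row group file byte for byte into every node directory."""
    copied = []
    for node_dir in node_dirs:
        node_dir.mkdir(parents=True, exist_ok=True)
        for src in row_group_files:
            dst = node_dir / src.name
            shutil.copyfile(src, dst)  # raw copy, contents are not interpreted
            copied.append(dst)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    staging = root / "staging"
    staging.mkdir()

    # Hypothetical row group files with opaque binary payloads.
    files = []
    for i, payload in enumerate([b"\x00\x01rowgroup0", b"\x02\x03rowgroup1"]):
        path = staging / f"rowgroup_{i}.bin"
        path.write_bytes(payload)
        files.append(path)

    # Directories standing in for the compute nodes of a cluster.
    nodes = [root / "node0", root / "node1"]
    copies = binary_copy_to_nodes(files, nodes)

    # Verify that each copy matches its source exactly.
    identical = all(
        (node / f.name).read_bytes() == f.read_bytes()
        for node in nodes
        for f in files
    )
```

Since the files are already in the columnar format the engine expects, the copy cost is bounded by I/O throughput and not by row-by-row insertion.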
13. A system for executing a MapReduce job, comprising:
a cluster of compute nodes, each comprising:
a processing unit; and
a system memory, wherein the system memory comprises code configured to direct the processing unit to:
split bulk data into a plurality of chunks;
convert the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format;
create an empty DBMS table within the DBMS;
attach the dataset external to the DBMS to the empty DBMS table; and
execute a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification