PROCESSING DATASETS WITH A DBMS ENGINE

US 20150074151A1
Filed: 09/11/2013
Published: 03/12/2015
Est. Priority Date: 09/11/2013
Status: Active Grant

Abstract

A system and method to process a dataset with a database management system (DBMS) engine. The method includes splitting bulk data into a plurality of chunks. The method also includes converting the chunks to a plurality of row groups. The row groups are a dataset external to a DBMS comprising the DBMS engine. The method further includes creating an empty DBMS table within the DBMS. Additionally, the method includes attaching the dataset external to the DBMS to the empty DBMS table. The method also includes executing a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
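The first two steps of the abstract (splitting bulk data into chunks, then converting each chunk to row groups) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `split_into_chunks`, `chunk_to_row_groups`, and `rows_per_group` are hypothetical, and a plain dictionary of column lists stands in for the DBMS-specific columnar format, which the source does not specify.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]
# Stand-in for a columnar row group: column name -> list of values.
RowGroup = Dict[str, List[Any]]


def split_into_chunks(rows: Sequence[Row], num_chunks: int) -> List[List[Row]]:
    """Split rows into num_chunks contiguous chunks of near-equal size."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(len(rows), num_chunks)
    chunks: List[List[Row]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


def chunk_to_row_groups(chunk: Sequence[Row], rows_per_group: int) -> List[RowGroup]:
    """Convert one chunk into row groups, each stored column by column."""
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(chunk), rows_per_group):
        part = chunk[start:start + rows_per_group]
        # Pivot row-oriented records into per-column value lists.
        groups.append({name: [row[name] for row in part] for name in part[0]})
    return groups
```

With ten rows and three chunks, the sketch yields chunks of 4, 3, and 3 rows, which matches the "rows divided by number of compute nodes" sizing described in the dependent claims.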

20 Claims

1. A method for processing a dataset with a database management system (DBMS) engine, the method comprising:
- splitting bulk data into a plurality of chunks;
  
  converting the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format;
  
  creating an empty DBMS table within the DBMS;
  
  attaching the external dataset to the empty DBMS table; and
  
  executing a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method recited in claim 1, the compute nodes each comprising an instance of the DBMS.
  - 3. The method recited in claim 2, wherein converting the chunks is performed in parallel on the compute nodes, each of the compute nodes converting one of the chunks.
  - 4. The method recited in claim 2, comprising:
    - the MapReduce job sending commands to the DBMS; and
      
      the DBMS processing external dataset in response to the commands.
  - 5. The method recited in claim 2, wherein each of the chunks comprise a number of rows approximate to a number of rows of the bulk data divided by the number of compute nodes.
  - 6. The method recited in claim 2, wherein the MapReduce job comprises a map job executing on each of the compute nodes.
  - 7. The method recited in claim 1, wherein attaching the dataset comprises copying metadata describing the dataset to a catalog of the DBMS.
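The "create an empty table, then attach" steps of claim 1, with attachment done by copying metadata into the catalog as in claim 7, might look like the sketch below. Everything here is a hypothetical stand-in: `Catalog`, `TableEntry`, and `ExternalDatasetMeta` are illustrative names, not a real DBMS API, and the schema-equality check is an added assumption about what a compatible table means.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# ((column name, type name), ...) - a simplified schema representation.
Schema = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class ExternalDatasetMeta:
    """Metadata describing a dataset that lives outside the DBMS."""
    location: str
    schema: Schema
    row_group_count: int


@dataclass
class TableEntry:
    schema: Schema
    attached: List[ExternalDatasetMeta] = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory stand-in for a DBMS catalog."""

    def __init__(self) -> None:
        self.tables: Dict[str, TableEntry] = {}

    def create_empty_table(self, name: str, schema: Schema) -> None:
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = TableEntry(schema=schema)

    def attach(self, name: str, meta: ExternalDatasetMeta) -> None:
        """Attach by recording metadata only; no row data is copied."""
        table = self.tables[name]
        if table.schema != meta.schema:  # assumed compatibility rule
            raise ValueError("schema of external dataset does not match table")
        table.attached.append(meta)
```

The point of the design is that attachment touches only the catalog, so its cost does not grow with the size of the external dataset.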

8. A method for loading data to a DBMS, comprising:
- splitting bulk data into a plurality of chunks;
  
  converting the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format; and
  
  performing a binary copy of the row groups to an instance of the DBMS executing on each compute node of a cluster, the row groups being a dataset external to a DBMS comprising the DBMS engine.
- Dependent claims (9, 10, 11, 12):
  - 9. The method of claim 8, wherein converting the chunks is performed in parallel on the compute nodes, each of the compute nodes converting one of the chunks.
  - 10. The method recited in claim 8, comprising:
    - the MapReduce job sending commands to the DBMS; and
      
      the DBMS processing the previously attached ECF data.
  - 11. The method recited in claim 8, wherein each of the chunks comprise a number of rows approximate to a number of rows of the bulk data divided by the number of compute nodes.
  - 12. The method recited in claim 8, wherein a number of chunks approximates a number of compute nodes.
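The binary copy in claim 8 can be pictured as a byte-for-byte file copy that never parses the row groups. The sketch below assumes each row group is a file and each compute node is represented by a local directory; the function name `binary_copy_row_groups` and the round-robin placement are illustrative assumptions, not details from the source.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Sequence


def binary_copy_row_groups(
    row_group_files: Sequence[Path], node_dirs: Sequence[Path]
) -> Dict[Path, List[Path]]:
    """Copy row-group files to node directories without decoding them."""
    if not node_dirs:
        raise ValueError("at least one node directory is required")
    placed: Dict[Path, List[Path]] = {d: [] for d in node_dirs}
    for i, src in enumerate(row_group_files):
        # Round-robin placement is an assumption made for this sketch.
        dest_dir = node_dirs[i % len(node_dirs)]
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        shutil.copyfile(src, dest)  # raw byte copy, no parsing or re-encoding
        placed[dest_dir].append(dest)
    return placed
```

Because the files are already in the engine's columnar format, a load of this kind would avoid per-row parsing and conversion on the DBMS side.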

13. A system for executing a MapReduce job, comprising:
- a cluster of compute nodes, each comprising;
  
  a processing unit; and
  
  a system memory, wherein the system memory comprises code configured to direct the processing unit to;
  
  split bulk data into a plurality of chunks;
  
  convert the chunks to an external dataset comprising a plurality of row groups, the external dataset being external to a DBMS comprising the DBMS engine, the external dataset comprising a DBMS-specific columnar format;
  
  create an empty DBMS table within the DBMS;
  
  attach the dataset external to the DBMS to the empty DBMS table; and
  
  execute a MapReduce job on a cluster of compute nodes, using the dataset external to the DBMS as input.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The system recited in claim 13, the compute nodes each comprising an instance of the DBMS.
  - 15. The system recited in claim 13, wherein converting the chunks is performed in parallel on the compute nodes, each of the compute nodes converting one of the chunks.
  - 16. The system recited in claim 13, wherein a number of chunks is equal to a number of compute nodes.
  - 17. The system recited in claim 13, wherein each of the chunks comprise a number of rows approximate to a number of rows of the bulk data divided by the number of compute nodes.
  - 18. The system recited in claim 13, wherein the MapReduce job comprises a map job executing on each of the compute nodes.
  - 19. The system recited in claim 13, wherein attaching the dataset comprises copying metadata describing the dataset to a catalog of the DBMS.
  - 20. The system recited in claim 13, the empty DBMS table comprising one or more columns of a same type and number as one or more columns of the dataset external to the DBMS.
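The final step, a MapReduce job with one map task per compute node reading the external row groups, could be simulated as below. This is a single-process sketch under stated assumptions: threads stand in for compute nodes, `map_sum_column` stands in for a command the map task would send to its local DBMS instance, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Any, Dict, List, Sequence

# Stand-in for a columnar row group: column name -> list of values.
RowGroup = Dict[str, List[Any]]


def map_sum_column(row_groups: Sequence[RowGroup], column: str) -> float:
    """Map task: compute a partial sum over one node's row groups."""
    return sum(sum(group[column]) for group in row_groups)


def run_map_reduce(node_inputs: Sequence[Sequence[RowGroup]], column: str) -> float:
    """Run one map task per simulated node, then reduce the partial sums."""
    if not node_inputs:
        return 0
    with ThreadPoolExecutor(max_workers=len(node_inputs)) as pool:
        partials = list(pool.map(partial(map_sum_column, column=column), node_inputs))
    return sum(partials)  # reduce step
```

For example, with node inputs `[[{"v": [1, 2]}, {"v": [3]}], [{"v": [4, 5]}]]` and column `"v"`, the two map tasks produce partial sums 6 and 9, and the reduce step returns 15.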

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Chaiken, Ronnie; Larson, Per-Ake; Foehr, Oliver

Granted Patent: US 10,133,800 B2
Field of Search
US Class Current: 707/803
CPC Class Codes:
G06F 16/221   Column-oriented storage; Ma...
G06F 16/2219   Large Object storage; Manag...
G06F 16/24561   Intermediate data storage t...
G06F 16/258   Data format conversion from...
G06F 16/27   Replication, distribution o...