Method and System for Optimizing Reduce-Side Join Operation in a Map-Reduce Framework
First Claim
1. A computer system for optimizing reduce-side join operation in a Map-reduce framework between a first data structure and a second data structure, the first data structure being sorted and divided into one or more regions, the system comprising:
a. one or more processors; and
b. a non-transitory memory containing instructions that, when executed by said one or more processors, causes said one or more processors to perform a set of steps comprising;
i. executing module for executing one or more map operations by one or more processors, wherein executing one or more map operation by one or more processors comprises;
1. fetching input data of the second data structure;
2. partitioning the data of the second data structure according to key-value pair;
3. projecting the key-value pairs of the second data structure to a partitioner;
4. maintaining one or more region key counters;
wherein the region key counter being used for registering key count value of one or more regions of the second data structure; and
5. emitting the key count value of one or more regions and corresponding data, wherein the key count values are emitted prior to the corresponding data;
ii. grouping module for grouping mapped data corresponding to a single region of the second data structure;
iii. accumulating module for providing the grouped data to a reducer; and
iv. fetching module for retrieving descriptive metadata of one or more regions of the first data structure;
v. selecting module for selecting one of a look-up approach and a scan approach to perform join operation by one or more reducers based on associated key count value and predefined criteria by the reducer, for performing the join operation.
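The map-side steps of the claim (assigning keys to regions, maintaining region key counters, and emitting the counts ahead of the data) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `region_of`, `map_task`, `COUNT_TAG`, and `DATA_TAG` are hypothetical, each region is assumed to be described by its sorted start key, and the counter is assumed to count records per region. Buffering the data inside the mapper is a simplification made for brevity.

```python
from bisect import bisect_right
from collections import Counter

# Assumed tags: COUNT_TAG sorts before DATA_TAG, so a sort on the composite
# key delivers a region's count record ahead of its data records.
COUNT_TAG = 0
DATA_TAG = 1


def region_of(key, region_start_keys):
    """Return the index of the region whose key range contains `key`.

    `region_start_keys` is the sorted list of each region's first key
    (an assumption about how the sorted first data structure is divided).
    Keys below the first start key are clamped to region 0.
    """
    return max(bisect_right(region_start_keys, key) - 1, 0)


def map_task(records, region_start_keys):
    """Map (key, value) records of the second data structure.

    Returns a list of ((region, tag, key), value) pairs in which every
    region's key count is emitted before any data record.
    """
    counters = Counter()   # one region key counter per region seen
    buffered = []          # data records held back until counts are known
    for key, value in records:
        region = region_of(key, region_start_keys)
        counters[region] += 1
        buffered.append(((region, DATA_TAG, key), value))

    # Emit the counts first, then the corresponding data.
    out = [((region, COUNT_TAG, ""), count) for region, count in counters.items()]
    out.extend(buffered)
    return out
```

Partitioning on the first element of the composite key (the region index) would send all records of one region to the same reducer, and sorting on the full key would place the count record first within that region.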
Abstract
The present invention provides a system and method for optimizing a reduce-side join operation in a map-reduce framework. The system and method execute one or more map operations on the second data structure, group the data tuples corresponding to a single region of the second data structure, provide the grouped data to a single reducer, and select, by one or more reducers, one of a scan approach and a look-up approach based on the region key count value and pre-determined conditions of the user.
10 Claims
2. A method for optimizing reduce-side join operation in a Map-reduce framework between a first data structure and a second data structure, the first data structure being sorted and divided into one or more regions, the method comprising:
a. executing one or more map operations by one or more processors, wherein executing one or more map operation by one or more processors comprises;
1. fetching input data of the second data structure;
2. partitioning the data of the second data structure according to key-value pair;
3. projecting the key-value pairs of the second data structure to a partitioner;
4. maintaining one or more region key counters;
wherein the region key counter being used for registering key count value of one or more regions of the second data structure; and
5. emitting the key count value of one or more regions and corresponding data, wherein the key count values are emitted prior to the corresponding data;
b. grouping mapped data corresponding to a single region of the second data structure;
c. providing the grouped data to a reducer;
d. retrieving descriptive metadata of one or more regions of the first data structure;
e. selecting one of a look-up approach and a scan approach to perform join operation by one or more reducers based on associated key count value and predefined criteria by the reducer, for performing the join operation.
Dependent claims: 3, 4, 5, 6, 7, 8, 9
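The selection in step e can be illustrated with a small Python sketch of the reducer side. It is an illustration under stated assumptions, not the claimed method itself. The region metadata is assumed to include a row count, and the "predefined criteria" are assumed to be a single threshold, `max_lookup_fraction`. Both are hypothetical, as are all function names. The region of the first data structure is modelled as an in-memory sorted list with unique keys. A binary search stands in for a point look-up against the store, and a single forward pass stands in for a region scan.

```python
from bisect import bisect_left


def choose_strategy(key_count, region_row_count, max_lookup_fraction=0.1):
    """Pick 'lookup' when few keys probe a large region, else 'scan'.

    `max_lookup_fraction` is an assumed stand-in for the predefined criteria.
    """
    if key_count <= max_lookup_fraction * region_row_count:
        return "lookup"
    return "scan"


def lookup_join(region_rows, probe):
    """Join by one point look-up (binary search here) per probe record."""
    keys = [k for k, _ in region_rows]
    joined = []
    for k, v in probe:
        i = bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            joined.append((k, region_rows[i][1], v))
    return joined


def scan_join(region_rows, probe):
    """Join by one sequential pass over the region (merge join).

    Both inputs must be sorted by key; region keys are assumed unique.
    """
    joined = []
    i = 0
    for k, v in probe:
        while i < len(region_rows) and region_rows[i][0] < k:
            i += 1
        if i < len(region_rows) and region_rows[i][0] == k:
            joined.append((k, region_rows[i][1], v))
    return joined


def reduce_region(key_count, probe, region_rows, region_row_count,
                  max_lookup_fraction=0.1):
    """Reducer for one region: the key count arrives first and fixes the
    strategy before any data record is processed."""
    strategy = choose_strategy(key_count, region_row_count, max_lookup_fraction)
    join = lookup_join if strategy == "lookup" else scan_join
    return strategy, join(region_rows, probe)
```

Because the key count is emitted before the data, the reducer can commit to one approach up front instead of buffering the whole group to measure its size.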
10. A system for optimizing reduce-side join operation in a Map-reduce framework between a first data structure and a second data structure, the system comprising:
a. a machine readable medium for storing instructions; and
b. at least one processor for processing the instructions, wherein the instructions cause the at least one processor to execute the operations of;
i. executing one or more map operations by one or more processors
c. grouping mapped data corresponding to a single region of the second data structure;
d. providing the grouped data to a reducer;
e. retrieving descriptive metadata of one or more regions of the first data structure;
f. selecting one of a look-up approach and a scan approach to perform join operation by one or more reducers based on associated key count value and predefined criteria by the reducer, for performing the join operation.
Specification