Map-reduce with merge to process multiple relational datasets
Abstract
A method of processing relationships of at least two datasets is provided. For each of the datasets, a map-reduce subsystem is provided such that the data of that dataset is mapped to corresponding intermediate data for that dataset. The intermediate data for that dataset is reduced to a set of reduced intermediate data for that dataset. Data corresponding to the sets of reduced intermediate data are merged, in accordance with a merge condition. In some examples, data being merged may include the output of one or more other mergers. That is, generally, merge functions may be flexibly placed among various map-reduce subsystems and, as such, the basic map-reduce architecture may be advantageously modified to process multiple relational datasets using, for example, clusters of computing devices.
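The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the helper names `map_reduce` and `merge`, the sample datasets, and the equality merge condition are all assumptions chosen for illustration. Each dataset passes through its own map and reduce step, the reduced sets are merged under a merge condition, and the output of one merger is then fed into a second merger.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Map each record to (key, value) pairs, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

def merge(left, right, condition, combine):
    """Merge two reduced sets with a naive nested loop (hypothetical helper).

    Every pair of entries whose keys satisfy `condition` is combined into
    one output entry; a later pair with the same output key overwrites it.
    """
    merged = {}
    for lk, lv in left.items():
        for rk, rv in right.items():
            if condition(lk, rk):
                key, value = combine(lk, lv, rk, rv)
                merged[key] = value
    return merged

# Hypothetical sample datasets, invented for this sketch.
employees = [("alice", "eng", 100), ("bob", "eng", 80), ("carol", "ops", 70)]
departments = [("eng", "Building A"), ("ops", "Building B"), ("hr", "Building C")]
budgets = [("eng", 300), ("eng", 200), ("ops", 200)]

# One map-reduce subsystem per dataset.
payroll = map_reduce(employees, lambda r: [(r[1], r[2])], sum)
locations = map_reduce(departments, lambda r: [(r[0], r[1])], lambda vs: vs[0])
budget = map_reduce(budgets, lambda r: [(r[0], r[1])], sum)

same_key = lambda lk, rk: lk == rk  # assumed merge condition: equal keys

# First merger: combine the two reduced sets.
joined = merge(payroll, locations, same_key,
               lambda lk, lv, rk, rv: (lk, (lv, rv)))

# Second merger: its input includes the output of the first merger.
final = merge(joined, budget, same_key,
              lambda lk, lv, rk, rv: (lk, lv + (rv,)))

print(final)
```

The nested loop keeps the sketch short; a system running on a cluster would more plausibly partition and sort both reduced sets on the merge key first, but that choice is outside what the abstract specifies.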
55 Claims
1. A method of processing data and data relationships of at least two datasets, comprising:
for the data of each one of the datasets,
    mapping the data of that dataset to corresponding intermediate data for that dataset; and
    reducing the intermediate data for that dataset to a set of reduced intermediate data for that dataset; and
merging data corresponding to the sets of reduced intermediate data, in accordance with a merge condition.
Dependent claims: 2-18.
19. A computing system configured to process relationships of at least two datasets, the computing system including at least one computing device configured to:
for the data of each one of the datasets,
    map the data of that dataset to corresponding intermediate data for that dataset; and
    reduce the intermediate data for that dataset to a set of reduced intermediate data for that dataset; and
merge data corresponding to the sets of reduced intermediate data, in accordance with a merge condition.
Dependent claims: 20-39.
40. A method of configuring a computing system to process data and data relationships of at least two datasets, comprising:
for the data of each one of the datasets,
    configuring the computing system to include a mapping function to map data of that dataset to corresponding intermediate data for that dataset; and
    configuring the computing system to include a reducing function to reduce the intermediate data for that dataset to a set of reduced intermediate data for that dataset; and
configuring the computing system to include a merging function to merge data corresponding to the sets of reduced intermediate data, in accordance with a merge condition.
Dependent claims: 41-45.
46. A computer program product for processing data and data relationships of at least two datasets, the computer program product comprising at least one computer-readable medium having computer program instructions stored therein which are operable to cause at least one computing device to:
for the data of each one of the datasets,
    map the data of that dataset to corresponding intermediate data for that dataset; and
    reduce the intermediate data for that dataset to a set of reduced intermediate data for that dataset; and
merge data corresponding to the sets of reduced intermediate data, in accordance with a merge condition.
Dependent claims: 47-55.
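The configuring steps recited in claim 40 can be pictured, very loosely, as registering a mapping function and a reducing function per dataset plus one merging function for the system. The Python sketch below is an illustration under that reading only; the class names `DatasetConfig` and `MapReduceMergeSystem`, the method names, and the sample records are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class DatasetConfig:
    """Mapping and reducing functions configured for one dataset."""
    mapper: Callable[[Any], list]
    reducer: Callable[[list], Any]

@dataclass
class MapReduceMergeSystem:
    """Hypothetical system holding per-dataset functions and a merge condition."""
    merge_condition: Callable[[Any, Any], bool]
    datasets: dict = field(default_factory=dict)

    def configure_dataset(self, name, mapper, reducer):
        # Configure the system to include a mapper and a reducer for `name`.
        self.datasets[name] = DatasetConfig(mapper, reducer)

    def _map_reduce(self, name, records):
        cfg = self.datasets[name]
        groups = defaultdict(list)
        for record in records:
            for key, value in cfg.mapper(record):
                groups[key].append(value)
        return {key: cfg.reducer(values) for key, values in groups.items()}

    def run(self, left_name, left_records, right_name, right_records):
        # Map and reduce each dataset separately, then merge the reduced sets.
        left = self._map_reduce(left_name, left_records)
        right = self._map_reduce(right_name, right_records)
        return [
            (lk, lv, rk, rv)
            for lk, lv in left.items()
            for rk, rv in right.items()
            if self.merge_condition(lk, rk)
        ]

# Assumed merge condition: keys of the two reduced sets are equal.
system = MapReduceMergeSystem(merge_condition=lambda a, b: a == b)
system.configure_dataset(
    "orders", mapper=lambda r: [(r["customer"], r["amount"])], reducer=sum)
system.configure_dataset(
    "customers", mapper=lambda r: [(r["id"], r["name"])], reducer=lambda vs: vs[0])

result = system.run(
    "orders",
    [{"customer": 1, "amount": 5}, {"customer": 1, "amount": 7},
     {"customer": 2, "amount": 3}],
    "customers",
    [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cy"}],
)
print(result)
```

Keeping the merge condition as a plain predicate means a different condition (for example, a range comparison on keys) can be swapped in without touching the per-dataset functions.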
Specification