Method for sorting and distributing data among a plurality of nodes
First Claim
1. A method for sorting and distributing a database across a plurality of slave nodes, the method comprising the steps of:
- distributing a portion of the database to each slave node;
sorting, at each slave node, the database portion received;
determining, at each slave node, a proposed partitioning of the database among the plurality of slave nodes, the proposed partitioning being based at least in part on the sorted database portion of the determining slave node;
determining, at a master node, a tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitionings determined by the plurality of slave nodes;
estimating, at each slave node, a distribution of the database among the slave nodes resulting from the tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node;
revising, at the master node, the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; and
arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning.
2 Assignments
0 Petitions
Accused Products
Abstract
A method for distributing and sorting data among a plurality of nodes is describe herein. After receiving a portion of a data set (e.g., a database), each node sorts its portion and estimates a partitioning of the sorted dataset among the nodes based in part on its own sorted data portion. Each node then provides a representation of its estimated partition to a master node. The master node, using the provided estimated partitions, determines a tentative partitioning and submits the tentative partitioning to each node. Each node then determines the effect the tentative partitioning using its data portion. If the effect is acceptable for each node, the tentative partitioning plan is used to partition the data. Otherwise, the tentative partitioning plan is repeatedly revised by the master node and considered by the nodes having data portions until an acceptable or optimum partitioning is determined. Each node then distributes data from its data portion that falls outside the partition assigned to the node to the appropriate node. Upon receipt of this data, each node can perform a merge sort to add the received data to the previously sorted data portion at the node.
168 Citations
32 Claims
-
1. A method for sorting and distributing a database across a plurality of slave nodes, the method comprising the steps of:
-
distributing a portion of the database to each slave node;
sorting, at each slave node, the database portion received;
determining, at each slave node, a proposed partitioning of the database among the plurality of slave nodes, the proposed partitioning being based at least in part on the sorted database portion of the determining slave node;
determining, at a master node, a tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitionings determined by the plurality of slave nodes;
estimating, at each slave node, a distribution of the database among the slave nodes resulting from the tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node;
revising, at the master node, the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; and
arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method for sorting and distributing a database across a plurality of slave nodes, each slave node storing a portion of the database, the method comprising the steps of:
-
distributing a portion of the database to each slave node;
sorting, at each slave node, the database portion received at the slave node;
assigning each slave node to one of a plurality of subsets of slave nodes;
estimating, at each slave node, a proposed partitioning of the database among the slave nodes based at least in part on the sorted database portion at the slave node;
determining, at a master node, a tentative partitioning of the sorted database among the plurality of subsets based at least in part on the proposed partitionings of the slave nodes;
determining, at an elected slave node of each subset, a tentative subset partitioning of a portion of the database assigned to the subset among the slave nodes of the subset;
estimating, at each slave node of each subset, a distribution of the database portion assigned to the subset among the slave nodes of the subset resulting from the tentative subset partitioning, wherein the distribution estimated by each slave node is based at least in part on the database portion stored at the slave node;
revising, at the elected slave node of each subset, the tentative subset partitioning when the distribution of the database estimated by the slave nodes of the subset indicates that the tentative subset partitioning is not an acceptable partitioning; and
arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A system for sorting and distributing a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the database and each slave node being adapted to;
sort its database portion;
estimate a proposed partitioning of the database among the slave nodes based at least in part on its sorted database portion; and
estimate a distribution of the database among the slave nodes resulting from a tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node;
a master node in communication with the slave nodes and being adapted to;
determine the tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitionings from the slave nodes; and
revise the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
-
-
25. A system for sorting and distributing a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the database, each slave node being assigned to one of a plurality of subsets, and each slave node being adapted to;
sort its database portion;
estimate a proposed partitioning of the database among the slave nodes based at least in part on its sorted database portion;
estimate a distribution of a database portion assigned to the subset among the slave nodes of the subset based at least in part on a tentative partitioning of the database portion and based at least in part on the sorted database portion at the slave node; and
a master node in communication with the slave nodes and being adapted to;
determine a partitioning of the sorted database among the subsets based at least in part on the proposed partitionings from the slave nodes, each subset being assigned a portion of the database;
wherein an elected slave node of each subset is further adapted to;
determine the tentative partitioning of the database portion assigned to the subset; and
revise the tentative partitioning of the database portion when the distribution of the database portion estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32)
-
Specification