Method for sorting and distributing data among a plurality of nodes
First Claim
1. A method for sorting and distributing a database across a plurality of slave nodes, the method comprising:
- distributing a portion of the database to each slave node;
sorting, at each slave node, the database portion received, resulting in sorted database;
generating, at each slave node, a proposed partitioning of the database among the plurality of slave nodes, the proposed partitioning being based at least in part on the sorted database portion of the generating slave node;
determining, at a master node, a tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitioning determined by the plurality of slave nodes;
estimating, at each slave node, a distribution of the database among the slave nodes resulting from the tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node;
revising, at the master node, the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; and
arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning.
2 Assignments
0 Petitions
Accused Products
Abstract
A method for distributing and sorting data among a plurality of nodes is described herein. After receiving a portion of a data set (e.g., a database), each node sorts its portion and estimates a partitioning of the sorted dataset among the nodes based in part on its own sorted data portion. Each node then provides a representation of its estimated partition to a master node. The master node, using the provided estimated partitions, determines a tentative partitioning and submits the tentative partitioning to each node. Each node then determines the effect the tentative partitioning using its data portion. If the effect is acceptable for each node, the tentative partitioning plan is used to partition the data. Otherwise, the tentative partitioning plan is repeatedly revised by the master node and considered by the nodes having data portions until an acceptable or optimum partitioning is determined. Each node then distributes data from its data portion that falls outside the partition assigned to the node to the appropriate node. Upon receipt of this data, each node can perform a merge sort to add the received data to the previously sorted data portion at the node.
-
Citations
36 Claims
-
1. A method for sorting and distributing a database across a plurality of slave nodes, the method comprising:
-
distributing a portion of the database to each slave node; sorting, at each slave node, the database portion received, resulting in sorted database; generating, at each slave node, a proposed partitioning of the database among the plurality of slave nodes, the proposed partitioning being based at least in part on the sorted database portion of the generating slave node; determining, at a master node, a tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitioning determined by the plurality of slave nodes; estimating, at each slave node, a distribution of the database among the slave nodes resulting from the tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node; revising, at the master node, the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; and arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method for sorting and distributing a database across a plurality of slave nodes, each slave node storing a portion of the database, the method comprising:
-
distributing a portion of the database to each slave node; sorting, at each slave node, the database portion received at the slave node; assigning each slave node to one of a plurality of subsets of slave nodes; generating, at each slave node, a proposed partitioning of the database among the slave nodes based at least in part on the sorted database portion at the slave node; determining, at a master node, a tentative partitioning of the sorted database among the plurality of subsets based at least in part on the proposed partitioning of the slave nodes; determining, at an elected slave node of each subset, a tentative subset partitioning of a portion of the database assigned to the subset among the slave nodes of the subset; estimating, at each slave node of each subset, a distribution of the database portion assigned to the subset among the slave nodes of the subset resulting from the tentative subset partitioning, wherein the distribution estimated by each slave node is based at least in part on the database portion stored at the slave node; revising, at the elected slave node of each subset, the tentative subset partitioning when the distribution of the database estimated by the slave nodes of the subset indicates that the tentative subset partitioning is not an acceptable partitioning; and arriving at an acceptable partitioning and distributing the database based at least in part on the acceptable partitioning. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A system for sorting and distributing a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the database and each slave node being adapted to; sort its database portion; generate a proposed partitioning of the entire database among the slave nodes based at least in part on its sorted database portion; estimate a distribution of the database among the slave nodes resulting from a tentative partitioning, wherein the estimated distribution is based at least in part on the sorted database portion at the slave node; and transfer at least some data in its database portion to at least one other slave node based at least in part on an acceptable partitioning arrived at by a master node to provide a distributed database; a master node in communication with the slave nodes and being adapted to; determine the tentative partitioning of the sorted database among the slave nodes based at least in part on the proposed partitioning from the slave nodes; revise the tentative partitioning when the distribution of the database estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; arrive at the acceptable partitioning; and make the acceptable partitioning available to the plurality of slave nodes. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
-
-
25. A system for sorting and distributing a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the database, each slave node being assigned to one of a plurality of subsets, and each slave node being adapted to; sort its database portion; generate a proposed partitioning of the entire database among the slave nodes based at least in part on its sorted database portion; estimate a distribution of a database portion assigned to the subset among the slave nodes of the subset based at least in part on a tentative partitioning of the database portion and based at least in part on the sorted database portion at the slave node; and transfer at least some data in its database portion to at least one other slave node based at least in part on an acceptable partitioning arrived at by a master node to provide a distributed database; a master node in communication with the slave nodes and being adapted to; determine a partitioning of the sorted database among the subsets based at least in part on the proposed partitioning from the slave nodes, each subset being assigned a portion of the database; arrive at the acceptable partitioning; and make the acceptable partitioning available to the plurality of slave nodes; wherein an elected slave node of each subset is further adapted to; determine the tentative partitioning of the database portion assigned to the subset; and revise the tentative partitioning of the database portion when the distribution of the database portion estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32)
-
-
33. A method for sorting and distributing a data set in a database across a plurality of slave nodes, the method comprising:
-
distributing a portion of the data set to each slave node; sorting, at each slave node, the portion of the data set received; generating, at each slave node, a proposed partitioning of the entire data set among the plurality of slave nodes, the proposed partitioning being based at least in part on the sorted data set portion of the generating slave node; determining, at a master node, a tentative partitioning of the sorted data set among the slave nodes based at least in part on the proposed partitioning determined by the plurality of slave nodes; estimating, at each slave node, a distribution of the data set among the slave nodes resulting from the tentative partitioning, wherein the estimated distribution is based at least in part on the sorted data set portion at the slave node; revising, at the master node, the tentative partitioning when the distribution of the data set estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; and arriving at an acceptable partitioning and distributing the data set based at least in part on the acceptable partitioning.
-
-
34. A method for sorting and distributing a data set in a database across a plurality of slave nodes, each slave node storing a portion of the data set, the method comprising:
-
distributing a portion of the data set to each slave node; sorting, at each slave node, the data set portion received at the slave node; assigning each slave node to one of a plurality of subsets of slave nodes; generating, at each slave node, a proposed partitioning of the entire data set among the slave nodes based at least in part on the sorted data set portion at the slave node; determining, at a master node, a tentative partitioning of the sorted data set among the plurality of subsets based at least in part on the proposed partitioning of the slave nodes; determining, at an elected slave node of each subset, a tentative subset partitioning of a portion of the data set assigned to the subset among the slave nodes of the subset; estimating, at each slave node of each subset, a distribution of the data set portion assigned to the subset among the slave nodes of the subset resulting from the tentative subset partitioning, wherein the distribution estimated by each slave node is based at least in part on the data set portion stored at the slave node; revising, at the elected slave node of each subset, the tentative subset partitioning when the distribution of the data set estimated by the slave nodes of the subset indicates that the tentative subset partitioning is not an acceptable partitioning; and arriving at an acceptable partitioning and distributing the data set based at least in part on the acceptable partitioning.
-
-
35. A system for sorting and distributing a data set in a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the data set and each slave node being adapted to; sort its data set portion; generate a proposed partitioning of the entire data set among the slave nodes based at least in part on its sorted data set portion; estimate a distribution of the data set among the slave nodes resulting from a tentative partitioning, wherein the estimated distribution is based at least in part on the sorted data set portion at the slave node; and transfer at least some data in its database portion to at least one other slave node based at least in part on an acceptable partitioning arrived at by a master node to provide a distributed database; a master node in communication with the slave nodes and being adapted to; determine the tentative partitioning of the sorted data set among the slave nodes based at least in part on the proposed partitioning from the slave nodes; revise the tentative partitioning when the distribution of the data set estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning; arrive at the acceptable partitioning; and make the acceptable partitioning available to the plurality of slave nodes.
-
-
36. A system for sorting and distributing a data set in a database among a plurality of nodes, the system comprising:
-
a plurality of slave nodes, each slave node storing a portion of the data set, each slave node being assigned to one of a plurality of subsets, and each slave node being adapted to; sort its data set portion; generate a proposed partitioning of the entire data set among the slave nodes based at least in part on its sorted data set portion; estimate a distribution of a data set portion assigned to the subset among the slave nodes of the subset based at least in part on a tentative partitioning of the data set portion and based at least in part on the sorted data set portion at the slave node; and transfer at least some data in its database portion to at least one other slave node based at least in part on an acceptable partitioning arrived at by a master node to provide a distributed database; a master node in communication with the slave nodes and being adapted to; determine a partitioning of the sorted data set among the subsets based at least in part on the proposed partitioning from the slave nodes, each subset being assigned a portion of the data set; arrive at the acceptable partitioning; and make the acceptable partitioning available to the plurality of slave nodes; wherein an elected slave node of each subset is further adapted to; determine the tentative partitioning of the data set portion assigned to the subset; and revise the tentative partitioning of the data set portion when the distribution of the data set portion estimated by the slave nodes indicates the tentative partitioning is not an acceptable partitioning.
-
Specification