System and Method for Join-Partitioning For Local Computability of Query Over Shared-Nothing Clusters
First Claim
1. A system to support data partitioning for local computability of query, comprising:
- a cluster, comprising a plurality of hosts;
a partitioning module operable to partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts; and
a query executing module operable to;
execute a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and
generate result of the query by merging local query output from the plurality of hosts.
2 Assignments
0 Petitions
Accused Products
Abstract
The present invention addresses the problem of partitioning database relations across a plurality of hosts in a shared-nothing cluster while minimizing communication costs. A new partitioning approach for database relations—join-partitioning—is introduced that focuses on relations and their dependencies to a priori partition the plurality of relations to the plurality of hosts such that each host can locally compute a large class of queries over its partitioned data in isolation at query time, independent of other hosts in the cluster. Such an approach thus overcomes the primary bottleneck, i.e., the network, in deploying database systems over shared-nothing clusters by allowing them to seamlessly scale linearly to tens of thousands of hosts and manage tens or hundreds of terabytes of data. This description is not intended to be a complete description of, or limit the scope of, the invention. Other features, aspects, and objects of the invention can be obtained from a review of the specification, the figures, and the claims.
64 Citations
29 Claims
-
1. A system to support data partitioning for local computability of query, comprising:
-
a cluster, comprising a plurality of hosts; a partitioning module operable to partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts; and a query executing module operable to; execute a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and generate result of the query by merging local query output from the plurality of hosts. - View Dependent Claims (5, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
2. A system to support data partitioning for local computability of query, comprising:
-
a plurality of databases operable to maintain data on a plurality of hosts in a cluster; a partitioning module operable to partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of databases; and a query executing module operable to; execute a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of databases without moving data among them or causing network bottlenecks in the cluster; and generate result of the query by merging local query output from the plurality of hosts. - View Dependent Claims (3)
-
-
4. A system to support data partitioning for local computability of query, comprising:
-
a cluster, comprising a plurality of hosts; and a partitioning module operable to; partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts while preserving one or more foreign key constraints locally on each of the plurality of hosts; and store the plurality of sub-relations locally on the plurality of hosts so that a class of queries to the dataset is computable locally over the plurality of hosts without moving data among them or causing network bottlenecks in the cluster.
-
-
6. A system to support data partitioning for local computability of query, comprising:
-
a cluster, comprising a plurality of hosts; a partitioning module operable to partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts; and a query executing module operable to; identify at query time whether a query to the dataset is locally computable on each of the plurality of hosts independent of rest of the plurality of hosts; execute the query over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and generate result of the query by merging local query output from the plurality of hosts. - View Dependent Claims (7)
-
-
8. A system to support data partitioning for local computability of query, comprising:
-
a shared-nothing cluster compromising a plurality of hosts sharing no hard disk drive and/or computer memory among them; a partitioning module operable to partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts while preserving one or more foreign key references locally on each of the plurality of hosts; and a query executing module operable to; identify at query time whether a query to the dataset is computable over the plurality of sub-relations locally on each of the plurality of hosts independent of rest of the plurality of hosts; execute the query over the plurality of partitioned sub-relations stored on the plurality of hosts without moving data among them or causing network bottlenecks; and generate result of the query by merging local query output from the plurality of hosts.
-
-
18. A method to support data partitioning for local computability of query, comprising:
-
partitioning each of a plurality of relations of a dataset into a plurality of sub-relations and store them on a plurality of hosts in a cluster; executing a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and generating result of the query by merging local query output from the plurality of hosts. - View Dependent Claims (23, 24, 25, 26, 27)
-
-
19. A method to support data partitioning for local computability of query, comprising:
-
partitioning each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of hosts while preserving one or more foreign key constraints locally on each of the plurality of hosts; and storing the plurality of sub-relations locally on the plurality of hosts so that a class of queries to the dataset is computable locally over the plurality of hosts without moving data among them or causing network bottlenecks in the cluster.
-
-
20. A method to support data partitioning for local computability of query, comprising:
-
partitioning each of a plurality of relations of a dataset into a plurality of sub-relations and store them on a plurality of hosts in a cluster; identifying at query time whether a query to the dataset is locally computable on each of the plurality of hosts independent of rest of the plurality of hosts; executing the query over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and generating result of the query by merging local query output from the plurality of hosts. - View Dependent Claims (21)
-
-
22. A method to support data partitioning for local computability of query, comprising:
partitioning each of a plurality of relations of a dataset into a plurality of sub-relations and store them on a plurality of hosts in a shared-nothing cluster, while preserving one or more foreign key references locally on each of a plurality of hosts; identifying at query time whether a query to the dataset is computable over the plurality of sub-relations locally on each of the plurality of hosts independent of rest of the plurality of hosts; executing the query over the plurality of partitioned sub-relations stored on the plurality of hosts without moving data among them or causing network bottlenecks; and generating result of the query by merging local query output from the plurality of hosts.
-
28. A machine readable medium having instructions stored thereon that when executed cause a system to:
-
partition each of a plurality of relations of a dataset into a plurality of sub-relations and store them on a plurality of hosts in a cluster; execute a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of hosts without moving data among them or causing network bottlenecks in the cluster; and generate result of the query by merging local query output from the plurality of hosts.
-
-
29. A system to support data partitioning for local computability of query, comprising:
-
means for maintaining data on a plurality of hosts in a cluster; means for partitioning each of a plurality of relations of a dataset into a plurality of sub-relations and store them on the plurality of databases; means for executing a query to the dataset over the plurality of partitioned sub-relations locally on each of the plurality of databases without moving data among them or causing network bottlenecks in the cluster; and means for generating result of the query by merging local query output from the plurality of hosts.
-
Specification