×

System and method for join-partitioning for local computability of query over shared-nothing clusters

  • US 8,156,107 B2
  • Filed: 02/02/2007
  • Issued: 04/10/2012
  • Est. Priority Date: 02/02/2007
  • Status: Active Grant
First Claim
Patent Images

1. A system to support data partitioning for local computability of queries, comprising:

  • a shared-nothing cluster including a plurality of hosts sharing no hard disk drive and/or computer memory among the hosts;

    a partitioning module operable to actively partition each of a plurality of relations of a dataset as the dataset is being provided by an external user of the system into a plurality of sub-relations and to store the sub-relations on the plurality of hosts, the partitioning module performing join partitioning while preserving one or more foreign key references locally on each of the hosts, the partitioning module being constrained to provide partitions which have data that allow for local computability of queries, and the sub-relations constrained to always output correct answers in response to a query, and the partitioning module being operable to keep a plurality of partitions of the sub-relations balanced across the plurality of hosts via one or more of;

    (i) keeping track of the foreign keys for each of the plurality of sub-relations that are placed on each of the plurality of partitions using a bit-map per referenced relation;

    (ii) keeping a score on the number of tuples of the sub-relations that have been placed in each of the plurality of partitions along with a count of the number of distinct dimension keys per relation in each of the plurality of partitions;

    (iii) looking up the bit-maps to determine if there are one or more candidate partitions which have all the dimension keys that occur in a tuple;

    (iv) inserting the tuple into the partition with the least score of the bit-maps if there are one or more candidate partitions or into the one with the least count if there is no candidate partition; and

    (v) updating the bit-maps to reflect the placement of the dimension keys in that partition as are the score and count; and

    a query executing module operable to;

    receive a query and identify at query time whether the query to the dataset is computable over the plurality of sub-relations locally on each of the plurality of hosts independent of the rest of the plurality of hosts, and in which a given received query is ad hoc and unpredictable, the query executing module being operable to identify local computability of the query at query time by;

    (i) constructing a characterization graph of the query whose vertices correspond to relation variables in the query for each of the plurality of databases; and

    (ii) determining if there exists a vertex in the characterization graph such that every other vertex in the characteristic graph is reachable from it by following directed edges;

    execute the query over the plurality of partitioned sub-relations stored on the plurality of hosts without moving data among the hosts or causing network bottlenecks; and

    generate an exact result for each query by merging local query output from the plurality of hosts.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×