MECHANISM FOR CO-LOCATED DATA PLACEMENT IN A PARALLEL ELASTIC DATABASE MANAGEMENT SYSTEM
First Claim
Patent Images
1. A database management system comprising:
- a network interface, for receiving database queries from two or more client application processes as a network database service, the client application processes originating from two different users, the system providing a least one connection into the system for each such client application process;
a group of two or more operational nodes for executing the queries as database operations, each operational node implemented as a logical collection of software components that execute on one or more physical machines;
where the number of physical machines is not necessarily the same as the number of operational nodes;
with the operational nodes assigned as controller-nodes, compute-nodes or storage-nodes, and groups of controller-nodes forming controller nodegroups, and groups of compute-nodes forming compute nodegroups, and groups of storage nodes forming storage nodegroups;
the number of operational nodes, and their available assignment as compute-nodes or storage-nodes varying during execution of the queries;
each client connection being assigned to an associated compute nodegroup;
the queries also specifying one or more tables for an associated database operation, with each such table being assigned to a respective storage nodegroup;
the operational nodes further;
operating in parallel;
with the number of operational nodes executing a given query or queries changing during a given time interval by at least one of;
(a) changing the compute-nodegroup associated with a connection, or(b) adding or removing nodes from the compute nodegroup associated with a connection; and
distributing data from the tables among the nodes in a storage nodegroup according to a data dependent distribution method specified by a Distribution Vector (DV), the DV including a set of attributes of the table that determine at least where each row is stored.
2 Assignments
0 Petitions
Accused Products
Abstract
A database management system implemented in a cloud computing environment. Operational nodes are assigned as groups of controller-nodes, compute-nodes or storage-nodes. Assignments as compute-nodes or storage-nodes vary during execution of queries. Queries specify tables for an associated database operation, and respective storage nodegroup(s). The number of nodes executing a query may change by (a) changing a compute-nodegroup, or (b) adding or removing nodes from a compute nodegroup; and/or distributing data to the storage nodegroup based on a Distribution Method which may be specified by a Distribution Vector (DV) that determines at least where each row is stored.
26 Citations
19 Claims
-
1. A database management system comprising:
-
a network interface, for receiving database queries from two or more client application processes as a network database service, the client application processes originating from two different users, the system providing a least one connection into the system for each such client application process; a group of two or more operational nodes for executing the queries as database operations, each operational node implemented as a logical collection of software components that execute on one or more physical machines; where the number of physical machines is not necessarily the same as the number of operational nodes; with the operational nodes assigned as controller-nodes, compute-nodes or storage-nodes, and groups of controller-nodes forming controller nodegroups, and groups of compute-nodes forming compute nodegroups, and groups of storage nodes forming storage nodegroups; the number of operational nodes, and their available assignment as compute-nodes or storage-nodes varying during execution of the queries; each client connection being assigned to an associated compute nodegroup; the queries also specifying one or more tables for an associated database operation, with each such table being assigned to a respective storage nodegroup; the operational nodes further; operating in parallel; with the number of operational nodes executing a given query or queries changing during a given time interval by at least one of; (a) changing the compute-nodegroup associated with a connection, or (b) adding or removing nodes from the compute nodegroup associated with a connection; and distributing data from the tables among the nodes in a storage nodegroup according to a data dependent distribution method specified by a Distribution Vector (DV), the DV including a set of attributes of the table that determine at least where each row is stored. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A database management system comprising:
-
a network interface, for receiving database queries from two or more client application processes as a network database service, the client application processes originating from two different users, the system providing a least one connection into the system for each such client application process; a group of two or more operational nodes for executing the queries as database operations, each operational node implemented as a logical collection of software components that execute on one or more physical machines; where the number of physical machines is not necessarily the same as the number of operational nodes; with the operational nodes assigned as controller-nodes, compute-nodes or storage-nodes, and groups of controller-nodes forming controller nodegroups, and groups of compute-nodes forming compute nodegroups, and groups storage nodes forming storage nodegroups; the number of operational nodes, and their available assignment as compute-nodes or storage-nodes varying during execution of the queries; each client connection being assigned to an associated compute nodegroup; the queries also specifying one or more tables for an associated database operation, with each such table being assigned to a respective storage nodegroup; the operational nodes further; operating in parallel; with the number of operational nodes executing a given query or queries changing during a given time interval by at least one of; (a) changing the compute-nodegroup associated with a connection, or (b) adding or removing nodes from the compute nodegroup associated with a connection; and wherein data from the tables is distributed among the nodes in a storage nodegroup according to a data dependent distribution method specified by a Distribution Vector (DV), the DV including a set of attributes of the table that determine at least where each row is stored; and
further whereineach nodegroup is associated with one or more generations, such that when a nodegroup is initially created it has a first generation, and the generation consists of at least a generation number, a Distribution Map (DM), and an Allocation Strategy (AS), with the AS determining where to send a row of data, and executable based on the DV of the row, and any information in that generation, the Distribution Map (DM) being used to keep track of whether DVs have not been seen previously, and where when a change in the nodes belonging to a nodegroup, or a change in an AS occurs, a new generation is created for the nodegroup; and if a table is to be distributed according to an elastic data distribution, a new row of data is stored in a manner that ensures co-location by (a) performing an iterative search through all generations to determine an earliest generation where it cannot be determined for sure that the DV was not seen; (b) if such a generation can be found, the new row is dispatched according to the allocation strategy in that generation;
else(c) if such a generation cannot be found, it is determined that the DV was never seen before, and the new row is dispatched according to the AS in the current generation, and (d) the DM for the current generation is updated to reflect the first occurrence of the DV for the new row in the current generation.
-
Specification