APPARATUS FOR ELASTIC DATABASE PROCESSING WITH HETEROGENEOUS DATA
First Claim
1. A database management system comprising:
a network interface, for receiving database queries from two or more client application processes as a network database service, the client application processes originating from two different users, the system providing at least one connection into the system for each such client application process;
a group of two or more operational nodes for executing the queries as database operations, each operational node implemented as a logical collection of software components that execute on one or more physical machines;
where the number of physical machines is not necessarily the same as the number of operational nodes;
with the operational nodes assigned as controller-nodes, compute-nodes or storage-nodes, and groups of controller-nodes forming controller nodegroups, and groups of compute-nodes forming compute nodegroups, and groups of storage nodes forming storage nodegroups;
the number of operational nodes, and their available assignment as compute-nodes or storage-nodes varying during execution of the queries;
each client connection being assigned to an associated compute nodegroup;
the queries also specifying one or more tables for an associated database operation, with each such table being assigned to a respective storage nodegroup;
the operational nodes further;
operating in parallel;
with the number of operational nodes executing a given query or queries changing during a given time interval by at least one of;
(a) changing the compute-nodegroup associated with a connection, or
(b) adding or removing nodes from the compute nodegroup associated with a connection; and
distributing data from the tables among the nodes in the storage nodegroup to which the table is assigned based on a Distribution Method which may be either data dependent or data independent; and
at least one of the controller nodes further;
executing a Dynamic Query Planner (DQP) process that transforms queries received from the client into a query plan that includes an ordered series of steps that are executed in parallel on multiple operational nodes where possible, the query plan further stipulating, for each step, which compute node it must be performed on, which storage nodes it must access, and other steps that this step depends on.
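The plan structure recited in the final element of claim 1 (each step naming its compute node, the storage nodes it accesses, and the steps it depends on) can be illustrated with a minimal sketch. The names `PlanStep` and `parallel_waves`, the node labels, and the wave-by-wave scheduling are assumptions made for illustration; the claim does not describe how the DQP orders or dispatches steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanStep:
    # Hypothetical representation of one step of a query plan.
    step_id: int
    compute_node: str                  # node the step must be performed on
    storage_nodes: list[str]           # storage nodes the step must access
    depends_on: list[int] = field(default_factory=list)  # prerequisite steps


def parallel_waves(steps: list[PlanStep]) -> list[list[int]]:
    """Group step ids into waves; all steps in a wave may run in parallel
    because every step they depend on sits in an earlier wave."""
    remaining = {s.step_id: s for s in steps}
    done: set[int] = set()
    waves: list[list[int]] = []
    while remaining:
        ready = sorted(
            sid for sid, s in remaining.items()
            if all(dep in done for dep in s.depends_on)
        )
        if not ready:
            raise ValueError("cyclic or unsatisfiable step dependencies")
        waves.append(ready)
        done.update(ready)
        for sid in ready:
            del remaining[sid]
    return waves


# Two scans on different storage nodes, then a join that needs both.
plan = [
    PlanStep(1, "compute-1", ["storage-1"]),
    PlanStep(2, "compute-2", ["storage-2"]),
    PlanStep(3, "compute-1", [], depends_on=[1, 2]),
]
waves = parallel_waves(plan)
```

In this sketch, steps 1 and 2 have no dependencies and land in the same wave, while step 3 waits for both.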
Abstract
A database management system implemented in a cloud computing environment. Operational nodes are assigned as groups of controller-nodes, compute-nodes or storage-nodes. Queries specify one or more tables for an associated database operation, with each table being assigned to respective storage nodegroup(s). The number of nodes executing a given query may change, by (a) changing the compute-nodes associated with a connection, or (b) adding or removing nodes associated with a connection; and/or distributing data to a storage nodegroup based on a Distribution Method which may be either data dependent or data independent. A controller node further executes a Dynamic Query Planner (DQP) process that develops a query plan.
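The difference between a data-dependent and a data-independent Distribution Method can be sketched as follows. Hashing a column value and round-robin placement are assumed here as representative examples of each kind; the abstract does not name specific methods, and the function names, column names, and node labels are hypothetical.

```python
import zlib
from itertools import cycle


def distribute_by_hash(rows, key, nodes):
    """Data-dependent sketch: the target node is derived from the row's
    value in the `key` column, so equal values always land together."""
    placement = {node: [] for node in nodes}
    for row in rows:
        # crc32 is used because it is stable across runs, unlike hash(str).
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        placement[nodes[digest % len(nodes)]].append(row)
    return placement


def distribute_round_robin(rows, nodes):
    """Data-independent sketch: rows are dealt to nodes in turn,
    regardless of their contents."""
    placement = {node: [] for node in nodes}
    for row, node in zip(rows, cycle(nodes)):
        placement[node].append(row)
    return placement


rows = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": "west"},
    {"id": 3, "region": "east"},
    {"id": 4, "region": "west"},
]
storage_nodegroup = ["storage-1", "storage-2"]

by_hash = distribute_by_hash(rows, "region", storage_nodegroup)
by_turn = distribute_round_robin(rows, storage_nodegroup)
```

With the hash variant, rows sharing a `region` value are co-located on one node; with round-robin, each node receives the same number of rows here but related rows may be split.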
20 Claims
20. A database management system comprising:
a network interface, for receiving database queries from two or more client application processes as a network database service, the client application processes originating from two different users, the system providing at least one connection into the system for each such client application process;
a group of two or more operational nodes for executing the queries as database operations, each operational node implemented as a logical collection of software components that execute on one or more physical machines;
where the number of physical machines is not necessarily the same as the number of operational nodes;
with the operational nodes assigned as controller-nodes, compute-nodes or storage-nodes, and groups of controller-nodes forming controller nodegroups, and groups of compute-nodes forming compute nodegroups, and groups of storage nodes forming storage nodegroups;
wherein the storage nodes further provide a Persistent Data Store as at least one of relational database, flat files, or non-relational databases;
with one of the compute nodegroups assigned as a default compute nodegroup and one of the storage nodegroups defined as a default storage nodegroup;
the number of operational nodes, and their available assignment as compute-nodes or storage-nodes varying during execution of the queries;
each client connection being assigned to an associated compute nodegroup;
the queries also specifying one or more tables for an associated database operation, with each such table being assigned to a respective storage nodegroup;
the operational nodes further;
operating in parallel;
with the number of operational nodes executing a given query or queries changing during a given time interval by at least one of;
(a) changing the compute-nodegroup associated with a connection, or
(b) adding or removing nodes from the compute nodegroup associated with a connection; and
distributing data from the tables among the nodes in the storage nodegroup to which the table is assigned based on a Distribution Method which may be either data dependent or data independent; and
at least one of the controller nodes further;
executing a Dynamic Query Planner (DQP) process that transforms queries received from the client into a query plan that includes an ordered series of steps that are executed in parallel on multiple operational nodes where possible, the query plan further stipulating, for each step, which compute node it must be performed on, which storage nodes it must access, and other steps that this step depends on; and
wherein nodegroups are further assigned reference counters and table counters, with
(a) a reference counter being incremented when a nodegroup is assigned as the default storage or default compute nodegroup;
(b) a reference counter being incremented when a nodegroup is assigned as the compute nodegroup for a connection;
(c) the reference counter being decremented when a nodegroup is no longer the default storage or default compute nodegroup; and
(d) the reference counter being decremented when a nodegroup is no longer the compute nodegroup for a connection; and
(e) the table counter being incremented each time a table is associated with the nodegroup; and
(f) the table counter being decremented each time a table is disassociated from a nodegroup, wherein the reference counter and table counter are initialized to an initial-value; and
wherein a nodegroup can be deleted when both its associated reference counter and table counter reach the initial value; and
intermediate processing of queries is decoupled from activities that are closely tied to where the data is stored by;
storing intermediate tables generated as part of a query plan on the compute-nodegroup associated with the connection; and
storing persistent user data on a storage-nodegroup; and
wherein client connection state is decoupled from the connection and provided to the query execution engine that executes each step in the query plan.
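The reference-counter and table-counter bookkeeping recited in claim 20 can be illustrated with a small sketch. The class name `NodeGroup`, its method names, and the choice of 0 as the initial value are assumptions for illustration; the claim only requires that both counters start at an initial value and that a nodegroup is deletable once both return to it.

```python
class NodeGroup:
    """Hypothetical nodegroup with the two counters described in the claim."""

    INITIAL_VALUE = 0  # assumed; the claim says only "an initial-value"

    def __init__(self, name):
        self.name = name
        self.reference_counter = self.INITIAL_VALUE
        self.table_counter = self.INITIAL_VALUE

    def add_reference(self):
        # (a)/(b): nodegroup becomes a default nodegroup, or the compute
        # nodegroup for a connection.
        self.reference_counter += 1

    def drop_reference(self):
        # (c)/(d): nodegroup stops being a default nodegroup, or stops being
        # the compute nodegroup for a connection.
        self.reference_counter -= 1

    def associate_table(self):
        # (e): a table is associated with this nodegroup.
        self.table_counter += 1

    def disassociate_table(self):
        # (f): a table is disassociated from this nodegroup.
        self.table_counter -= 1

    def can_delete(self):
        # Deletable only when both counters are back at the initial value.
        return (
            self.reference_counter == self.INITIAL_VALUE
            and self.table_counter == self.INITIAL_VALUE
        )


group = NodeGroup("storage-group-a")
deletable_when_new = group.can_delete()

group.add_reference()       # e.g. assigned as the default storage nodegroup
group.associate_table()     # e.g. one table placed on it
deletable_in_use = group.can_delete()

group.drop_reference()
deletable_with_table = group.can_delete()

group.disassociate_table()
deletable_at_end = group.can_delete()
```

In this sketch the nodegroup stays undeletable while either a reference or a table remains, and becomes deletable again only after both are released.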
Specification