Index Maintenance in a Multi-Node Database
Abstract
Embodiments of the invention enable a database spread over multiple nodes to allow each node to have different indexes over the data in tables, depending on how each node would benefit (or not benefit) from having the index(es). When a database table is spread across the nodes of a multi-node or distributed system, each node may maintain only the portion of the index relevant to that node, if doing so would improve the performance of query processing operations on that node. Further, the database may be periodically redistributed across the compute nodes based on index performance. Doing so allows the database system to intelligently trade off between consuming space for the index on a node and the usefulness of having an index on that node.
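The space-versus-usefulness trade-off described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `IndexStats` fields, the `rows_skipped / size` benefit ratio, and the default threshold are hypothetical and are not taken from the patent, which does not prescribe a specific formula.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndexStats:
    """Hypothetical per-node usage counters for one partial index."""
    lookups: int        # queries on this node that used the index
    rows_skipped: int   # rows the index let the node avoid scanning
    size_bytes: int     # memory the partial index occupies on the node


def index_is_worth_keeping(stats: IndexStats,
                           min_rows_skipped_per_kib: float = 1.0) -> bool:
    """Keep the index only if the scan work saved justifies its footprint.

    The ratio and threshold are illustrative assumptions.
    """
    if stats.lookups == 0:
        return False  # an index that is never used only costs space
    kib = max(stats.size_bytes / 1024, 1.0)
    return stats.rows_skipped / kib >= min_rows_skipped_per_kib


def plan_index_changes(per_node: Dict[str, IndexStats]) -> Dict[str, str]:
    """Return a 'keep' or 'drop' decision for each node's partial index."""
    return {
        node: "keep" if index_is_worth_keeping(stats) else "drop"
        for node, stats in per_node.items()
    }
```

Because the decision is made per node, one node can keep its partial index while another drops it for the same table, which is the behavior the abstract describes.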
22 Claims
1. A method for improving the efficiency of database query processing on a distributed database, comprising:
receiving a query of the database, wherein the database includes a collection of data records subdivided into a plurality of database portions, wherein each of the plurality of database portions is stored on one of a plurality of compute nodes and wherein each compute node includes a respective partial index of the data records stored on the respective compute node, wherein the partial indexes are generated from an index of all the data records in the database, so that each partial index is limited to those data records on the respective compute node;
distributing the query to one or more compute nodes of the plurality of compute nodes for execution;
executing, by the one or more compute nodes, the query operation against the data records of the respective compute node using the respective partial index; and
during query execution, monitoring the use of the partial index stored on the first compute node in executing the database query.
Dependent claims: 2, 3, 4, 5, 6, 7
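The steps of claim 1 can be sketched as a toy, single-process Python model. This is only an illustration under stated assumptions: the `ComputeNode` class, the `(record_id, key)` record layout, the dictionary-based index, and the hit/miss counters are all hypothetical and do not reflect the patent's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (record_id, key): a toy record layout (assumption)


class ComputeNode:
    """Holds one database portion plus a partial index over that portion."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records: Dict[int, str] = dict(records)
        self.partial_index: Dict[str, List[int]] = {}  # key -> local record ids
        self.index_hits = 0    # monitoring counters, updated during execution
        self.index_misses = 0

    def execute(self, key: str) -> List[Record]:
        """Run an equality query via the partial index and record its use."""
        ids = self.partial_index.get(key)
        if ids is None:
            self.index_misses += 1
            return []
        self.index_hits += 1
        return [(rid, self.records[rid]) for rid in ids]


def build_global_index(nodes: List[ComputeNode]) -> Dict[str, List[Tuple[str, int]]]:
    """Index all records in the database: key -> [(node name, record id)]."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for node in nodes:
        for rid, key in node.records.items():
            index[key].append((node.name, rid))
    return index


def assign_partial_indexes(nodes: List[ComputeNode],
                           global_index: Dict[str, List[Tuple[str, int]]]) -> None:
    """Give each node only the global-index entries for its own records."""
    by_name = {node.name: node for node in nodes}
    for node in nodes:
        node.partial_index = {}
    for key, entries in global_index.items():
        for node_name, rid in entries:
            by_name[node_name].partial_index.setdefault(key, []).append(rid)


def run_query(nodes: List[ComputeNode], key: str) -> List[Record]:
    """Distribute the query to every node and merge the per-node results."""
    results: List[Record] = []
    for node in nodes:
        results.extend(node.execute(key))
    return results
```

After a batch of queries, the `index_hits` and `index_misses` counters on each node give a per-node picture of how much the partial index was actually used.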
8. A computer-readable storage medium containing a program which, when executed, performs an operation for improving the efficiency of database query processing on a distributed database, comprising:
receiving a query of the database, wherein the database includes a collection of data records subdivided into a plurality of database portions, wherein each of the plurality of database portions is stored on one of a plurality of compute nodes and wherein each compute node includes a respective partial index of the data records stored on the respective compute node, wherein the partial indexes are generated from an index of all the data records in the database, so that each partial index is limited to those data records on the respective compute node;
distributing the query to one or more compute nodes of the plurality of compute nodes for execution;
executing, by the one or more compute nodes, the query operation against the data records of the respective compute node using the respective partial index; and
during query execution, monitoring the use of the partial index stored on the first compute node in executing the database query.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A parallel computing system, comprising:
a plurality of compute nodes, each having at least a processor and a memory, wherein the memory on each node is configured to store a portion of an in-memory database; and
a service node configured to improve the efficiency of database query processing on a distributed database by performing the steps of:
receiving a query of the database, wherein the database includes a collection of data records subdivided into a plurality of database portions, wherein each of the plurality of database portions is stored on one of a plurality of compute nodes and wherein each compute node includes a respective partial index of the data records stored on the respective compute node, wherein the partial indexes are generated from an index of all the data records in the database, so that each partial index is limited to those data records on the respective compute node,
distributing the query to one or more compute nodes of the plurality of compute nodes for execution,
executing, by the one or more compute nodes, the query operation against the data records of the respective compute node using the respective partial index, and
during query execution, monitoring the use of the partial index stored on the first compute node in executing the database query.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
Specification