Parallel index maintenance
First Claim
1. A method for coordinating an update to a global index of an indexed table, the method comprising the steps of:
- receiving index maintenance records from a plurality of data manipulation slaves for the indexed table, wherein each index maintenance record includes a value for an index key of the global index;
computing a plurality of ranges for values of the index key received in the index maintenance records;
assigning each range of the plurality of ranges to a respective index update slave of a plurality of index update slaves; and
distributing work associated with said update to said plurality of index update slaves based on the plurality of ranges assigned to said plurality of index update slaves.
2 Assignments
0 Petitions
Accused Products
Abstract
A method, system and product for coordinating a parallel update for a global index of an indexed table involves a coordinator process and slave processes. The coordinator process receives index maintenance records from data manipulation slaves for an indexed table. Each index maintenance record includes a value for an index key of a global index of the table. The coordinator process computes index key value ranges and sends each range to an index update slave. Each slave updates the global index using just the index maintenance records with key values in its respective range, thus avoiding contention among the slaves and increasing clustering so that scaleable parallelism may be more closely attained. Techniques are also described for deferring the maintenance of global indexes relative to the time when the table on which they are built is changed.
297 Citations
43 Claims
-
1. A method for coordinating an update to a global index of an indexed table, the method comprising the steps of:
-
receiving index maintenance records from a plurality of data manipulation slaves for the indexed table, wherein each index maintenance record includes a value for an index key of the global index;
computing a plurality of ranges for values of the index key received in the index maintenance records;
assigning each range of the plurality of ranges to a respective index update slave of a plurality of index update slaves; and
distributing work associated with said update to said plurality of index update slaves based on the plurality of ranges assigned to said plurality of index update slaves. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 36, 37, 38)
the index maintenance records cover a plurality of global indexes associated with said indexed table;
each index maintenance record includes an index identification that identifies a particular index of said plurality of global indexes; and
the step of computing the plurality of ranges is performed for each index of said plurality of global indexes.
-
-
3. The method of claim 1, wherein the step of computing the plurality of ranges further comprises:
-
sampling a set of the index maintenance records, wherein a number of records sampled is a sample size; and
determining said plurality of ranges based upon index key values contained in said set of the index maintenance records.
-
-
4. The method of claim 3, wherein the step of determining said plurality of ranges includes:
-
determining a number of index update slaves, wherein the number of index update slaves is an index update degree of parallelism;
sorting values of the index key in the set of the index maintenance records sampled; and
defining a range using an index key value at a specified position of the values of the index key after sorting, wherein the specified position is related to a multiple of the sample size divided by the index update degree of parallelism.
-
-
5. The method of claim 4, wherein, during the sampling, the sample size equals a total number of records received from said plurality of data manipulation slaves.
-
6. The method of claim 4, wherein, during the sampling, the sample size is a specified quantity less than a total number of records received from said plurality of data manipulation slaves.
-
7. The method of claim 4, wherein the specified position is also related to an affinity of the respective index update slave for leaf blocks in the global index, the leaf blocks containing the values of the index key in the range.
-
8. The method of claim 1, the distributing further comprising:
-
generating an index distribution table including a range and a slave identification for each range of the plurality of ranges; and
replicating the index distribution table for each index update slave of the plurality of index update slaves.
-
-
9. The method of claim 8, wherein the index distribution table includes an index identification associated with each range, wherein the index identification identifies which global index, of a plurality of global indexes built on said table, is associated with said range.
-
10. The method of claim 8, further comprising, before the replicating, revising the index update distribution table based on an affinity of a respective index update slave for leaf blocks in the global index, the leaf blocks containing values of the index key in the range.
-
11. The method of claim 1, further comprising, before receiving index maintenance records, distributing data manipulation operations among the plurality of data manipulation slaves.
-
12. The method of claim 11, wherein the data manipulation operations are distributed based on partitions formed using a key other than the index key of said global index.
-
13. The method of claim 11, wherein:
-
the step of distributing data manipulation operations is performed by distributing data manipulation operations to a first number of data manipulation slaves;
the step of distributing work associated with said update to said plurality of index update slaves is performed by distributing work to a second number of index update slaves; and
the first number does not equal the second number.
-
-
36. The method of claim 1, the distribution further comprising:
-
generating an index distribution table including a range and a slave identification for each range of the plurality of ranges; and
replicating the index distribution table for each data manipulation slave of the plurality of data manipulation slaves.
-
-
37. The method of claim 1, wherein:
-
the step of receiving index maintenance records further comprises inserting maintenance records into a buffer for the global index so the records are in order of increasing key value in the buffer; and
the step of computing the plurality of ranges further comprises determining said plurality of ranges based upon index key values contained in said buffer.
-
-
38. The method of claim 37, wherein the step of determining said plurality of ranges includes:
-
determining a number of index update slaves, wherein the number of index update slaves is an index update degree of parallelism;
determining a sample size equal to a number of index maintenance records in the buffer; and
defining a range using an index key value in a index maintenance record at a specified position of the buffer, wherein the specified position is related to a multiple of the sample size divided by the index update degree of parallelism.
-
-
14. A method for updating a global index of an indexed table comprising:
-
receiving data that identifies a value range, where the value range defines a range of values for an index key of the global index of the indexed table;
reading an index key value from a current index maintenance record; and
updating the global index using the current index maintenance record if the index key value falls within the value range. - View Dependent Claims (15, 16, 17, 18, 39)
receiving an index identification associated with the value range;
reading a current index identification from the current index maintenance record; and
based on the current index identification, identifying which global index of the plurality of global indexes to update using said current index maintenance record.
-
-
16. The method of claim 14 wherein:
-
a plurality of index update slaves is used to update said global index;
each index update slave of said plurality of index update slaves has a slave identification;
the steps of receiving data that identifies the value range, reading the index key value, and updating the global index, are performed by a particular index update slave of said plurality of index update slaves;
the particular index update slave has a particular slave identification;
the step of receiving data that identifies the value range includes receiving an index update distribution table from a coordinator process, wherein the index update distribution table maps value ranges for the index key of the global index to slave identifications;
the method further includes the particular index update slave performing the steps of;
locating the range in the index update distribution table that encompasses the index key value read from the current index maintenance record;
determining if a slave identification associated with the range located corresponds to the particular slave identification; and
if the slave identification corresponds to the particular slave identification, then updating the global index using the current index maintenance record.
-
-
17. The method of claim 16, further comprising, if the slave identification does not correspond to the particular slave identification, then sending the current index maintenance record to a slave corresponding to the slave identification.
-
18. The method of claim 16, wherein:
-
the indexed table has a plurality of global indexes;
the index update distribution table includes an index identification associated with each range;
the method further comprises reading a current index identification from the current index maintenance record; and
during said updating the global index, the particular index update slave determines which global index of the plurality of global indexes to update based on the current index identification.
-
-
39. The method of claim 14, wherein:
-
a plurality of index update slaves is used to update said global index;
each index update slave of said plurality of index update slaves has a slave identification;
the step of receiving data that identifies the value range includes receiving an index update distribution table from a coordinator process, wherein the index update distribution table maps value ranges for the index key of the global index to slave identifications; and
the method further comprises locating the range in the index update distribution table that encompasses the index key value read from the current index maintenance record, determining an slave identification associated with the range located, and sending the current index maintenance record to an index update slave corresponding to the slave identification associated with the range.
-
-
19. A method of parallelizing a data manipulation operation on an indexed table, the method comprising the steps of:
-
based on a first partitioning criteria, dividing said data manipulation operation into a first set of work granules;
distributing said first set of work granules to a first set of slaves;
based on a second partitioning criteria, dividing a task of changing one or more indexes built on said indexed table into a second set of work granules; and
distributing said second set of work granules to a second set of slaves;
wherein said task of changing said one or more indexes includes updating said one or more indexes to reflect changes made to said indexed table by said data manipulation operation; and
wherein said first partitioning criteria is different than said second partitioning criteria. - View Dependent Claims (20, 21, 22, 23, 24, 25)
-
-
26. A computer-readable medium bearing instructions arranged to cause one or more processors to perform:
-
receiving index maintenance records from a plurality of data manipulation slaves for an indexed table of a database, wherein each index maintenance record includes a value for an index key of a global index for the indexed table;
computing a plurality of ranges for values of the index key received in the index maintenance records;
assigning each range of the plurality of ranges to a respective index update slave of a plurality of index update slaves; and
distributing work associated with an update of said global index to said plurality of index update slaves based on the plurality of ranges assigned to said plurality of index update slaves. - View Dependent Claims (40, 41, 42)
generating an index distribution table including a range and a slave identification for each range of the plurality of ranges; and
replicating the index distribution table for each data manipulation slave of the plurality of data manipulation slaves.
-
-
41. The computer-readable medium of claim 26, wherein:
-
the step of receiving index maintenance records further comprises inserting maintenance records into a buffer for the global index so the records are in order of increasing key value in the buffer; and
the step of computing the plurality of ranges further comprises determining said plurality of ranges based upon index key values contained in said buffer.
-
-
42. The computer-readable medium of claim 41, wherein the step of determining said plurality of ranges includes:
-
determining a number of index update slaves, wherein the number of index update slaves is an index update degree of parallelism;
determining a sample size equal to a number of index maintenance records in the buffer; and
defining a range using an index key value in a index maintenance record at a specified position of the buffer, wherein the specified position is related to a multiple of the sample size divided by the index update degree of parallelism.
-
-
27. A computer-readable medium bearing instructions arranged to cause one or more processors to perform:
-
receiving data that identifies a value range, where the value range defines a range of values for an index key of a global index of an indexed table;
reading an index key value from a current index maintenance record; and
updating the global index using the current index maintenance record if the index key value falls within the value range. - View Dependent Claims (43)
a plurality of index update slaves is used to update said global index;
each index update slave of said plurality of index update slaves has a slave identification;
the step of receiving data that identifies the value range includes receiving an index update distribution table from a coordinator process, wherein the index update distribution table maps value ranges for the index key of the global index to slave identifications; and
the instructions are arranged to cause one or more processors to further perform locating the range in the index update distribution table that encompasses the index key value read from the current index maintenance record, determining an slave identification associated with the range located, and sending the current index maintenance record to an index update slave corresponding to the slave identification associated with the range.
-
-
28. A computer-readable medium bearing instructions arranged to cause one or more processors to perform:
-
based on a first partitioning criteria, dividing a data manipulation operation into a first set of work granules;
distributing said first set of work granules to a first set of slaves;
based on a second partitioning criteria, dividing a task of changing one or more indexes built on an indexed table into a second set of work granules; and
distributing said second set of work granules to a second set of slaves;
wherein said task of changing said one or more indexes includes updating said one or more indexes to reflect changes made to said indexed table by said data manipulation operation; and
wherein said first partitioning criteria is different than said second partitioning criteria.
-
-
29. A computer-readable medium bearing information for use by one or more processors that are participating in execution of a global index update, the information comprising:
-
an index maintenance record;
the index maintenance record including an index identification that identifies a global index requiring change; and
an index key value of the global index, the index key value indicating an index entry requiring change within the global index.
-
-
30. A computer-readable medium bearing information for use by one or more processors that are participating in execution of a global index update, the information comprising:
-
an index update distribution record;
wherein the index update distribution record includes data that identifies a range of values, wherein the range of values is for an index key of a global index of an indexed table; and
a slave identification, wherein said slave identification identifies a slave responsible for performing updates to said global index for index entries associated with key values that fall within said range of values. - View Dependent Claims (31)
-
-
32. A system for maintaining a global index of a database in response to parallel data manipulation operations, the system comprising:
-
a plurality of processing nodes;
a network connecting the plurality of nodes;
an indexed table having a global index based on an index key residing on at least one of the plurality of nodes;
a coordinator process running on a first node of the plurality of nodes;
a plurality of data manipulation slaves, each data manipulation slave on a respective node of the plurality of nodes; and
a plurality of index update slaves, each index update slave on a respective node of the plurality of nodes;
wherein, the coordinator process is configured to distribute data manipulation operations among the plurality of data manipulation slaves, to receive index maintenance records from the plurality of data manipulation slaves, each record including an index key value, to compute a plurality of ranges for values of the index key received in the index maintenance records, and to send each range of the plurality of ranges to a respective index update slave;
wherein each data manipulation slave is configured to perform a data manipulation operation on a row of the database and to send an index maintenance record to the coordinator process; and
wherein each index update slave is configured to receive a respective range of the plurality of ranges, to read an index key value from a current index maintenance record, and to update the global index using the current index maintenance record if the index key value falls within the range of values.
-
-
33. A method of using a global index of an indexed table comprising:
-
executing a query using an index key for the global index to provide a query result;
determining whether an out-of-date flag indicates the global index is not current;
if the out-of-date flag indicates the global index is not current, then searching through a set of sorted index maintenance records, each record containing a value of an index key for the global index and an index operation; and
modifying the query result according to an index maintenance record that includes an index key value that matches the index key value used in the query.
-
-
34. A computer-readable medium bearing instructions arranged to cause one or more processors to perform:
-
executing a query using an index key for a global index of an indexed table to provide a query result;
determining whether an out-of-date flag indicates the global index is not current;
if the out-of-date flag indicates the global index is not current, then searching through a set of sorted index maintenance records, each record containing a value of an index key for the global index and an index operation; and
modifying the query result according to an index maintenance record that includes an index key value that matches the index key values used in the query.
-
-
35. A computer-readable medium bearing instructions arranged to cause one or more processors to perform:
-
distributing data manipulation operations among a plurality of data manipulation slaves, receiving index maintenance records from the plurality of data manipulation slaves, each record including an index key value for a global index of an indexed table, computing a plurality of ranges for values of the index key received in the index maintenance records, and sending each range of the plurality of ranges to a respective index update slave among a plurality of index update slaves, by a coordinator process;
performing a data manipulation operation on a row of the database and sending an index maintenance record to the coordinator process, by a data manipulation slave; and
receiving a respective range of the plurality of ranges, reading an index key value from a current index maintenance record, and updating the global index using the current index maintenance record if the index key value falls within the range of values, by an index update slave.
-
Specification