Access path optimization using degrees of clustering
First Claim
1. A method for operating a data processing system having a processing unit and a physical storage device in which a plurality of pages of data are stored in a physical sequence wherein each page other than a last page has a sequential page which follows it, the physical storage device being operably connected to the processing unit for transferring pages of data to the processing unit, the physical storage device containing a data base table which is stored on a plurality of pages, the data base table having a plurality of rows of data and a plurality of indexes composed of a sequence of entries which reference the rows, the method comprising the steps performed by the processing unit of:
- (a) obtaining a data base operation command specifying search criteria;
(b) selecting a plurality of indexes which correspond to the search criteria;
(c) determining a degree of clustering for each selected index which is calculated by dividing the number of clustered rows referenced in the selected index by the number of rows in the data base table, clustered rows being those rows which are stored in the physical storage device in the same page or a sequential page of the physical storage device as the row referenced by the preceding entry in the index, divided by the total number of rows in the table, wherein the degree of clustering is directly proportional to the number of rows referenced by the selected index which are stored on a page of the physical storage device which is the same page on which the row referenced by the preceding index entry is stored;
(d) determining, using the degree of clustering for each selected index and the search criteria, a total expected time required for transferring the pages from the physical storage device if the selected index is used to specify the order in which to transfer the pages from the physical storage device; and
(e) transferring the pages containing the rows of the data base table from the physical storage device to the processing unit in the sequence specified in the selected index having the lowest total expected time from step (d), thereby obtaining the data matching the search criteria in an efficient manner.
2 Assignments
0 Petitions
Accused Products
Abstract
This invention measures the degree of clustering of an index for a relational data base table, estimates the number of physical page accesses required to access the table using a partial index scan using the index, and selects the index providing the fastest access path to the table. The degree of clustering is measured as follows:
DC=Number of clustered rows (NCR)/Total rows (NR)
A multiplier greater than 1 can be applied to the degree of clustering to reflect the benefit of having consecutively accessed rows on adjacent or nearby data pages.
The degree of clustering so calculated is used to estimate the number of random and sequential page accesses required for a partial index scan. These numbers of accesses are then multiplied by the unit time required for each, and added to the total CPU processing time required to arrive at the estimated total time for the scan. The total time is calculated for each index which could be used as an access path for the query or other operation being optimized, and the index with the shortest overall time is selected as the access path.
-
Citations
11 Claims
-
1. A method for operating a data processing system having a processing unit and a physical storage device in which a plurality of pages of data are stored in a physical sequence wherein each page other than a last page has a sequential page which follows it, the physical storage device being operably connected to the processing unit for transferring pages of data to the processing unit, the physical storage device containing a data base table which is stored on a plurality of pages, the data base table having a plurality of rows of data and a plurality of indexes composed of a sequence of entries which reference the rows, the method comprising the steps performed by the processing unit of:
-
(a) obtaining a data base operation command specifying search criteria; (b) selecting a plurality of indexes which correspond to the search criteria; (c) determining a degree of clustering for each selected index which is calculated by dividing the number of clustered rows referenced in the selected index by the number of rows in the data base table, clustered rows being those rows which are stored in the physical storage device in the same page or a sequential page of the physical storage device as the row referenced by the preceding entry in the index, divided by the total number of rows in the table, wherein the degree of clustering is directly proportional to the number of rows referenced by the selected index which are stored on a page of the physical storage device which is the same page on which the row referenced by the preceding index entry is stored; (d) determining, using the degree of clustering for each selected index and the search criteria, a total expected time required for transferring the pages from the physical storage device if the selected index is used to specify the order in which to transfer the pages from the physical storage device; and (e) transferring the pages containing the rows of the data base table from the physical storage device to the processing unit in the sequence specified in the selected index having the lowest total expected time from step (d), thereby obtaining the data matching the search criteria in an efficient manner. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A method for operating a data processing system having a processing unit and a physical storage device in which a plurality of pages of data are stored in a physical sequence wherein each page other than a last page has a next sequential page which follows it, the physical storage device being operably connected to the processing unit for transferring pages of data to the processing unit and containing a data base table which is stored on a plurality of pages, the data base table having a plurality of rows of data and a plurality of indexes composed of a sequence of entries which reference the rows, the method comprising the steps performed by the processing unit of:
-
(a) selecting a plurality of indexes referenced in a specified set of search criteria; (b) determining a degree of clustering for each selected index, the degree of clustering being equal to the number of rows referenced by the selected index which when considered in index order are stored on the same page or the next sequential page of the physical storage device as the row referenced by the preceding entry in the index, divided by the total number of rows in the table; (c) estimating, for each selected index, using the search criteria, a number of random page transfers expected if the selected index is used to determine the order in which to transfer the pages from the physical storage device, the estimating step including the use of a formula including the computation term
space="preserve" listing-type="equation">NR ×
(1-DC)where NR is the total number of rows in the data base table and DC is the degree of clustering; (d) estimating, for each selected index, using the search criteria, a number of sequential page transfers expected if the selected index is used to determine the order in which to transfer the pages from the physical storage device, the estimating step including the use of a formula including the computational term
space="preserve" listing-type="equation">NP ×
DCwhere NP is the total number of pages of data in the data base table and DC is the degree of clustering; (e) calculating a total expected transfer time for each selected index, using the product of the estimated number of random page transfers and a time value which estimates the time required to perform one random page transfer and the product of the estimated number of sequential page transfers and a time value which estimates the time required to perform one sequential page transfer; and (f) transferring the pages containing the rows of the data base table from the physical storage device to the processing unit in the sequence specified in the selected index having the lowest total expected transfer time, thereby obtaining the data matching the search criteria. - View Dependent Claims (8, 9, 10, 11)
-
Specification