Method for improving performance of large databases
First Claim
1. A method for analyzing disk performance in a database system of the type which includes (a) database program means for responding to external queries for data input and output from users of the database system, (b) a data storage system for storing data used by the database system, on a plurality of individual disk drives, and (c) file system means, responsive to database file I/O requests from the database program to store and retrieve database files, for sending file system I/O commands to the data storage system to store and retrieve the database files on the data storage system arranged as a plurality of logical disks, at least some of the logical disks being striped across a plurality of the individual disk drives, the method including the steps of:
- defining a series of successive sampling periods during which sample data reflecting the performance of the database system is to be taken;
monitoring the database program means to collect data representative of database file I/O requests that occur during each sampling period and storing the collected data as a plurality of first data samples, each first data sample including the time of the data sample, the name of the associated database file, and a plurality of parameters of different types representative of the speed of response of the file system means to the database file I/O request;
monitoring the file system means to collect data representative of file system I/O commands that occur during each sampling period and storing the collected data as a plurality of second data samples, each second data sample including the time of the data sample, the name of the associated logical disk, and a plurality of parameters of different types representative of the speed of response of the data storage system to the file system I/O request;
monitoring the data storage system to collect data representative of the speed of response of each individual disk drive as data is stored and retrieved during each sampling period and storing the collected data as a plurality of third data samples, each third data sample including the time of the data sample, the name of the associated disk drive, and a plurality of parameters of different types representative of the speed of response of the disk drive as data is stored on and retrieved from the disk drive;
providing a display representative of database performance as reflected by the stored statistics, including the steps of;
designating a display time interval;
selecting one I/O type from among a group comprising database files, logical disks, and disk drives;
selecting one of the types of stored parameters for the selected I/O type;
selecting, from the stored data samples, a subset of samples for the selected I/O type that fall within the designated interval, so that the selected subset of samples will include a plurality of samples for either database files, logical disks, or disk drives, depending on the I/O type selected;
grouping the selected data samples into a plurality of groups so that each group includes data samples having the same name;
for each of said groups, mathematically combining the selected parameter in each of the data samples of the group to provide a single statistical parameter associated with the name of the groups data samples; and
displaying the statistical parameters along with the associated name for each of the names present in the selected subset.
8 Assignments
0 Petitions
Accused Products
Abstract
The present invention provides a tool with which a DBA can analyze a VLDB at all levels of data storage, both logical and physical, to analyze performance problems. The invention is particularly applicable to systems in which database files are striped over multiple disk drives. In the present invention, storage of information is monitored at three levels: database files, file system files, and individual disk drives Parameters indicating the activity and performance level of each of the levels of storage are taken at selected intervals and stored. An analysis tool is provided that allows a DBA to select a time window during which the database performance is be assessed. The analysis tool has a map of the logical and physical arrangement of the database being monitored and allows a DBA to move from level to level of both logical and physical structures involved in storing the data while displaying I/O activity in an easily understood manner.
-
Citations
14 Claims
-
1. A method for analyzing disk performance in a database system of the type which includes (a) database program means for responding to external queries for data input and output from users of the database system, (b) a data storage system for storing data used by the database system, on a plurality of individual disk drives, and (c) file system means, responsive to database file I/O requests from the database program to store and retrieve database files, for sending file system I/O commands to the data storage system to store and retrieve the database files on the data storage system arranged as a plurality of logical disks, at least some of the logical disks being striped across a plurality of the individual disk drives, the method including the steps of:
-
defining a series of successive sampling periods during which sample data reflecting the performance of the database system is to be taken; monitoring the database program means to collect data representative of database file I/O requests that occur during each sampling period and storing the collected data as a plurality of first data samples, each first data sample including the time of the data sample, the name of the associated database file, and a plurality of parameters of different types representative of the speed of response of the file system means to the database file I/O request; monitoring the file system means to collect data representative of file system I/O commands that occur during each sampling period and storing the collected data as a plurality of second data samples, each second data sample including the time of the data sample, the name of the associated logical disk, and a plurality of parameters of different types representative of the speed of response of the data storage system to the file system I/O request; monitoring the data storage system to collect data representative of the speed of response of each individual disk drive as data is stored and retrieved during each sampling period and storing the collected data as a plurality of third data samples, each third data sample including the time of the data sample, the name of the associated disk drive, and a plurality of parameters of different types representative of the speed of response of the disk drive as data is stored on and retrieved from the disk drive; providing a display representative of database performance as reflected by the stored statistics, including the steps of; designating a display time interval; selecting one I/O type from among a group comprising database files, logical disks, and disk drives; selecting one of the types of stored parameters for the selected I/O type; selecting, from the stored data samples, a subset of samples for the selected I/O type that fall within the designated interval, so that the selected subset of samples will include a plurality of samples for either database files, logical disks, or disk drives, depending on the I/O type selected; grouping the selected data samples into a plurality of groups so that each group includes data samples having the same name; for each of said groups, mathematically combining the selected parameter in each of the data samples of the group to provide a single statistical parameter associated with the name of the groups data samples; and displaying the statistical parameters along with the associated name for each of the names present in the selected subset. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. A method for speeding up disk performance by improving placement in a database system of the type which includes (a) database program means for responding to external queries for data input and output from users of the database system, (b) a data storage system for storing data used by the database system, on a plurality of individual disk drives, and (c) file system means, responsive to database file I/O requests from the database program to store and retrieve database files, for sending file system I/O commands to the data storage system to store and retrieve the database files on the data storage system arranged as a plurality of logical disks, at least some of the logical disks being striped across a plurality of the individual disk drives, the method including the steps of:
-
defining a series of successive sampling periods during which sample data reflecting the performance of the database system is to be taken; monitoring the file system means to collect data representative of file system I/O commands that occur during each sampling period and storing the collected data as a plurality of first data samples, each second data sample including the time of the data sample, the name of the associated logical disk, and a plurality of parameters of different types representative of the speed of response of the data storage system to the file system I/O request; monitoring the data storage system to collect data representative of the speed of response of each individual disk drive as data is stored and retrieved during each sampling period and storing the collected data as a plurality of second data samples, each data sample including the time of the data sample, the name of the associated disk drive, and a plurality of parameters of different types representative of the speed of response of the disk drive as data is stored on and retrieved from the disk drive; providing a schema data store which includes (1) the name of each stripe set and the names of all disk drives that are associated with that stripe set, and (2) the name of each logical drive and the names of all disk drives that are associated with that logical drive; designating a time interval during which the database performance is to be analyzed for improvement; selecting one of the types of parameters stored in each third data set; selecting, from the stored data samples, a subset that includes all first data samples that fall within the designated interval, so that the selected subset of samples will include a plurality of samples for disk drives that occurred in the designated interval; grouping the selected data samples into a plurality of first groups so that each group includes data samples associated with stripe a named set; for each of first said groups, mathematically combining the selected parameter in each of the data samples of the group to provide a set of first statistical parameters representative of I/O activity associated with each stripe set; determining the busiest stripe set which is the stripe set with the highest I/O activity, as reflected by the statistical parameters; determining if the selected stripe set includes more than one logical drive, and if not, recommending that more individual disk drives be used to store the selected stripeset, otherwise performing the following steps; determining the logical disks on which the selected stripe set is stored; selecting, from the stored data samples, a second subset that includes all second data samples that fall within the designated interval and that are associated with each logical disk that is part of the selected stripe set, so that the second subset of samples will include a plurality of samples for logical disks that compose the selected stripe set and which occurred in the designated interval; grouping the second subset of selected data samples into a plurality of second groups so that each group includes data samples associated with a named logical disk; for each of said second groups, mathematically combining the selected parameter in each of the data samples of the group to provide a set of second statistical parameters representative of I/O activity associated with each named logical disk; determining second busiest logical disk which is the logical disk with the second highest I/O activity, as reflected by said first statistical parameters; determining a least busy stripe set which is the stripe set with the lowest I/O activity, as reflected by the statistical parameters; determining if there is enough room on the disk drives on which the least busy stripe set is stored to move the second busiest logical disk from the busiest stripe set to the least busy stripe set, and if so, recommending that such move be made.
-
Specification