Method for detecting problematic disk drives and disk channels in a RAID memory system based on command processing latency
Abstract
To detect problematic drives in a redundant array of independent disks (RAID), the system broadcasts command sets to all disks in the data storage system, measures the latency with which each disk executes them, and compares the results to identify disks that take substantially longer to complete the requests. Such disks are likely to be problematic and are candidates for further examination and replacement. The disks in each tier group are compared to determine whether any disk in that group exhibits problems. Counters for each tier group are also compared to determine whether the problem lies with the disk or with the channel of the tier group. The latency of each disk in a tier group is saved in a table that provides a histogram of the latencies of the disks in that group. Histograms of the disks within a single tier group are compared to determine whether a specific disk is problematic. Histograms of the different tier groups are compared to determine whether a specific disk is problematic or whether all the disks on the same channel exhibit problems.
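The histogram comparison described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the bucket edges, the helper names (`latency_histogram`, `slow_fraction`, `outlier_disks`), the "slow bucket" cutoff, and the margin over the group median are all hypothetical choices made for the example.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical bucket edges in milliseconds (an assumption, not from the source).
BUCKET_EDGES_MS = [10, 50, 100, 500, 1000]

def latency_histogram(latencies_ms, edges=BUCKET_EDGES_MS):
    """Bucket i counts latencies in [edges[i-1], edges[i]); the last bucket is open-ended."""
    counts = [0] * (len(edges) + 1)
    for latency in latencies_ms:
        counts[bisect_right(edges, latency)] += 1
    return counts

def slow_fraction(histogram, first_slow_bucket):
    """Fraction of commands that landed in the slow buckets."""
    total = sum(histogram)
    return sum(histogram[first_slow_bucket:]) / total if total else 0.0

def outlier_disks(samples_by_disk, first_slow_bucket=3, margin=0.2):
    """Flag disks whose slow fraction exceeds the tier group's median by `margin`.

    Both `first_slow_bucket` (here: 100 ms and above) and `margin` are
    illustrative assumptions.
    """
    fractions = {
        disk: slow_fraction(latency_histogram(samples), first_slow_bucket)
        for disk, samples in samples_by_disk.items()
    }
    baseline = median(fractions.values())
    return sorted(d for d, f in fractions.items() if f > baseline + margin)

# Made-up latency samples (ms) for the four disks of one tier group.
tier_group = {
    "disk0": [5, 8, 12, 7],
    "disk1": [6, 9, 11, 8],
    "disk2": [7, 600, 1200, 450],
    "disk3": [5, 5, 20, 9],
}
print(outlier_disks(tier_group))
```

Comparing each disk against the median of its own tier group, rather than against a fixed number, is one way to express "substantially longer than its peers"; a fixed threshold, as in the claims, is the simpler alternative.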
16 Claims
1. A method for detecting problematic disk storage devices in an array of independent disk storage devices, comprising the steps of:
broadcasting a command set substantially simultaneously to a plurality of independent disk storage devices under study in the array thereof,
acquiring a latency count of executing said command set by each of said plurality of independent disk storage devices, and
identifying a respective one of said plurality of independent disk storage devices as a problematic disk storage device if said latency count thereof exceeds a predetermined latency value.
Dependent claims: 2-15.
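The three steps of claim 1 (broadcast, acquire a latency count, compare against a predetermined value) can be illustrated with a small simulation. This is a sketch under assumptions, not the claimed apparatus: disks are modeled as hypothetical Python callables that sleep for a service time, "substantially simultaneously" is approximated with one worker thread per disk, and the latency count is taken as elapsed seconds. The names `timed_execute`, `find_problematic_disks`, and `make_simulated_disk` are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_execute(disk, command_set):
    """Run the command set on one disk and return the elapsed time in seconds."""
    start = time.perf_counter()
    disk(command_set)
    return time.perf_counter() - start

def find_problematic_disks(disks, command_set, max_latency_s):
    """Send the same command set to every disk concurrently and flag slow ones."""
    # One worker per disk, so all disks receive the commands at about the same time.
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        futures = {
            name: pool.submit(timed_execute, disk, command_set)
            for name, disk in disks.items()
        }
        latencies = {name: future.result() for name, future in futures.items()}
    flagged = sorted(n for n, lat in latencies.items() if lat > max_latency_s)
    return flagged, latencies

def make_simulated_disk(service_time_s):
    """Hypothetical stand-in for a drive: sleeps `service_time_s` per command."""
    def execute(command_set):
        time.sleep(service_time_s * len(command_set))
    return execute

if __name__ == "__main__":
    disks = {
        "d0": make_simulated_disk(0.005),
        "d1": make_simulated_disk(0.005),
        "d2": make_simulated_disk(0.1),  # deliberately slow drive
    }
    flagged, latencies = find_problematic_disks(disks, ["READ", "WRITE"], 0.1)
    print(flagged, latencies)
```

In a real controller the latency count would come from hardware or firmware counters rather than wall-clock timing in user space; the thread pool here only imitates the broadcast.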
16. An array of independent disk storage devices with enhanced capability of problematic disk storage devices detection, comprising:
a plurality of independent disk storage devices distributed in at least a first tier group and a second tier group thereof, corresponding disk storage devices of said at least first and second tier group being coupled to a respective one of plurality of disk channels of said array;
a processor unit issuing a command set and broadcasting said command set to said plurality of independent disk storage devices simultaneously through said plurality of disk channels;
a counter unit coupled to said processor unit and calculating a latency count of executing said command set by each of said plurality of independent disk storage devices and a cumulative latency count of executing said command set by each of said at least first and second tier groups;
a first latency table built by said processor unit and reflecting said latency counts for said each disk storage device; and
a second latency table built by said processor unit and reflecting said cumulative latency count for each of said at least first and second tier groups;
wherein said processor unit analyzes said first latency table and identifies said each disk storage device as a problematic disk storage device if said latency count thereof exceeds a predetermined latency value, and
wherein said processor unit analyzes said second latency table and identifies said respective disk channel as a problematic one if said corresponding disk storage devices of said at least first and second tier groups exhibit said latency count exceeding said predetermined latency value.
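The two latency tables and the disk-versus-channel decision in claim 16 can be sketched as below. The layout is an assumption made for illustration: `latency_counts[tier][channel]` holds the latency count of the disk that sits in a given tier group on a given channel, and the function names are hypothetical. As a simplification, the sketch decides whether a channel is problematic from the per-disk counts (every disk on that channel, across all tier groups, exceeds the threshold) and builds the cumulative per-tier table alongside; the claim itself does not spell out the analysis at this level of detail.

```python
def build_latency_tables(latency_counts):
    """Return (per-disk table, per-tier cumulative table).

    latency_counts[tier][channel] is the latency count of one disk
    (assumed layout, not taken from the source).
    """
    first_table = {
        (tier, channel): count
        for tier, row in enumerate(latency_counts)
        for channel, count in enumerate(row)
    }
    second_table = {tier: sum(row) for tier, row in enumerate(latency_counts)}
    return first_table, second_table

def analyze(latency_counts, max_latency):
    """Classify slow disks as single-disk faults or as part of a slow channel."""
    first_table, second_table = build_latency_tables(latency_counts)
    slow = {key for key, count in first_table.items() if count > max_latency}
    n_tiers = len(latency_counts)
    n_channels = len(latency_counts[0])
    # A channel is suspect when its disk in every tier group is slow.
    bad_channels = [
        ch for ch in range(n_channels)
        if all((tier, ch) in slow for tier in range(n_tiers))
    ]
    # Remaining slow disks are treated as individual disk problems.
    bad_disks = sorted(key for key in slow if key[1] not in bad_channels)
    return bad_disks, bad_channels, second_table

# Made-up latency counts: two tier groups, four channels.
counts = [
    [12, 11, 95, 10],  # tier group 0
    [13, 90, 97, 12],  # tier group 1
]
print(analyze(counts, max_latency=50))
```

In this made-up data, channel 2 is slow in both tier groups and is reported as a channel problem, while the disk at tier group 1, channel 1 is slow on its own and is reported as a disk problem.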
Specification