Root cause detection and monitoring for storage systems
First Claim
1. A system comprising:
a host computing device configured to host one or more virtual computing device instances, the host computing device configured to transmit storage commands generated by the one or more virtual computing device instances via a communications network, the one or more virtual computing device instances executing on behalf of a client computing device;
a storage processing service, executed on one or more storage computing devices, the storage processing service configured to:
obtain a storage command request from a virtual computing device instance of the one or more virtual computing device instances;
process the storage command request to generate a storage command processing result associated with at least one storage volume, the storage volume associated with a storage computing device of the one or more storage computing devices;
collect storage command metric information based at least in part on the storage command processing result; and
a storage monitoring service, executed on one or more computing devices configured to:
obtain, from the storage processing service, the collected storage command metric information;
identify a relationship, among one or more storage volumes, based on the collected metric information, wherein at least one storage volume of the one or more storage volumes is executing on one of the one or more storage computing devices, wherein identifying the relationship further comprises identifying a first storage system event that affects multiple storage volumes;
determine that the identified relationship is different than a pre-determined relationship of the one or more storage volumes, wherein the pre-determined relationship corresponds to an association of storage system events for at least one of the one or more storage volumes, wherein the pre-determined relationship is associated with a metric threshold;
responsive to determining that the identified relationship is different than the pre-determined relationship, generate a new metric threshold for association with the identified relationship, the new metric threshold different from the metric threshold, wherein the new metric threshold comprises a number of active storage volumes and a time of usage associated with the first storage system event; and
make accessible the new metric threshold.
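The monitoring steps recited above (identify a relationship from an event that affects multiple volumes, compare it with the pre-determined relationship, and generate a new threshold) can be sketched roughly as follows. This is an illustrative sketch only, not the claimed implementation: the names `MetricSample`, `Relationship`, `MetricThreshold`, `identify_relationship`, and `new_threshold_if_changed` are hypothetical, a "storage system event" is assumed to be a latency reading above a fixed limit, and "time of usage" is assumed to mean the time span covered by the affected samples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class MetricSample:
    volume_id: str
    timestamp: float   # seconds; assumed unit
    latency_ms: float  # assumed metric; the claim only says "metric information"


@dataclass(frozen=True)
class Relationship:
    volumes: FrozenSet[str]


@dataclass(frozen=True)
class MetricThreshold:
    active_volumes: int   # number of active storage volumes
    usage_seconds: float  # time of usage tied to the event (assumed: event span)


def identify_relationship(
    samples: Iterable[MetricSample],
    latency_limit_ms: float,
    min_volumes: int = 2,
) -> Optional[Tuple[Relationship, float]]:
    """Group volumes hit by the same event; return None if fewer than min_volumes."""
    slow = [s for s in samples if s.latency_ms > latency_limit_ms]
    volumes = frozenset(s.volume_id for s in slow)
    if len(volumes) < min_volumes:
        return None
    times = [s.timestamp for s in slow]
    return Relationship(volumes), max(times) - min(times)


def new_threshold_if_changed(
    samples: Iterable[MetricSample],
    known: Relationship,
    known_threshold: MetricThreshold,
    latency_limit_ms: float,
) -> Optional[MetricThreshold]:
    """Return a new threshold only when the identified relationship differs."""
    found = identify_relationship(samples, latency_limit_ms)
    if found is None:
        return None
    relationship, duration = found
    if relationship == known:
        return None  # same as the pre-determined relationship: keep old threshold
    candidate = MetricThreshold(len(relationship.volumes), duration)
    # The claim requires the new threshold to differ from the existing one.
    return None if candidate == known_threshold else candidate
```

A caller would then publish the returned threshold ("make accessible") through whatever interface the monitoring service exposes; that step is left out here because the claim does not say how it is done.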
Abstract
A storage system includes a monitoring service that identifies root causes of storage system issues using relationships. The monitoring service can use thresholds associated with the relationships to detect the root causes. Relationships can be based on correlations between the different levels of the storage system. In various embodiments, relationships can also be based on events that affect multiple storage volumes or on short-term events. Once a relationship is identified, a threshold for that relationship is generated or updated. The monitoring service can make that threshold accessible to other components of the monitoring service, or to an operator of the storage system, for use in detecting root causes.
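As a rough illustration of the "generated or updated" and "make accessible" steps in the abstract, the sketch below keeps one threshold per relationship in an in-memory store that other components can query. Everything here is an assumption for illustration: `ThresholdStore`, `Threshold`, `is_candidate_root_cause`, the string relationship keys, and the rule that an observation is flagged when it meets or exceeds both threshold fields are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    active_volumes: int
    usage_seconds: float


class ThresholdStore:
    """Hypothetical in-memory store; one threshold per relationship key."""

    def __init__(self) -> None:
        self._by_relationship: Dict[str, Threshold] = {}

    def generate_or_update(self, relationship: str, threshold: Threshold) -> str:
        """Store the threshold and report whether it was new or replaced."""
        action = "updated" if relationship in self._by_relationship else "generated"
        self._by_relationship[relationship] = threshold
        return action

    def lookup(self, relationship: str) -> Optional[Threshold]:
        """Read access for other components or an operator tool."""
        return self._by_relationship.get(relationship)


def is_candidate_root_cause(
    store: ThresholdStore,
    relationship: str,
    active_volumes: int,
    usage_seconds: float,
) -> bool:
    """Flag an observation that meets or exceeds both threshold fields (assumed rule)."""
    threshold = store.lookup(relationship)
    if threshold is None:
        return False
    return (
        active_volumes >= threshold.active_volumes
        and usage_seconds >= threshold.usage_seconds
    )
```

A real deployment would presumably persist thresholds and expose them over a service interface; a dictionary is used here only to keep the example self-contained.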
17 Claims
1. A system comprising the elements set out in full under First Claim above.
Dependent claims: 2, 3
4. A computer-implemented method for generating metric thresholds for relationships associated with storage volumes in a storage system comprising:
obtaining storage command metric information from a storage service, the metric information based at least in part on a storage command request from a virtual computing device instance hosted on a host computing device, wherein the virtual computing device instance is executing the storage command request;
identifying a relationship among a plurality of storage volumes based on the obtained storage command metric information, wherein identifying the relationship further comprises identifying a first storage system event that affects multiple storage volumes;
determining that the identified relationship is different than a pre-determined relationship of the plurality of storage volumes, wherein the pre-determined relationship corresponds to an association of storage system events for at least one storage volume of the plurality of storage volumes, wherein the pre-determined relationship is associated with a metric threshold;
responsive to determining that the identified relationship is different than the pre-determined relationship, generating a new metric threshold for association with the identified relationship, the new metric threshold different from the metric threshold, wherein the new metric threshold comprises a number of active storage volumes and a time of usage associated with the first storage system event; and
making accessible the new metric threshold.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium including computer-executable instructions comprising:
computer-executable instructions that, when executed by a computing device associated with one or more client computing devices:
obtain metric information that is based at least in part on a storage command request from a virtual computing device instance hosted on a host computing device, wherein the virtual computing device instance is executing the storage command request, wherein the storage command request is executed on a plurality of storage volumes having a defined relationship;
identify, with the obtained metric information, an additional relationship among a plurality of storage volumes, wherein identifying the additional relationship further comprises identifying a first storage system event that affects multiple storage volumes;
determine that the identified relationship is different than the defined relationship of the plurality of storage volumes, wherein the defined relationship corresponds to a correlation of at least one storage volume of the plurality of storage volumes with a logical storage component comprising a portion of the plurality of storage volumes;
responsive to determining that the identified relationship is different than the defined relationship, generate a new metric threshold for association with the identified relationship, the new metric threshold different from the metric threshold, wherein the new metric threshold comprises a number of active storage volumes and a time of usage associated with the first storage system event; and
make accessible the new metric threshold.
Dependent claims: 14, 15, 16, 17
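Claim 13 ties the defined relationship to a correlation between a storage volume and a logical storage component made up of a portion of the volumes. One way to picture that, purely as an assumption, is a Pearson correlation between a volume's metric series and the per-interval mean of the component's member volumes. The helper names (`pearson`, `component_series`, `is_correlated`) and the 0.8 cutoff are hypothetical; the claim names no particular correlation measure.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def component_series(
    volume_series: Dict[str, Sequence[float]],
    members: Sequence[str],
) -> List[float]:
    """Per-interval mean over the member volumes of a logical component."""
    columns = zip(*(volume_series[m] for m in members))
    return [sum(column) / len(members) for column in columns]


def is_correlated(
    volume_id: str,
    members: Sequence[str],
    volume_series: Dict[str, Sequence[float]],
    cutoff: float = 0.8,  # assumed cutoff, not from the patent
) -> bool:
    """True when the volume tracks the component's aggregate metric."""
    aggregate = component_series(volume_series, members)
    return pearson(volume_series[volume_id], aggregate) >= cutoff
```

With this framing, a volume that stops tracking its component, or a group of volumes that starts moving together outside any defined component, would count as an identified relationship that differs from the defined one.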
Specification