Methods, systems, and media to correlate errors associated with a cluster
First Claim
1. A method for correlating error events of a cluster, the method comprising:
identifying systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster via the two ports;
identifying an error event associated with the first system, from the error events; and
selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider;
wherein selecting the error event comprises at least one of:
(i) identifying a redundant error event of the error events, the redundant error event having error identification data that describes the error, wherein the error event and the redundant error event are associated with the first system of the systems and the loop;
(ii) identifying a duplicate error event of the error events, the duplicate error event having error identification data that describes the error, wherein the error event is associated with a first system of the systems and a loop and a second error event is associated with a second system of the systems and the loop; and
(iii) identifying a symptomatic error event of the error events, the symptomatic error event having error identification data that describes a second error, wherein the second error results from the error.
Abstract
Methods, systems, and media for correlating error events of a cluster are disclosed. Embodiments may identify systems of a cluster potentially impacted by an error and identify one or more error events associated with those systems. Then, embodiments may select one of the identified error events based upon data associated with the identified error event, disregarding other identified error events generated for the same error or errors symptomatic of the error, to report the error to a maintenance provider via a single error event. Many embodiments may identify one or more error events potentially resulting from the same error by identifying error events within a specified time period of the event that triggered the correlation. Several embodiments correlate the error events in an environment that is substantially independent of the cluster. Further embodiments obtain data that describes system interconnections of the cluster and generate a topology based upon the data.
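The first step in the abstract, identifying systems potentially impacted by an error from the cluster topology, can be illustrated with a minimal sketch. The loop names, system names, and the `impacted_systems` helper are hypothetical and are not taken from the patent; the sketch assumes the loop data is simply a mapping from each loop to the systems it connects.

```python
from typing import Dict, List, Set

# Hypothetical loop data: loop name -> systems joined by that loop.
LOOPS: Dict[str, List[str]] = {
    "loop_a": ["sys1", "sys2", "sys3"],
    "loop_b": ["sys3", "sys4"],
}


def impacted_systems(loops: Dict[str, List[str]], reporting_system: str) -> Set[str]:
    """Return every system that shares at least one loop with the reporting system."""
    impacted: Set[str] = set()
    for members in loops.values():
        if reporting_system in members:
            # Any system on the same loop may have observed the same error.
            impacted.update(members)
    return impacted
```

Under these assumptions, a system that sits on two loops (here `sys3`) widens the set of potentially impacted systems to the members of both loops.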
10 Claims
-
1. A method for correlating error events of a cluster, the method comprising:
-
identifying systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster via the two ports; identifying an error event associated with the first system, from the error events; and selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider;
wherein selecting the error event comprises at least one of:
(i) identifying a redundant error event of the error events, the redundant error event having error identification data that describes the error, wherein the error event and the redundant error event are associated with the first system of the systems and the loop;
(ii) identifying a duplicate error event of the error events, the duplicate error event having error identification data that describes the error, wherein the error event is associated with a first system of the systems and a loop and a second error event is associated with a second system of the systems and the loop; and
(iii) identifying a symptomatic error event of the error events, the symptomatic error event having error identification data that describes a second error, wherein the second error results from the error.
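The three categories recited in claim 1 (redundant, duplicate, symptomatic) can be sketched as a simple classification. This is only an illustration under stated assumptions: the `ErrorEvent` fields, the `SYMPTOMS` table, the error identifiers, and the `classify` function are all hypothetical, and the claim does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class ErrorEvent:
    system: str    # system that reported the event
    loop: str      # loop the event is associated with
    error_id: str  # error identification data (hypothetical string code)


# Hypothetical table: root error -> errors known to result from it.
SYMPTOMS: Dict[str, Set[str]] = {"LOOP_OPEN": {"LINK_TIMEOUT"}}


def classify(primary: ErrorEvent, other: ErrorEvent,
             symptoms: Dict[str, Set[str]] = SYMPTOMS) -> str:
    """Classify `other` relative to the selected `primary` event."""
    same_error = other.error_id == primary.error_id
    same_loop = other.loop == primary.loop
    if same_error and same_loop and other.system == primary.system:
        return "redundant"    # same error, same system, same loop
    if same_error and same_loop:
        return "duplicate"    # same error, same loop, different system
    if other.error_id in symptoms.get(primary.error_id, set()):
        return "symptomatic"  # a second error that results from the first
    return "unrelated"
```

Events classified as anything other than "unrelated" would be candidates to disregard, so that the error is reported through a single event.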
-
-
2. An apparatus for correlating error events of a cluster, the apparatus comprising:
-
a system identifier coupled with the cluster to identify systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster;
wherein the system identifier comprises a sibling identifier to identify the systems coupled with sibling loops of the cluster;
an event identifier coupled with the system identifier to identify an error event associated with the first system, from the error events; and
an event selector coupled with the event identifier to select the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system of the cluster, to report the error to a maintenance provider.
-
-
3. An apparatus for correlating error events of a cluster, the apparatus comprising:
-
a system identifier coupled with the cluster to identify systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; an event identifier coupled with the system identifier to identify an error event associated with the first system, from the error events;
wherein the event identifier comprises a time correlator to identify error events received within a time period of receipt of the error; and
an event selector coupled with the event identifier to select the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system of the cluster, to report the error to a maintenance provider.
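The time correlator of claim 3 can be pictured as a filter over received events. The sketch below assumes events are `(name, timestamp_in_seconds)` tuples and that the window is symmetric around the triggering event; both choices, and the `within_window` name, are illustrative assumptions rather than details from the claim.

```python
from typing import List, Tuple

Event = Tuple[str, float]  # (event name, receipt time in seconds); hypothetical shape


def within_window(events: List[Event], trigger_time: float,
                  window_seconds: float) -> List[Event]:
    """Keep events received within `window_seconds` of the triggering event."""
    return [e for e in events if abs(e[1] - trigger_time) <= window_seconds]
```

For example, with a trigger at 100.0 seconds and a 5 second window, an event received at 104.0 is retained and one received at 110.0 is not.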
-
-
4. An apparatus for correlating error events of a cluster, the apparatus comprising:
-
a system identifier coupled with the cluster to identify systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; an event identifier coupled with the system identifier to identify an error event associated with the first system, from the error events; and an event selector coupled with the event identifier to select the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system of the cluster, to report the error to a maintenance provider;
wherein the event selector comprises repetition circuitry to identify a second error event of the error events, wherein the second error event has error identification data that describes the error.
-
-
5. An apparatus for correlating error events of a cluster, the apparatus comprising:
-
a system identifier coupled with the cluster to identify systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; an event identifier coupled with the system identifier to identify an error event associated with the first system, from the error events; and an event selector coupled with the event identifier to select the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system of the cluster, to report the error to a maintenance provider;
wherein the event selector comprises causation circuitry to identify a second error event of the error events, the second error event having error identification data that describes a symptom of the error.
-
-
6. An apparatus for correlating error events of a cluster, the apparatus comprising:
-
a system identifier coupled with the cluster to identify systems of the cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the systems of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; an event identifier coupled with the system identifier to identify an error event associated with the first system, from the error events; and an event selector coupled with the event identifier to select the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop, including the first system and the second system of the cluster, to report the error to a maintenance provider;
wherein the event selector comprises priority circuitry to identify a second error event of the error events having error identification data associated with a lower priority than a priority for the error identification data associated with the error event.
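The priority comparison in claim 6 can be sketched as choosing, among correlated events, the one whose error identification data carries the highest priority. The `PRIORITY` table, its values, and the `select_event` function are hypothetical; the claim recites circuitry and does not specify how priorities are stored.

```python
from typing import Dict, List

# Hypothetical priorities: a larger number means the error is more significant.
PRIORITY: Dict[str, int] = {"LOOP_OPEN": 3, "LINK_TIMEOUT": 1}


def select_event(error_ids: List[str],
                 priority: Dict[str, int] = PRIORITY) -> str:
    """Return the error id with the highest priority; unknown ids rank lowest."""
    if not error_ids:
        raise ValueError("no error events to select from")
    # max() returns the first of several equally ranked ids.
    return max(error_ids, key=lambda e: priority.get(e, 0))
```

Every other event in the list is then, by construction, associated with a priority no higher than that of the selected event.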
-
-
7. A computer readable storage medium containing a program which, when executed, performs an operation, comprising:
-
identifying systems of a cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; identifying an error event associated with the first system, from error events generated by the cluster;
wherein identifying the error event comprises identifying error events received within a time period of receipt of the error event and error events that associates the loop with a source of the error, the loop being associated with the error event; and
selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider.
-
-
8. A computer readable storage medium containing a program which, when executed, performs an operation, comprising:
-
identifying systems of a cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; identifying an error event associated with the first system, from error events generated by the cluster; and selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider, wherein selecting the error event further comprises identifying a second error event of the error events, wherein the second error event has error identification data that describes the error.
-
-
9. A computer readable storage medium containing a program which, when executed, performs an operation, comprising:
-
identifying systems of a cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; identifying an error event associated with the first system, from error events generated by the cluster; and selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider, wherein selecting the error event comprises identifying a second error event of the error events, the second error event having error identification data that describes a symptom of the error.
-
-
10. A computer readable storage medium containing a program which, when executed, performs an operation, comprising:
-
identifying systems of a cluster potentially impacted by an error based upon a topology of the cluster, wherein a first system of the cluster includes two ports for communicating with other systems of the cluster, wherein the topology of the cluster comprises loop data describing a loop including the first system and a second system in the systems of the cluster; identifying an error event associated with the first system, from error events generated by the cluster; and selecting the error event based upon error identification data associated with the error event, wherein selecting the error event comprises comparing the error identification data with the loop data to identify an error in the loop including the first system and the second system in the systems of the cluster, to report the error to a maintenance provider, wherein selecting the error event comprises identifying a second error event of the error events having error identification data associated with a lower priority than a priority for the error identification data associated with the error event.
-
Specification