Method and system for predicting causes of network service outages using time domain correlation
First Claim
1. A method for analyzing a potential cause of a change in a service, wherein service quality of the service is monitored, usage of the service is measured, and service events are detected, the method comprising:
- determining a service change time window based at least in part upon a change in service quality between a first working state and a second, non-working state, and upon a change in service usage amount, the service change time window encompassing at least part of a service outage;
retrieving data representing a plurality of detected events and corresponding times in which the events occurred;
computing a probability for each of the detected events that each of the detected events caused the service change based at least in part on a correlation between the event time and the service change time window;
determining whether one or more other events of a type identical to one of the detected events occurred; and
wherein computing the probability comprises computing the probability using at least in part a false occurrence weighting function which decreases the probability of the detected event as the cause of the service change for instances in which the detected event occurred outside the service change time window.
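As an illustration only, the window-determination step recited above might be sketched as follows. The sketch assumes that quality polls arrive as (time, working) pairs and usage samples as (time, amount) pairs, and that a "change in service usage amount" means usage falling below a fraction of its pre-outage mean. The function name `service_change_window` and the `drop_fraction` parameter are hypothetical and do not come from the claims.

```python
from typing import List, Optional, Tuple

Window = Tuple[float, float]


def service_change_window(
    quality: List[Tuple[float, bool]],   # (time, working?) polls, sorted by time
    usage: List[Tuple[float, float]],    # (time, usage amount) samples, sorted by time
    drop_fraction: float = 0.5,          # assumed threshold, not from the source
) -> Optional[Window]:
    """Bracket the working -> non-working transition, narrowed by usage data."""
    # Step 1: find the first pair of polls where quality goes from working to not.
    for (t0, ok0), (t1, ok1) in zip(quality, quality[1:]):
        if ok0 and not ok1:
            start, end = t0, t1
            break
    else:
        return None  # no transition observed

    # Step 2: baseline usage is the mean of samples up to the last good poll.
    before = [u for t, u in usage if t <= start]
    if not before:
        return (start, end)  # no usage data to narrow the window
    baseline = sum(before) / len(before)

    # Step 3: narrow the window to the usage sample interval where the drop appears.
    prev_t = start
    for t, u in usage:
        if t <= start or t > end:
            continue
        if u < drop_fraction * baseline:
            return (prev_t, t)
        prev_t = t
    return (start, end)


quality_polls = [(0.0, True), (60.0, True), (120.0, False)]
usage_samples = [(0.0, 100.0), (30.0, 100.0), (60.0, 100.0),
                 (75.0, 98.0), (90.0, 20.0), (105.0, 5.0), (120.0, 0.0)]
window = service_change_window(quality_polls, usage_samples)
```

With these made-up samples, the quality polls alone bracket the change between 60 s and 120 s, and the usage samples narrow that to the interval in which usage first falls below half of its baseline.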
Abstract
Methods and systems are described for predicting the likely causes of service outages using only time information, and for predicting the likely costs of service outages. The likely causes are found by defining a narrow likely cause window around an outage based on service quality and/or service usage data, and correlating service events to the likely cause window in the time domain to find a probability distribution for the events. The likely costs are found by measuring usage loss and duration for a given point during an outage and using cost component functions of the time and usage to extrapolate over the outage. These cause and cost predictions supply service administrators with tools for making more informed decisions about allocation of resources in preventing and correcting service outages.
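The cost prediction summarized above could be sketched roughly as below. This is a hedged illustration, not the method of the specification: it assumes that usage loss scales linearly with outage duration, and the two cost component functions (`lost_revenue`, `sla_penalty`), their rates, and the function name `extrapolate_cost` are all invented for the example.

```python
from typing import Callable, Sequence

# A cost component maps (outage duration in seconds, usage loss) to a cost.
CostComponent = Callable[[float, float], float]


def extrapolate_cost(
    elapsed_s: float,             # outage duration measured so far
    usage_loss: float,            # usage lost so far (expected minus actual)
    expected_total_s: float,      # assumed total outage duration
    components: Sequence[CostComponent],
) -> float:
    """Project usage loss to the full outage and sum the cost components."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Assumption: usage keeps being lost at the average rate observed so far.
    projected_loss = usage_loss * (expected_total_s / elapsed_s)
    return sum(c(expected_total_s, projected_loss) for c in components)


def lost_revenue(duration_s: float, usage_loss: float) -> float:
    # Hypothetical: 0.02 currency units per unit of lost usage.
    return 0.02 * usage_loss


def sla_penalty(duration_s: float, usage_loss: float) -> float:
    # Hypothetical: 50 currency units per minute beyond a 10-minute allowance.
    return 50.0 * max(0.0, duration_s - 600.0) / 60.0


projected = extrapolate_cost(300.0, 1000.0, 1200.0, [lost_revenue, sla_penalty])
```

Keeping each component as a separate function of time and usage lets an administrator add or remove cost terms without touching the extrapolation itself.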
207 Citations
33 Claims
-
1. A method for analyzing a potential cause of a change in a service, wherein service quality of the service is monitored, usage of the service is measured, and service events are detected, the method comprising:
- determining a service change time window based at least in part upon a change in service quality between a first working state and a second, non-working state, and upon a change in service usage amount, the service change time window encompassing at least part of a service outage;
retrieving data representing a plurality of detected events and corresponding times in which the events occurred;
computing a probability for each of the detected events that each of the detected events caused the service change based at least in part on a correlation between the event time and the service change time window;
determining whether one or more other events of a type identical to one of the detected events occurred; and
wherein computing the probability comprises computing the probability using at least in part a false occurrence weighting function which decreases the probability of the detected event as the cause of the service change for instances in which the detected event occurred outside the service change time window.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
14. A method for analyzing potential causes of a service change, the method comprising:
- determining a service change time window encompassing a change of service between a first working state and a service outage, the service change being determined at least in part based on measured service usage levels;
detecting occurrences of a set of events;
retrieving data representing the plurality of detected events and corresponding times in which the events occurred, wherein the set of events are within a given time prior to and during the service change time window, each occurrence of an event being associated with a time at which the event occurred;
computing a probability distribution for the set of events, which probability distribution determines for each event in the set the probability that the detected event caused the service change, the probability distribution being based at least in part on relations between the time of each event occurrence and the service change time window; and
wherein computing the probability includes using two or more second functions selected from the group consisting of:
a time weighting function which decreases exponentially the probability of a given event as the cause of the service change with the distance between the given event time and the service change time window;
a false occurrence weighting function which decreases the probability of a given event as the cause of the service change for instances in which events of the same type as the given event occurred outside the service change time window;
a positive occurrence weighting function which increases the probability of a given event as the cause of the service change based on instances stored in a historical database in which events of the same type as the given event occurred within a prior service change time window; and
a historical weighting function which increases the probability of a given event as the cause of the service change based on instances in the historical database in which events of the same type as the given event were identified as having caused a prior service outage.
- View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
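One way the four weighting functions listed above could combine into a probability distribution is sketched below. Only the exponential shape of the time weighting comes from the claim language; the decay constant `tau`, the specific forms of the other three weights, the multiplicative combination, and every function name are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Event = Tuple[str, float]   # (event type, time of occurrence)
Window = Tuple[float, float]


def time_weight(event_t: float, window: Window, tau: float = 300.0) -> float:
    """Decays exponentially with distance from the window; 1.0 inside it."""
    start, end = window
    dist = max(0.0, start - event_t, event_t - end)
    return math.exp(-dist / tau)


def false_occurrence_weight(n_outside: int) -> float:
    """Shrinks as same-type events are seen outside the window."""
    return 1.0 / (1.0 + n_outside)


def positive_occurrence_weight(n_in_prior_windows: int) -> float:
    """Grows with same-type events found inside prior service change windows."""
    return 1.0 + math.log1p(n_in_prior_windows)


def historical_weight(n_confirmed_causes: int) -> float:
    """Grows with prior outages attributed to this event type."""
    return 1.0 + n_confirmed_causes


def cause_distribution(
    events: List[Event],
    window: Window,
    outside_counts: Dict[str, int],
    prior_window_counts: Dict[str, int],
    confirmed_counts: Dict[str, int],
) -> List[float]:
    """Return one probability per event, in input order, summing to 1."""
    scores = [
        time_weight(t, window)
        * false_occurrence_weight(outside_counts.get(kind, 0))
        * positive_occurrence_weight(prior_window_counts.get(kind, 0))
        * historical_weight(confirmed_counts.get(kind, 0))
        for kind, t in events
    ]
    total = sum(scores)
    return [s / total for s in scores] if scores else []


window = (100.0, 160.0)
events = [("link_down", 150.0), ("cpu_high", 40.0)]
probs = cause_distribution(events, window, {"cpu_high": 3}, {}, {})
```

In this made-up case the `link_down` event falls inside the window and has no false occurrences, so it receives the larger share of the probability. The `cpu_high` event is penalized both for its distance from the window and for its three occurrences outside it.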
-
25. A network monitoring system comprising:
- a service monitor for monitoring quality of service on the network;
a usage meter for measuring usage of the network;
an event detector for detecting a plurality of network events and corresponding times at which the network events occur; and
a probable cause engine, coupled to receive data from the service monitor, usage meter, and the event detector, the probable cause engine including a processing device that, in response to executable instructions, is operative to:
set a service change time window based upon data received from the service monitor or usage meter, the service change time window encompassing at least part of an occurrence of a service outage in the network;
determine which of the network events detected by the event detector is the most likely cause of a service change including computing a probability for each of the detected events that each of the detected events caused the service change based at least in part on a correlation between the event time and service change time window;
determining whether one or more other events of a type identical to one of the detected events occurred; and
wherein computing the probability comprises computing the probability using at least in part a false occurrence weighting function which decreases the probability of the detected event as the cause of the service change for instances in which the detected event occurred outside the service change time window.
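The coupling between the components recited above might look like the following minimal sketch. It is an assumption-laden illustration: the service monitor and usage meter are reduced to a single callable that yields a window, the event detector to a callable that yields events, and the false occurrence weighting is taken to be 1 / (1 + count of same-type events outside the window). The class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class NetworkEvent:
    kind: str    # event type
    t: float     # time at which the event occurred


class ProbableCauseEngine:
    def __init__(
        self,
        get_window: Callable[[], Optional[Tuple[float, float]]],
        get_events: Callable[[], List[NetworkEvent]],
    ) -> None:
        self.get_window = get_window   # stands in for service monitor / usage meter
        self.get_events = get_events   # stands in for the event detector

    def most_likely_cause(self) -> Optional[Tuple[NetworkEvent, float]]:
        """Return the highest-probability in-window event and its probability."""
        window = self.get_window()
        if window is None:
            return None
        start, end = window
        events = self.get_events()
        inside = [e for e in events if start <= e.t <= end]
        if not inside:
            return None
        scores = []
        for e in inside:
            # Count events of the identical type that occurred outside the window.
            n_outside = sum(
                1 for o in events
                if o.kind == e.kind and not (start <= o.t <= end)
            )
            scores.append(1.0 / (1.0 + n_outside))
        total = sum(scores)
        best = max(range(len(inside)), key=lambda i: scores[i])
        return inside[best], scores[best] / total


detected = [
    NetworkEvent("heartbeat_miss", 20.0),
    NetworkEvent("heartbeat_miss", 70.0),
    NetworkEvent("heartbeat_miss", 130.0),
    NetworkEvent("link_down", 140.0),
]
engine = ProbableCauseEngine(lambda: (100.0, 160.0), lambda: detected)
result = engine.most_likely_cause()
```

Here the `heartbeat_miss` event inside the window is discounted because the same type also occurred twice while the service was working, so `link_down` is reported as the most likely cause.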
-
26. A computer readable medium storing program code for, when executed, causing a computer to perform a method for analyzing a potential cause of a change in a service, wherein service quality of the service is monitored, usage amount of the service is measured, and service events are detected, the method comprising:
- determining a service change time window based at least in part upon a change in service quality between a first working state and a second, non-working state, and upon a change in service usage amount, the service change time window encompassing at least part of a service outage;
retrieving data representing a plurality of detected events and corresponding times in which the events occurred;
computing a probability for each of the detected events that each of the detected events caused the service change based at least in part on a correlation between the event time and the service change time window;
determining whether one or more other events of a type identical to one of the detected events occurred; and
wherein computing the probability comprises computing the probability using at least in part a false occurrence weighting function which decreases the probability of the detected event as the cause of the service change for instances in which the detected event occurred outside the service change time window.
-
27. Computer readable media comprising program code that, when executed by a programmable microprocessor, causes the programmable microprocessor to execute a method for analyzing potential cause of a service change, the method comprising:
- determining a service change time window encompassing a change of service between a first working state and a service outage, the service change being determined at least in part based on measured service usage levels;
detecting occurrences of a set of events;
retrieving data representing the plurality of detected events and corresponding times in which the events occurred, wherein the set of events are within a given time prior to and during the service change time window, each occurrence of an event being associated with a time at which the event occurred;
computing a probability distribution for the set of events, which probability distribution determines for each event in the set the probability that the detected event caused the service change, the probability distribution being based at least in part on relations between the time of each event occurrence and the service change window;
wherein computing the probability includes using two or more second functions selected from the group consisting of:
a time weighting function which decreases the probability of a given event as the cause of the service change with the distance between the given event time and the service change time window;
a false occurrence weighting function which decreases the probability of a given event as the cause of the service change for instances in which events of the same type as the given event occurred outside the service change time window;
a positive occurrence weighting function which increases the probability of a given event as the cause of the service change based on instances stored in a historical database in which events of the same type as the given event occurred within a prior service change time window; and
a historical weighting function which increases the probability of a given event as the cause of the service change based on instances in the historical database in which events of the same type as the given event were identified as having caused a prior service outage.
- View Dependent Claims (28, 29, 30, 31, 32, 33)
Specification