System and method for automatic identification of bottlenecks in a network
First Claim
1. A method for identifying bottleneck resources in a network having a plurality of resources, comprising the steps of:
- specifying a plurality of probe transactions in the network;
generating resource dependency information representing dependent resources for each of the probe transactions;
executing the probe transactions;
measuring end-to-end quality of service data resulting from the executing of the probe transactions; and
processing the resource dependency information and measured end-to-end quality of service data to identify bottleneck resources in the network.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for providing automated bottleneck identification in networks and networked application environments by processing using end-to-end quality of service measurements in combination with knowledge of internal resource dependency information generated by a network administrator. In one aspect, a method for identifying bottleneck resources in a network having a plurality of resources comprises the steps of: specifying a plurality of probe transactions in the network; generating resource dependency information representing dependent resources for each of the probe transactions; executing the probe transactions; measuring end-to-end quality of service data resulting from the executing of the probe transactions; and processing the resource dependency information and measured end-to-end quality of service data to identify bottleneck resources in the network. The system and method eliminate the need for obtaining and processing detailed measurements and/or estimates of parameters or data at the per resource level to identify bottlenecks.
122 Citations
32 Claims
-
1. A method for identifying bottleneck resources in a network having a plurality of resources, comprising the steps of:
-
specifying a plurality of probe transactions in the network;
generating resource dependency information representing dependent resources for each of the probe transactions;
executing the probe transactions;
measuring end-to-end quality of service data resulting from the executing of the probe transactions; and
processing the resource dependency information and measured end-to-end quality of service data to identify bottleneck resources in the network. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
allocating a probe station at desired nodes in the network; and
configuring the allocated probe station at each node to execute at least one probe test for initiating execution of a service function in a remote server in the network.
-
-
3. The method of claim 2, wherein the resource dependency information for each of the probe transactions comprises one of (1) each important entity in a path between the probing station and the remote server, (2) the service function initiated in the remote server, (3) each function that is related to the service function initiated in the remote server, (4) any additional server that provides the related function, (5) each important entity in a path between the remote server and the any additional server, and a combination thereof.
-
4. The method of claim 1, wherein the step of generating resource dependency information for the probe transactions, comprises the steps of:
-
selecting resources in the network that are considered as potential bottlenecks; and
building a resource dependency model representing the dependency of each of the probe transactions on the selected resources.
-
-
5. The method of claim 4, wherein the resource dependency model comprises a dependency matrix D having a plurality of rows i, each row representing one of the selected resources, and a plurality of columns j, each column representing one of the probe transactions, wherein a matrix element D[i,j] is assigned a value representing the dependency of a given probe transaction j on a given resource i.
-
6. The method of claim 1, wherein the step of executing the probe transactions comprises the steps of:
-
specifying at least one probing rate for each of the probe transaction; and
performing the probe transactions at periodic time intervals based on the at least one probing rate.
-
-
7. The method of claim 6, further comprising the step of increasing the at least one probing rate for any probe transaction that has dependent resources identified as bottleneck resources.
-
8. The method of claim 1, wherein the step of processing comprises the steps of:
-
classifying the measured end-to-end quality of service data of each probe transaction as one of normal and abnormal;
collecting votes on the dependent resources from each of the probe transactions based on the classification of the measured end-to-end quality of service data;
computing a trouble index for each resource using the collected votes from the probe transactions; and
considering the trouble index to determine if a dependent resource of the probe transactions is a bottleneck.
-
-
9. The method of claim 8, wherein the step of classifying the measured end-to-end quality of service data of each probe transaction comprises the steps of:
-
selecting at least one threshold value representing an acceptable baseline service level of the end-to-end quality of service data for the probe transactions;
comparing the measured end-to-end quality of service data with the at least one threshold value; and
classifying the measured end-to-end quality of service data as abnormal if its value exceeds the at least one threshold value and normal otherwise.
-
-
10. The method of claim 9, further comprising the steps of:
-
specifying at least one probing rate for each of the probe transactions;
performing each of the probe transactions at periodic time intervals based on the at least one probing rate;
specifying a current evaluation window as comprising a predefined number of periodic time intervals prior to any given time t; and
computing a moving-average of the measured end-to-end quality of service data at a given time t using measured end to end quality of service data in the evaluation window, wherein the computed moving-average is used in the comparing step.
-
-
11. The method of claim 8, wherein the step of collecting votes on the dependent resources comprises the steps of:
-
according a positive vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal; and
according a negative vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal.
-
-
12. The method of claim 11, wherein the step of computing a trouble index for-each of the dependent resources of each probe transaction using the collected votes comprises the steps of;
-
specifying a weighting constant for each dependent resource of each probe transaction representing a relative strength of the probe transaction as evidence of the dependent resource as a bottleneck;
computing a total positive evidence value for each dependent resource as a weighted sum of all the positive votes of the dependent resource using the corresponding specified weighting constant; and
computing a total negative evidence value for each dependent resource as a weighted sum of all the negative votes of the dependent resource using the corresponding specified weighting constant;
wherein the trouble index for a dependent resource is computed as the total negative evidence value divided by the sum of the total positive evidence value and total negative evidence value.
-
-
13. The method of claim 8, wherein the step of considering the trouble index to determine if a resource is a bottleneck comprises the steps of:
-
specifying at least one tolerance level representing a total strength of evidence to identify a resource as a bottleneck candidate;
comparing the at least one tolerance level with the computed trouble index of each dependent resource; and
identifying as a bottleneck candidate, each dependent resource having a trouble index that exceeds the at least one tolerance level.
-
-
14. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for identifying bottleneck resources in a network having a plurality of resources, the method steps comprising:
-
generating resource dependency information representing dependent resources for each one of a plurality of specified probe transactions in a network;
executing the probe transactions;
measuring end-to-end quality of service data resulting from the executing of the probe transactions; and
processing the resource dependency information and measured end-to-end quality of service data to identify bottleneck resources in the network. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23)
selecting resources in the network that are considered as potential bottlenecks; and
building a resource dependency model representing the dependency of each of the probe transactions on the selected resources.
-
-
16. The program storage device of claim 15, wherein the resource dependency model comprises a dependency matrix D having a plurality of rows i, each row representing one of the selected resources, and a plurality of columns j, each column representing one of the probe transactions, wherein a matrix element D[i,j] is assigned a value representing the dependency of a given probe transaction j on a given resource i.
-
17. The program storage device of claim 14, wherein the instructions for performing the step of processing comprise instructions for performing the steps of:
-
classifying the measured end-to-end quality of service data of each probe transaction as one of normal and abnormal;
collecting votes on the dependent resources from each of the probe transactions based on the classification of the measured end-to-end quality of service data;
computing a trouble index for each resource using the collected votes from the probe transactions; and
considering the trouble index to determine if a dependent resource of the probe transactions is a bottleneck.
-
-
18. The program storage device of claim 17, wherein the instructions for performing the step of classifying the measured end-to-end quality of service data of each probe transaction comprise instructions for performing the steps of:
-
comparing the measured end-to-end quality of service data with at least one threshold value representing an acceptable baseline service level of the end-to-end quality of service data for the probe transactions; and
classifying the measured end-to-end quality of service data as abnormal if its value exceeds the at least one threshold value and normal otherwise.
-
-
19. The program storage device of claim 17, further comprising instructions for performing the steps of:
-
specifying at least one probing rate for each of the probe transactions;
performing each of the probe transactions at periodic time intervals based on the at least one probing rate;
specifying a current evaluation window as comprising a predefined number of periodic time intervals prior to any given time t; and
computing a moving-average of the measured end-to-end quality of service data at a given time t using measured end to end quality of service data in the evaluation window, wherein the computed moving-average is used in the comparing step.
-
-
20. The program storage device of claim 17, wherein the instructions for performing the step of collecting votes on the dependent resources comprise instructions for performing the steps of:
-
according a positive vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal; and
according a negative vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal.
-
-
21. The program storage device of claim 20, wherein the instructions for performing the step of computing a trouble index for each of the dependent resources of each probe transaction using the collected votes comprise instructions for performing the steps of:
-
specifying a weighting constant for each dependent resource of each probe transaction representing a relative strength of the probe transaction as evidence of the dependent resource as a bottleneck;
computing a total positive evidence value for each dependent resource as a weighted sum of all the positive votes of the dependent resource using the corresponding specified weighting constant; and
computing a total negative evidence value for each dependent resource as a weighted sum of all the negative votes of the dependent resource using the corresponding specified weighting constant;
wherein the trouble index for a dependent resource is computed as the total negative evidence value divided by the sum of the total positive evidence value and total negative evidence value.
-
-
22. The program storage device of claim 17, wherein the instructions for performing the step of considering the trouble index to determine if a resource is a bottleneck comprise instructions for performing the steps of:
-
specifying at least one tolerance level representing a total strength of evidence to identify a resource as a bottleneck candidate;
comparing the at least one tolerance level with the computed trouble index of each dependent resource; and
identifying as a bottleneck candidate, each dependent resource having a trouble index that exceeds the at least one tolerance level.
-
-
23. The program storage device of claim 14, further comprising instructions for generating a report of the identified bottleneck resources by one of displaying a snapshot of a bottleneck status of each resource in the network at a given time grouped by the type of resource, displaying a trend view of a bottleneck status of each resource in the network over adjacent time periods, and listing average of the measured end-to-end quality of service data for each identified bottleneck resource and a rationale for identifying the resource as a bottleneck.
-
24. A system for identifying bottleneck resources in a network having a plurality of resources, comprising:
-
a first database for storing measured end-to-end quality of service data resulting from the execution of a plurality of probe transactions in the network;
a resource dependency model generator for specifying resources in the network that are considered potential bottleneck resources and generating a resource dependency model to represent resource dependency information for each of the probe transactions based on the specified resources;
a second database for storing the resource dependency model; and
a triangulation engine for processing the resource dependency information and measured end-to-end quality of service data for the plurality of probe transactions to identify bottleneck resources in the network. - View Dependent Claims (25, 26, 27, 28, 29, 30, 31, 32)
a classification module for defining at least one threshold value representing an acceptable baseline service level of the end-to-end quality of service data for the probe transactions; and
a decision parameter specification module for specifying decision parameters comprising one of (1) at least one probing rate for each of the probe transactions for performing the probe transactions at periodic time intervals based on the at least one probing rate;
(2) a current evaluation window comprising a predefined number of periodic time intervals prior to any given evaluation time t;
(3) a weighting constant for each dependent resource of each probe transaction representing a relative strength of the probe transaction as evidence of the dependent resource as a bottleneck;
(4) at least one tolerance level representing a total strength of evidence to identify a resource as a bottleneck candidate; and
a combination thereof, wherein the at least one threshold value and the decision parameters are stored in the second database.
-
-
28. The system of claim 27, wherein the triangulation engine comprises:
-
means for classifying the measured end-to-end quality of service data of each probe transaction stored in the first database as one of normal and abnormal;
means for collecting votes on the dependent resources from each of the probe transactions based on the classification of the measured end-to-end quality of service data;
means for computing a trouble index for each resource using the collected votes from the probe transactions; and
means for considering the trouble index to determine if a dependent resource of the probe transactions is a bottleneck.
-
-
29. The system of claim 28, wherein triangulation engine further comprises means for computing a moving-average of the measured end-to-end quality of service data stored in the first database at a given time t using measured end-to-end quality of service data in the specified evaluation window, and wherein the means for classifying the measured end-to-end quality of service data of each probe transaction comprises means for comparing the measured end-to-end quality of service data with the at least one threshold value specified for the end-to-end quality of service data for the probe transactions, wherein the measured end-to-end quality of service data is classified as abnormal if its value exceeds the at least one threshold value and classified as normal otherwise.
-
30. The system of claim 28, wherein the means for collecting votes on the dependent resources comprises means for according a positive vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal, and for according a negative vote to each dependent resource of each probe transaction if the corresponding measured end-to-end quality of service data is classified as normal.
-
31. The system of claim 28, wherein the means for computing a trouble index for each of the dependent resources of each probe transaction using the collected votes comprises means for computing a total positive evidence value for each dependent resource as a weighted sum of all the positive votes of the dependent resource using the corresponding specified weighting constant, and for computing a total negative evidence value for each dependent resource as a weighted sum of all the negative votes of the dependent resource using the corresponding specified weighting constant;
wherein the trouble index for a dependent resource is computed as the total negative evidence value divided by the sum of the total positive evidence value and total negative evidence value.
-
32. The system of claim 28, wherein the means for considering the trouble index to determine if a resource is a bottleneck comprises:
-
means for comparing the at least one specified tolerance level with the computed trouble index of each dependent resource; and
means for identifying as a bottleneck candidate, each dependent resource having a trouble index that exceeds the at least one tolerance level.
-
Specification