Increasing resilience of a network service

First Claim
1. A method comprising:
 obtaining a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes, the hardware links being represented as edges in the graph;
finding a first subset of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects, the failures comprising at least one of node failures and edge failures; and
ranking the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list;
wherein said expected resiliency is computed via $E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$, wherein:
E represents a set of edges among the first subset of hardware nodes f;
P(f) represents a probability of all edges associated with the first subset f failing together;
R_m(v) represents a resiliency measure of a service deployed at a given node v; and
N(v, f) represents the number of nodes that can be reached from the given node v if all edges in the first subset f fail together; and
wherein said ranking comprises:
identifying edge and vertex independent paths from each hardware node in the first subset to all other hardware nodes;
weighting each of the edge and vertex independent paths with the estimated failure probability of the vertex and the edges in each independent path; and
ranking the hardware nodes in the first subset based on the weighted edge and vertex independent paths to represent expected resiliency of each hardware node by determining the number of edge and vertex independent paths derived from each hardware node and the estimated probability that each of said paths will fail.
Abstract
A set of data is obtained, representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes. The hardware links are represented as edges in the graph. A first subset (for example, a vertex cut set) of the set of hardware nodes is found, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects. The failures include node failures and/or edge failures. The hardware nodes in the first subset are ranked based on expected resiliency, to obtain a ranked list. Optionally, in case of a tie between two or more of the hardware nodes in the ranked list, the tie is broken using a sum of shortest path metric.
14 Citations
PROACTIVE CONTROLLER FOR FAILURE RESILIENCY IN COMMUNICATION NETWORKS  
Patent #
US 20150195190A1
Filed 12/02/2014

Current Assignee
University Of Ontario Institute Of Technology

Sponsoring Entity
Alireza Izaddoost, Shahram Shah Heydari

Proactive controller for failure resiliency in communication networks  
Patent #
US 9,590,892 B2
Filed 12/02/2014

Current Assignee
University Of Ontario Institute Of Technology

Sponsoring Entity
University Of Ontario Institute Of Technology

Prioritizing resiliency tests of microservices  
Patent #
US 10,102,111 B2
Filed 08/05/2016

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Prioritizing resiliency tests of microservices  
Patent #
US 10,169,220 B2
Filed 08/05/2016

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Cache memory architecture and policies for accelerating graph algorithms  
Patent #
US 10,417,134 B2
Filed 02/23/2017

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Application resilience system and method thereof for applications deployed on platform  
Patent #
US 10,462,234 B2
Filed 01/14/2019

Current Assignee
Huawei Technologies Co. Ltd.

Sponsoring Entity
Huawei Technologies Co. Ltd.

Checkpointing using compute node health information  
Patent #
US 10,545,839 B2
Filed 12/22/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Depth-First Search For Target Value Problems  
Patent #
US 20110004581A1
Filed 07/02/2009

Current Assignee
Palo Alto Research Center Inc.

Sponsoring Entity
Palo Alto Research Center Inc.

Location management of off-premise resources  
Patent #
US 7,836,056 B2
Filed 12/20/2006

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Corporation

Dynamically configurable fault tolerance in autonomic computing with multiple service points  
Patent #
US 7,328,363 B2
Filed 07/12/2006

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

SYSTEM AND METHOD FOR RESILIENCY PLANNING  
Patent #
US 20080140495A1
Filed 12/12/2006

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Methods, systems, and computer program products for multipath shortest-path-first computations and distance-based interface selection for VoIP traffic  
Patent #
US 20070053300A1
Filed 10/10/2006

Current Assignee
Genband US LLC

Sponsoring Entity
Genband Incorporated

Network switch failure restoration  
Patent #
US 6,331,905 B1
Filed 04/01/1999

Current Assignee
Trustees Of Columbia University In The City Of New York

Sponsoring Entity
Trustees Of Columbia University In The City Of New York

Network architecture and related methods for surviving denial of service attacks  
Patent #
US 20050165901A1
Filed 01/22/2004

Current Assignee
Provenance Asset Group LLC

Sponsoring Entity
Alcatel-Lucent USA Inc.

20 Claims
 1. A method comprising:
obtaining a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes, the hardware links being represented as edges in the graph; finding a first subset of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects, the failures comprising at least one of node failures and edge failures; and ranking the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list;
wherein said expected resiliency is computed via $E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$, wherein: E represents a set of edges among the first subset of hardware nodes f; P(f) represents a probability of all edges associated with the first subset f failing together; R_m(v) represents a resiliency measure of a service deployed at a given node v; and N(v, f) represents the number of nodes that can be reached from the given node v if all edges in the first subset f fail together; and
wherein said ranking comprises: identifying edge and vertex independent paths from each hardware node in the first subset to all other hardware nodes; weighting each of the edge and vertex independent paths with the estimated failure probability of the vertex and the edges in each independent path; and ranking the hardware nodes in the first subset based on the weighted edge and vertex independent paths to represent expected resiliency of each hardware node by determining the number of edge and vertex independent paths derived from each hardware node and the estimated probability that each of said paths will fail. (Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
 12. A computer program product comprising a tangible non-transitory computer readable recordable storage medium including computer usable program code, the computer program product including:
computer usable program code for obtaining a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes, the hardware links being represented as edges in the graph; computer usable program code for finding a first subset of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects, the failures comprising at least one of node failures and edge failures; and computer usable program code for ranking the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list;
wherein said expected resiliency is computed via $E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$, wherein: E represents a set of edges among the first subset of hardware nodes f; P(f) represents a probability of all edges associated with the first subset f failing together; R_m(v) represents a resiliency measure of a service deployed at a given node v; and N(v, f) represents the number of nodes that can be reached from the given node v if all edges in the first subset f fail together; and
wherein said ranking comprises: identifying edge and vertex independent paths from each hardware node in the first subset to all other hardware nodes; weighting each of the edge and vertex independent paths with the estimated failure probability of the vertex and the edges in each independent path; and ranking the hardware nodes in the first subset based on the weighted edge and vertex independent paths to represent expected resiliency of each hardware node by determining the number of edge and vertex independent paths derived from each hardware node and the estimated probability that each of said paths will fail. (Dependent claims: 13, 14, 15)
 16. An apparatus comprising:
a memory; and at least one processor, coupled to the memory, and operative to: obtain a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes, the hardware links being represented as edges in the graph; find a first subset of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects, the failures comprising at least one of node failures and edge failures; and rank the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list;
wherein said expected resiliency is computed via $E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$, wherein: E represents a set of edges among the first subset of hardware nodes f; P(f) represents a probability of all edges associated with the first subset f failing together; R_m(v) represents a resiliency measure of a service deployed at a given node v; and N(v, f) represents the number of nodes that can be reached from the given node v if all edges in the first subset f fail together; and
wherein said ranking comprises: identifying edge and vertex independent paths from each hardware node in the first subset to all other hardware nodes; weighting each of the edge and vertex independent paths with the estimated failure probability of the vertex and the edges in each independent path; and ranking the hardware nodes in the first subset based on the weighted edge and vertex independent paths to represent expected resiliency of each hardware node by determining the number of edge and vertex independent paths derived from each hardware node and the estimated probability that each of said paths will fail. (Dependent claims: 17, 18, 19)
 20. An apparatus comprising:
means for obtaining a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes, the hardware links being represented as edges in the graph; means for finding a first subset of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects, the failures comprising at least one of node failures and edge failures; and means for ranking the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list;
wherein said expected resiliency is computed via $E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$, wherein: E represents a set of edges among the first subset of hardware nodes f; P(f) represents a probability of all edges associated with the first subset f failing together; R_m(v) represents a resiliency measure of a service deployed at a given node v; and N(v, f) represents the number of nodes that can be reached from the given node v if all edges in the first subset f fail together; and
wherein said ranking comprises: identifying edge and vertex independent paths from each hardware node in the first subset to all other hardware nodes; weighting each of the edge and vertex independent paths with the estimated failure probability of the vertex and the edges in each independent path; and ranking the hardware nodes in the first subset based on the weighted edge and vertex independent paths to represent expected resiliency of each hardware node by determining the number of edge and vertex independent paths derived from each hardware node and the estimated probability that each of said paths will fail.
1 Specification
This invention was made with Government support under contract number W911NF0630001 awarded by the United States Army. The government has certain rights in this invention.
Embodiments of the invention relate to the electrical, electronic and computer arts, and, more particularly, to network services and the like.
Network services are typically employed in a networked computing environment. They may, for example, be installed on one or more servers or other network nodes. They may provide, for example, shared resources to client computers. Examples of network services include DNS (Domain Name System), DHCP (Dynamic Host Configuration Protocol), email, printing, network file sharing, authentication servers, directory services, monitoring services, and the like. It is desirable that network services be tolerant of faults in the network.
U.S. Pat. No. 7,328,363 discloses dynamically configurable fault tolerance in autonomic computing with multiple service points. In particular, a method is described for configuring a system having a plurality of processors to provide the system with at least one cluster of processors, where each cluster has one service point. A distance is computed from each processor to other processors in the system. A plurality of total distances is then computed, where each total distance is associated with one processor. A minimum total distance is determined from the plurality of total distances. One processor is assigned to be the service point; this processor is the processor having the minimum total distance associated therewith.
Principles of the invention provide techniques for increasing resilience of a network service. In one aspect, an exemplary method (which can be computer-implemented) includes the step of obtaining a set of data representing a graph of a computer network having a set of hardware nodes and a set of hardware links between the hardware nodes. The hardware links are represented as edges in the graph. An additional step includes finding a first subset (for example, a vertex cut set) of the set of hardware nodes, such that those of the hardware nodes in the first subset are able to withstand a maximum number of failures before the graph disconnects. The failures include at least one of node failures and edge failures. A still further step includes ranking the hardware nodes in the first subset based on expected resiliency, to obtain a ranked list. Optionally, in case of a tie between two or more of the hardware nodes in the ranked list, the tie is broken using a sum of shortest path metric.
One or more embodiments of the invention or elements thereof can be implemented in the form of a computer product including a tangible computer readable recordable storage medium with computer usable program code for performing the method steps indicated. Furthermore, one or more embodiments of the invention or elements thereof can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps. Yet further, in another aspect, one or more embodiments of the invention or elements thereof can be implemented in the form of means for carrying out one or more of the method steps described herein; the means can include (i) hardware module(s), (ii) software module(s), or (iii) a combination of hardware and software modules; any of (i)-(iii) implement the specific techniques set forth herein, and the software modules are stored in a tangible computer-readable recordable storage medium (or multiple such media).
One or more embodiments of the invention may offer one or more of the following technical benefits:
 rapid analysis of network topology to increase resiliency of a network service
 applicability to most or all network topologies
 easy integration with overall resiliency of enterprise information technology (IT) systems and/or overall organizational resiliency
 The approach is adaptive as it captures and uses the current stochastic distribution of network faults.
These and other features, aspects and advantages of the invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
Network nodes and links can fail with nonzero probability. A network service should be resilient to failures. Multiple failures can occur in a short interval. Resiliency through service replication may not always be feasible due to network resource constraints and increased cost. One or more embodiments of the invention increase, and preferably maximize, resiliency; for example, by reducing and preferably minimizing the expected number of consumer nodes disconnected from the network service and the rate at which the disconnection occurs. The position of the service-offering node and its connectivity to consumer nodes can impact resiliency of the service. Good, and preferably optimal, positioning of a service can be employed to increase, and preferably maximize, resiliency.
In one or more embodiments, given a network graph, find an optimal node M to be used as a service provider, such that, given the stochastic distribution of network faults, selection of M maximizes expected resiliency of the service. The total “distance” from M to all other nodes must be the minimum among all nodes that are equally resilient as M. In particular, in one or more embodiments, find the vertex cut set to narrow down the candidate nodes which are more resilient than any other nodes. Use Menger's theorem to calculate edge-independent paths to all other nodes from the nodes in the vertex cut set. Each independent path is weighted with the marginal probability of failures along the path. Use a tournament method to choose the node with the maximum number of edge-independent paths to most of the nodes.
One or more embodiments provide a method to maximize resiliency of a network service, including techniques to find the expected resiliency of a service deployed at a particular node in a network, as well as a near-optimal polynomial-time method to select a node from a given network of nodes such that it maximizes the expected resiliency of a service deployed at that node.
A nonlimiting example of a situation in which one or more techniques of the invention might be of use is that of a distributed implementation of Tivoli Netcool® Precision (Tivoli Network Manager) software (registered mark of International Business Machines Corporation, Armonk, N.Y., USA), which requires partitioning a network into multiple domains and monitoring each of these domains concurrently. A monitoring service should be resilient to node or link failures in a domain. Achieving resiliency by replicating the monitoring service may not always be feasible due to network constraints and increased cost. Each domain typically has one monitor that monitors all the nodes in that domain. The cost of placing multiple monitors in a domain is very high and also increases the complexity of root-cause analysis. There is typically no control over selecting the set of nodes to be placed under one monitor, as the monitor can be placed only after the partitioning is decided.
With reference to
Expected Resiliency of a Service
Given a network graph G=(V, E) where V is the set of nodes (devices) and E is the set of edges (links) among the nodes, let:
f ⊆ E be a set of edges, and P(f) be the probability that all edges in f fail together;
R_m(v) be the resiliency of a service deployed at v; and
N(v, f) be the number of nodes that can be reached from v if all the edges in f fail together.
A vertex failure can always be represented as a set of edge failures that are incident on that vertex. The expected resiliency of a monitoring service with the monitor placed at v is given by:
$E[R_m(v)] = \sum_{f \in 2^E} P(f)\,N(v, f)$
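As a rough illustration of the expectation over failure sets f ⊆ E defined above (the sum over every f in 2^E of P(f)·N(v, f), as recited in the claims), the following Python sketch enumerates all edge subsets of a small graph. Several things here are assumptions made only for the sketch: edges fail independently, P(f) is read as the probability that exactly the edges in f are down, N(v, f) excludes v itself, and the names `reachable_count`, `expected_resiliency`, and `p_fail` are hypothetical. The enumeration is exponential in the number of edges, so it is only practical for toy graphs.

```python
from itertools import combinations


def reachable_count(nodes, edges, v, failed):
    """N(v, f): nodes other than v still reachable from v when `failed` edges are down."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if (a, b) not in failed:
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) - 1  # assumption: v itself is not counted


def expected_resiliency(nodes, edges, v, p_fail):
    """Sum of P(f) * N(v, f) over every subset f of the edge list."""
    total = 0.0
    for k in range(len(edges) + 1):
        for f in combinations(edges, k):
            failed = set(f)
            prob = 1.0
            # assumption: independent edge failures, exactly the edges in f are down
            for e in edges:
                prob *= p_fail[e] if e in failed else 1.0 - p_fail[e]
            total += prob * reachable_count(nodes, edges, v, failed)
    return total


# Hypothetical three-node path a - b - c, each link failing with probability 0.5.
NODES = ["a", "b", "c"]
EDGES = [("a", "b"), ("b", "c")]
P_FAIL = {e: 0.5 for e in EDGES}
print(expected_resiliency(NODES, EDGES, "b", P_FAIL))
print(expected_resiliency(NODES, EDGES, "a", P_FAIL))
```

In this toy case the middle node scores higher than the end node, which matches the intuition that a centrally placed service loses fewer consumers per link failure.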
With reference to plot 300 of
One or more embodiments of the invention address the issue of finding an optimal node M as a service provider, given a network graph, such that, given the stochastic distribution of network faults, selection of M maximizes expected resiliency of the service under all probable combinations of node and link failures. Also, the total distance from M to all other nodes must be the minimum among all nodes that are equally resilient as M. In one or more embodiments, employ a three-step approach to address this issue. The following description is a simplified, intuitive version to facilitate understanding of the methodology for the person having ordinary skill in the art. Additional detail is provided below. Refer to
In one specific, nonlimiting implementation, find the nodes which can withstand the maximum number of node or link failures before the graph disconnects by finding the vertex cut set to narrow down the candidate nodes which are more resilient than any other nodes, as in step 406. Use Menger's theorem in step 408 to calculate the edge-independent paths to all other nodes from the nodes in the vertex cut set. Each independent path is weighted with the marginal probability of failures along the path, in step 410. Use a tournament method, in step 412, to choose nodes that have higher weight compared to most of the other candidate nodes. If there are still multiple nodes with equal score, choose the one which has the minimum sum of network distance to other nodes, as in step 452. Note that an exemplary tournament method is discussed below.
As noted, one nonlimiting application of one or more techniques in accordance with the invention is a network monitoring service, wherein, given a network graph G=(V, E), a monitor is placed on a node which polls all other nodes in the network periodically to obtain their status. Find a node v in V where the monitoring service can be deployed such that the expected resiliency of the monitoring service is a maximum. For the sake of simplicity, in one or more embodiments, assume that all nodes and links are equally likely to fail in the network, and that failure of the monitoring node will completely bring down the service. In some instances, there may be constraints such as only one monitor can be deployed in the network and that all nodes may not be suitable for installation of the monitoring service.
With reference to
With reference to
With reference to
Furthermore, consider a technique to compute the sum of the shortest distances to all nodes. Given a graph G=(V, E), seek, for a node v in V, the sum of the shortest distances from v to all other nodes. In one or more embodiments, for each node v_i in (V − {v}), find the shortest hop distance from v to v_i, and sum up the distances. For example, in
 Shortest distance between D1 and R2=1
 Shortest distance between D1 and R1=1
 Shortest distance between D1 and R3=3
 Shortest distance between D1 and D0=2
 Shortest distance between D1 and D2=2
 Shortest distance between D1 and D3=2
 Shortest distance between D1 and D4=2
 Sum of shortest distance=13
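The sum-of-shortest-distance computation can be sketched with a breadth-first search, since hop distance on an unweighted graph is exactly the BFS depth. The adjacency map below is a hypothetical four-node chain, not the network of the figure (whose links are not recoverable from this text), and the function name `sum_of_shortest_distances` is likewise an assumption.

```python
from collections import deque


def sum_of_shortest_distances(adj, v):
    """Sum of hop distances from v to every node reachable from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    # Nodes that cannot be reached from v are simply not included in the sum.
    return sum(dist.values())


# Hypothetical chain a - b - c - d.
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sum_of_shortest_distances(ADJ, "a"))  # distances 1 + 2 + 3
print(sum_of_shortest_distances(ADJ, "b"))  # distances 1 + 1 + 2
```

A smaller sum indicates a node that is, on average, closer to the consumers it serves, which is why the metric is used to break ties.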
By way of review and provision of additional detail, one or more embodiments of the invention provide a method to maximize resiliency of a network service, including a near-optimal polynomial-time method to select a node from a given network of nodes such that it maximizes the expected resiliency of a service deployed at that node. Aspects of the invention can be used, for example, to find a list of nodes ordered by the resiliency that can be expected if the service is rendered from any of them. The approach can be adopted for all network services deployed on any network topology. Advantageously, one or more embodiments capture and use the stochastic distribution of failures. Furthermore, in one or more instances, a “graceful degradation approach” is provided, wherein, even if the graph disconnects, the service node which loses connections more slowly than other nodes is chosen.
Yet further, aspects of the invention provide IT infrastructure resiliency information, which may be required to determine the expected overall enterprise resiliency of an organization, such as an enterprise or the like. One or more embodiments can be used to design resilient service delivery by finding maximally tolerant node(s); for example, by computing the expected structural resiliency of given service-offering nodes using network topology analysis. One or more embodiments aid in designing and/or evaluating existing service-replication and/or disaster recovery plans.
Certain theorems and concepts from graph theory are pertinent to one or more embodiments of the invention. With reference to graph 1300 of
 The cardinality of each X_i is the same
 No set of vertices with cardinality less than |X_i| exists whose removal disconnects the graph.
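To make the two properties above concrete, the following Python sketch finds every minimum vertex cut of a small graph by brute force: it tries vertex sets of increasing size and stops at the first size for which some removal disconnects the graph. This is an exhaustive search written for clarity, not the procedure used in the embodiments; the function names and the four-node ring are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if the subgraph induced on `nodes` is connected."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def minimum_vertex_cuts(nodes, edges):
    """All smallest vertex sets whose removal disconnects the graph."""
    nodes = list(nodes)
    # Leave at least two nodes in place, otherwise nothing can be disconnected.
    for k in range(1, len(nodes) - 1):
        cuts = [set(c) for c in combinations(nodes, k)
                if not is_connected(set(nodes) - set(c), edges)]
        if cuts:
            return cuts  # every returned set has the same cardinality k
    return []  # e.g. a complete graph has no vertex cut


# Hypothetical ring a - b - c - d - a.
RING_NODES = ["a", "b", "c", "d"]
RING_EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(minimum_vertex_cuts(RING_NODES, RING_EDGES))
```

For the ring, no single node disconnects the graph, and the two opposite pairs are the only cuts of size two.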
Menger's theorem can be stated in an edge connectivity version and a vertex connectivity version. With regard to the former, let G be a finite undirected graph and x and y two nonadjacent vertices. Then the theorem states that the size of the minimum edge cut for x and y is equal to the maximum number of pairwise edge-independent paths from x to y. With regard to the latter, vertex connectivity version, let G be a finite undirected graph and x and y two nonadjacent vertices. Then the theorem states that the size of the minimum vertex cut for x and y is equal to the maximum number of pairwise vertex-independent paths from x to y.
B→A→D→E→F;
B→D→F;
B→C→H→G→F; and
B→H→F.
The EdgeCut=4.
The vertex independent paths from B to F include:
B→D→F; and
B→H→F.
The VertexCut=2. All vertex independent paths are edge independent but not vice versa.
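By Menger's theorem, the number of independent paths between two nodes equals a maximum flow with unit capacities, so both counts above can be checked mechanically. The Python sketch below is one possible way to do this, using a plain augmenting-path search; it is not presented as the path finder of the embodiments. The edge list is an assumption: it contains only the links that appear in the four B-to-F paths listed above, and the original figure may contain additional links. All function names are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Unit-augmenting max flow on a residual capacity map cap[u][w]."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w, c in cap[u].items():
                if c > 0 and w not in parent:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return flow
        w = t
        while parent[w] is not None:  # push one unit along the path found
            u = parent[w]
            cap[u][w] -= 1
            cap[w][u] += 1
            w = u
        flow += 1


def edge_independent_paths(edges, s, t):
    """Maximum number of pairwise edge-independent s-t paths."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:  # an undirected link becomes two unit arcs
        cap[u][v] += 1
        cap[v][u] += 1
    return max_flow(cap, s, t)


def vertex_independent_paths(edges, s, t):
    """Maximum number of pairwise vertex-independent s-t paths."""
    cap = defaultdict(lambda: defaultdict(int))

    def node_in(x):
        return x if x in (s, t) else (x, "in")

    def node_out(x):
        return x if x in (s, t) else (x, "out")

    for x in {n for e in edges for n in e} - {s, t}:
        cap[node_in(x)][node_out(x)] = 1  # each inner node usable once
    for u, v in edges:
        cap[node_out(u)][node_in(v)] += 1
        cap[node_out(v)][node_in(u)] += 1
    return max_flow(cap, s, t)


# Assumed edge list, taken only from the paths listed in the text.
EDGES = [("B", "A"), ("A", "D"), ("D", "E"), ("E", "F"), ("B", "D"), ("D", "F"),
         ("B", "C"), ("C", "H"), ("H", "G"), ("G", "F"), ("B", "H"), ("H", "F")]
print(edge_independent_paths(EDGES, "B", "F"))
print(vertex_independent_paths(EDGES, "B", "F"))
```

On this assumed edge list the two counts agree with the EdgeCut and VertexCut values stated in the text.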
With reference to graph 1500 of
Thus, one or more embodiments of the invention provide one or more of a system, method, and computer program product to increase, and preferably maximize, resilience of a network service. An optimal node M is found in a given network graph as a service provider such that the expected resiliency of the service is maximized under all probable combinations of node and link failures. The expected resiliency is computed using the failure-probability distribution of network elements and the number of independent paths between the service provider and the consumer nodes. Features and benefits of one or more embodiments of the invention include:
 A method to quickly analyze a network topology to increase or maximize resiliency of a network service.
 A near-optimal polynomial-time method to select a node from a given network of nodes such that it maximizes the expected resiliency of a service deployed at that node.
 Applicability to find a list of nodes ordered by the resiliency that can be expected if the service is rendered from any of them.
 Adaptable for all network services deployed on any network topology.
 Capture and use of the stochastic distribution of failures.
 A “graceful degradation” approach: even if the graph disconnects, choose the service node which loses connections more slowly than other nodes.
 A method or system to get the fault distribution of network nodes and links by mining the failure events.
 A method or system to find the potential candidate nodes to deploy a service in a given network topology by computing the vertex cutset of the network graph.
 A method which computes the vertex and edge independent paths to remaining nodes of the network topology from the potential candidate nodes.
 A method to compute the expected resiliency of a service for each potential candidate node based on the number of independent paths and the fault distribution of the network.
 A method or system to choose the set of candidate nodes to deploy a service by choosing those potential candidate nodes which maximize the expected resiliency of that service.
Calculating Resiliency Score
In one or more embodiments, calculate the resiliency score based on edge independent paths from each of the vertex cut nodes to all other nodes in the network graph. Let P be the set of all edgeindependent paths from a vertexcut node v to another node n. The resiliency score for v relative to n is equal to:
where Pr(failure(p)) denotes the probability that path p fails.
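The scoring formula itself is not reproduced in this text, so the following Python sketch should be read only as a stand-in that shows how Pr(failure(p)) might feed a score, not as the formula of the embodiments. It assumes that elements on a path fail independently, that a path fails when any of its elements fails, and that the score is the probability that at least one independent path survives. The names `path_failure_probability` and `resiliency_score` are hypothetical.

```python
from math import prod


def path_failure_probability(element_fail_probs):
    """Pr(failure(p)) for a path, given failure probabilities of its vertices and edges.

    Assumption: elements fail independently and any single failure breaks the path.
    """
    return 1.0 - prod(1.0 - q for q in element_fail_probs)


def resiliency_score(paths):
    """Stand-in score: probability that at least one independent path survives.

    `paths` is a list of paths, each given as a list of element failure probabilities.
    Assumption: independent paths fail independently of one another.
    """
    return 1.0 - prod(path_failure_probability(p) for p in paths)


# Hypothetical example: two independent paths, each with a single element
# that fails half of the time.
print(resiliency_score([[0.5], [0.5]]))
```

Under these assumptions, adding an independent path can only raise the score, and a path through many unreliable elements contributes less than a short, reliable one.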
Tournament Method to Rank Vertex Cut (VC) Nodes
In one or more embodiments, consider each vertex-cut node as a player. A match between two players includes multiple sets. Each set denotes a node in the network (except the two VC nodes under consideration). Therefore, the number of sets equals the total number of nodes in the network minus 2 (the two players). A player wins a set if the player has a higher resiliency score than its competitor relative to a particular node. A player wins the match if the player has won a greater number of sets than its competitor. A match is played between each pair of VC nodes. The players (VC nodes) are ranked according to the number of matches they win in the tournament.
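The tournament rules above translate fairly directly into code. The Python sketch below plays one match per pair of VC nodes, one set per remaining network node, and ranks players by matches won. The score table is made-up data for illustration, and `tournament_rank` is a hypothetical name; a drawn match (equal sets) awards no win to either side, which is an assumption since the text does not say how drawn matches are counted.

```python
from itertools import combinations


def tournament_rank(scores, nodes):
    """Rank VC nodes by matches won.

    scores[p][n] is the resiliency score of VC node p relative to node n.
    """
    players = list(scores)
    wins = {p: 0 for p in players}
    for a, b in combinations(players, 2):
        # One set per network node, excluding the two players themselves.
        sets = [n for n in nodes if n not in (a, b)]
        a_sets = sum(scores[a][n] > scores[b][n] for n in sets)
        b_sets = sum(scores[b][n] > scores[a][n] for n in sets)
        if a_sets > b_sets:
            wins[a] += 1
        elif b_sets > a_sets:
            wins[b] += 1
    ranking = sorted(players, key=lambda p: -wins[p])
    return ranking, wins


# Hypothetical network of six nodes, three of which are VC nodes (x, y, z).
ALL_NODES = ["x", "y", "z", "n1", "n2", "n3"]
SCORES = {
    "x": {"y": 0.9, "z": 0.9, "n1": 0.9, "n2": 0.8, "n3": 0.7},
    "y": {"x": 0.9, "z": 0.8, "n1": 0.8, "n2": 0.9, "n3": 0.6},
    "z": {"x": 0.5, "y": 0.5, "n1": 0.5, "n2": 0.5, "n3": 0.9},
}
ranking, wins = tournament_rank(SCORES, ALL_NODES)
print(ranking, wins)
```

Each match here has four sets (six nodes minus the two players), consistent with the set count described in the text.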
If there were only one node in the vertex cut set, no further processing would be necessary. However, in the general case, an additional step includes ranking the nodes in the first subset based on their expected resiliency. One nonlimiting example of a way to carry out the step of ranking the nodes in the first subset based on their expected resiliency includes carrying out steps 408, 410, 412, and 414, as will be discussed in greater detail below. Once the ranking is completed, any tie in the ranks is broken by looking at the sum of network distances from the tied nodes to all other nodes, as in step 452. The node with the least distance gets the higher ranking. Step 452 could be carried out, for example, with network distance based tiebreaker engine 1516.
In one or more embodiments, the primary goal is to create a ranked list (with as few ties as possible), instead of finding the most resilient node. However, if a choice is made to select the node which has the highest rank, then it is necessary to look at the nodes that are tied at the top (the second subset, after the vertex cut set is ranked; this is required only for this particular case). If there is only one node at the top, then processing is complete. However, if more than one node is tied at the top, then use the sum of network distance to break the tie. On the other hand, if the complete ranked list is required, then carry out the tiebreaking process for every set of nodes that are tied at some rank.
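When the complete ranked list is wanted, the tie-breaking step can be folded into a single sort: order primarily by matches won (descending) and, within a tie, by the sum of network distances (ascending). The Python sketch below shows this under the assumption that both quantities have already been computed; the input values and the name `ranked_list_with_tiebreak` are hypothetical.

```python
def ranked_list_with_tiebreak(wins, distance_sum):
    """Order nodes by wins (most first), breaking ties by smaller distance sum."""
    return sorted(wins, key=lambda n: (-wins[n], distance_sum[n]))


# Hypothetical inputs: x and y are tied on wins, y is closer to the other nodes.
WINS = {"x": 2, "y": 2, "z": 1}
DISTANCE_SUM = {"x": 13, "y": 11, "z": 9}
print(ranked_list_with_tiebreak(WINS, DISTANCE_SUM))
```

Note that the distance sum only matters among nodes with equal wins; z stays last here even though it has the smallest distance sum.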
Optionally, the ranked list can be stored in a tangible computer-readable recordable storage medium and/or displayed to a human subject (for example, a network architect) on a display device (omitted from figure for brevity). Optionally, a network service is physically located on a preferred one of the hardware nodes from the ranked list by loading hardware processor-executable program code, embodying the network service, onto a tangible computer-readable recordable storage medium associated with the preferred hardware node (that is, on the node per se or on a storage device connected to the node by a high reliability connection) (also omitted from figure for brevity).
In some cases, the step of ranking the nodes in the first subset based on their expected resiliency includes steps 408, 410, 412, and 414, with input from step 448. In step 408, use Menger's theorem. Transform the graph and find the edge and vertex independent paths from each vertex cut set node to all other nodes, using, for example, edge independent path finder 1506. In step 448, preferably in parallel, estimate the failure probability of each edge and vertex using the fault database 1510 and network fault probability estimation engine 1512. Carry out step 410 based on steps 408 and 448, using path weighting engine 1508 with input from estimation engine 1512 and edge independent path finder 1506. Step 410 includes weighting each path using the estimated failure probability of the vertex and edges present in that path. In step 412, use the tournament method to rank the vertex cut set nodes based on the number of wins. Step 412 can be carried out, for example, using ranking engine 1514. The result is the ranked list of vertex cut set nodes 414. As discussed several times above, if the ranked list 414 includes one (or more, as the case may be) ties, as determined in decision block 450, break the tie by arranging the nodes in ascending order using the sum of shortest path metric, as in step 452. The tie breaking can be carried out with tiebreaker engine 1516.
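By way of a nonlimiting illustration, the path weighting of step 410 and the tournament ranking of step 412 can be sketched as follows, taking the independent paths of step 408 and the failure probabilities of step 448 as given inputs. Several points are assumptions made for the sketch rather than details stated above: failures are treated as statistically independent; only the interior vertices of a path are weighted, since the endpoints are shared by all of the paths between the same pair of nodes; and one candidate "wins" against another when it has the higher probability of reaching more of the destinations. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import prod


def path_survival(path, p_node, p_edge):
    """Probability that a path stays up, assuming independent failures.

    path is a list of nodes; p_node maps node -> failure probability;
    p_edge maps frozenset({u, v}) -> failure probability.
    Assumption: only interior vertices are weighted (endpoints are shared).
    """
    nodes_ok = prod(1.0 - p_node[v] for v in path[1:-1])
    edges_ok = prod(1.0 - p_edge[frozenset(e)] for e in zip(path, path[1:]))
    return nodes_ok * edges_ok


def reach_probability(paths, p_node, p_edge):
    """Probability that at least one of the independent paths survives."""
    all_fail = prod(1.0 - path_survival(p, p_node, p_edge) for p in paths)
    return 1.0 - all_fail


def tournament_rank(reach):
    """Rank candidates by number of pairwise wins.

    reach[c][d] is the probability that candidate c can still reach
    destination d. Candidates with equal wins remain tied and would be
    passed on to the tiebreaker.
    """
    candidates = sorted(reach)
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        common = set(reach[a]) & set(reach[b])
        a_better = sum(reach[a][d] > reach[b][d] for d in common)
        b_better = sum(reach[b][d] > reach[a][d] for d in common)
        if a_better > b_better:
            wins[a] += 1
        elif b_better > a_better:
            wins[b] += 1
    ranked = sorted(candidates, key=lambda c: (-wins[c], c))
    return ranked, wins


# Hypothetical inputs: two independent paths from node "a" to node "d".
p_node = {"a": 0.1, "x": 0.1, "d": 0.1}
p_edge = {frozenset(e): 0.2 for e in [("a", "d"), ("a", "x"), ("x", "d")]}
paths_a_to_d = [["a", "d"], ["a", "x", "d"]]
prob_a_reaches_d = reach_probability(paths_a_to_d, p_node, p_edge)

# Hypothetical reach probabilities for three vertex cut set nodes.
reach = {
    "u": {"d1": 0.90, "d2": 0.80},
    "v": {"d1": 0.70, "d2": 0.85},
    "w": {"d1": 0.95, "d2": 0.90},
}
ranked, wins = tournament_rank(reach)
```

In this hypothetical example, nodes "u" and "v" each win on one destination against the other and therefore finish with the same number of wins, which is the kind of tie that decision block 450 detects and step 452 resolves.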
As noted, one nonlimiting example of a network service is a network-monitoring service for monitoring the network. Such a service can function by periodically polling all of the hardware nodes in the network, other than the preferred hardware node on which the monitoring service is located, to obtain status. The preferred hardware node is selected to maximize resiliency of the monitoring service to the node failures and the edge failures. The preferred hardware node may be selected, in some cases, based upon assuming, for simplicity, that all of the hardware nodes and all of the edges representing the hardware links are equally likely to fail.
In one or more embodiments, the ranked list is ranked based on expected resiliency to be obtained by locating a network service on each given hardware node in the ranked list. The expected resiliency can be calculated in accordance with equation (1) above.
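By way of a nonlimiting illustration, the expected resiliency E[R_m(v)], which sums P(f)·N(v, f) over every subset f of the edge set E, can be evaluated by brute force on a small graph as sketched below. The sketch rests on assumptions not stated above: edges fail independently with a single common probability `p` (in line with the simplifying equal-likelihood assumption), so that P(f) = p^|f| (1 - p)^(|E| - |f|); and N(v, f) is taken to exclude the node v itself. The enumeration covers 2^|E| subsets and is therefore practical only for small examples. All names are hypothetical.

```python
from itertools import combinations


def reachable_count(nodes, edges, failed, v):
    """N(v, f): nodes reachable from v once the edges in `failed` are removed.

    Assumption: v itself is not counted.
    """
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if (a, b) not in failed:
            adj[a].add(b)
            adj[b].add(a)
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) - 1


def expected_resiliency(nodes, edges, v, p):
    """Sum of P(f) * N(v, f) over every subset f of the edge list.

    Assumption: independent edge failures with common probability p.
    """
    m = len(edges)
    total = 0.0
    for k in range(m + 1):
        prob = (p ** k) * ((1.0 - p) ** (m - k))
        for failed in combinations(edges, k):
            total += prob * reachable_count(nodes, edges, set(failed), v)
    return total


# Hypothetical three-node chain a - b - c with p = 0.5.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
resiliency = {v: expected_resiliency(nodes, edges, v, 0.5) for v in nodes}
```

For this chain the middle node "b" obtains the largest value, since the loss of either single edge still leaves it connected to one neighbor, whereas an end node is isolated as soon as its only edge fails.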
Exemplary System and Article of Manufacture Details
A variety of techniques, utilizing dedicated hardware, general purpose processors, firmware, software, or a combination of the foregoing may be employed to implement the present invention or components thereof. One or more embodiments of the invention, or elements thereof, can be implemented in the form of a computer product including a computer usable medium with computer usable program code for performing the method steps indicated. Furthermore, one or more embodiments of the invention, or elements thereof, can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps.
One or more embodiments can make use of software running on a general purpose computer or workstation. With reference to
Accordingly, computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium (for example, media 1618) providing program code for use by or in connection with a computer or any instruction implementation system. For the purposes of this description, a computer usable or computer readable medium can be any apparatus for use by or in connection with the instruction implementation system, apparatus, or device. The medium can store program code to execute one or more method steps set forth herein.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a tangible computer-readable recordable storage medium (as distinct from a propagation or transmission medium) include a semiconductor or solid-state memory (for example memory 1604), magnetic tape, a removable computer diskette (for example media 1618), a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor 1602 coupled directly or indirectly to memory elements 1604 through a system bus 1610. The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation.
Input/output or I/O devices (including but not limited to keyboards 1608, displays 1606, pointing devices, and the like) can be coupled to the system either directly (such as via bus 1610) or through intervening I/O controllers (omitted for clarity).
Network adapters such as network interface 1614 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
As used herein, including the claims, a “server” includes a physical data processing system (for example, system 1612 as shown in
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Embodiments of the invention have been described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a tangible computer-readable recordable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be implemented substantially concurrently, or the blocks may sometimes be implemented in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Furthermore, it should be noted that any of the methods described herein can include an additional step of providing a system comprising distinct software modules embodied on a tangible computer readable recordable storage medium; the modules can include, for example, any or all of the components shown in
In any case, it should be understood that the components illustrated herein may be implemented in various forms of hardware, software, or combinations thereof; for example, application specific integrated circuit(s) (ASICS), functional circuitry, one or more appropriately programmed general purpose digital computers with associated memory, and the like. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the invention.
It will be appreciated and should be understood that the exemplary embodiments of the invention described above can be implemented in a number of different fashions. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the invention. Indeed, although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.