Routing transactions in the presence of failing servers
Abstract
Failures are detected in servers of a transaction processing system, and transactions are routed to less failure prone servers in the system. Servers in the transaction processing system which are faulty for some transaction classes but good for others are detected, and such servers are used in a judicious manner to maximize the throughput and minimize the response time of the system. Error prone servers are occasionally probed to determine if they have improved in terms of their error characteristics. The mechanism implemented consists of three elements. The first is the selection of a routing algorithm based on the state of the transaction processing system. Second, transactions are used to probe systems considered too faulty for use in order to determine if they have improved in terms of their failure characteristics. Finally, soft ABENDs are detected. The algorithm for transaction routing to detect and control the problem of failing servers in a transaction processing system consists of two parts: The first part routes transactions to servers based on the length of the server queues, the response time of the transactions (i.e., queuing delay plus processing delay), and the perceived failure rate. The second part of the algorithm ensures that error prone servers are not completely ignored. Occasional transactions are used to probe servers in order to determine if they have improved in terms of their error characteristics.
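The first part of the algorithm can be illustrated with a minimal sketch. It is not the patented implementation: the `ServerStats` layout, the `choose_server` name, the default threshold of 0.2, and the cost formula (estimated response time divided by the probability of success) are all assumptions chosen for illustration. The sketch excludes servers whose estimated failure probability for the transaction class is at or above the threshold, then picks among the remaining servers using queue length, processing delay, and failure probability.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Per-class statistics a router might keep for one server (hypothetical layout)."""
    failure_prob: dict   # transaction class -> estimated failure probability
    queue_len: dict      # transaction class -> transactions waiting at the server
    service_time: dict   # transaction class -> mean processing delay in seconds


def choose_server(txn_class, servers, threshold=0.2):
    """Return the name of the eligible server with the lowest cost, or None.

    Assumes threshold <= 1, so every eligible server has failure probability < 1.
    A class with no recorded failures is treated as having probability 0.
    """
    eligible = {
        name: s for name, s in servers.items()
        if s.failure_prob.get(txn_class, 0.0) < threshold
    }
    if not eligible:
        return None

    def cost(item):
        name, s = item
        p = s.failure_prob.get(txn_class, 0.0)
        backlog = sum(s.queue_len.values())
        # Assumed estimate: queuing delay plus processing delay.
        response = (backlog + 1) * s.service_time.get(txn_class, 0.0)
        # Inflate by the chance that the transaction fails and must be retried.
        return (response / (1.0 - p), name)

    return min(eligible.items(), key=cost)[0]


servers = {
    "A": ServerStats({"pay": 0.05}, {"pay": 4}, {"pay": 0.1}),
    "B": ServerStats({"pay": 0.50}, {"pay": 0}, {"pay": 0.1}),
    "C": ServerStats({"pay": 0.10}, {"pay": 1}, {"pay": 0.1}),
}
print(choose_server("pay", servers))  # C
```

Server B has the shortest queue but is skipped because its failure probability for the class exceeds the threshold. Between A and C, the shorter queue at C outweighs its slightly higher failure probability under this assumed cost.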
9 Claims
1. A method for routing transactions in a transaction processing system in which servers can fail comprising the steps of:
estimating a probability of a transaction failure for each server and each transaction class;
estimating an arrival rate of each transaction class at each router and at each server;
determining a queue length of transactions of each class waiting at each server;
determining failure probabilities for each server in the transaction processing system based on the estimates of a probability of a transaction failure and the arrival rate for each transaction class and a number of transactions of each class waiting at the server;
routing transactions to servers that have failure rates below a predetermined threshold; and
if multiple servers satisfy the predetermined threshold, then choosing a server using statistical data about transaction arrival rates, response times, failure probabilities, and queue lengths of transactions per class at the servers.
2. A method for routing transactions in a transaction processing system in which servers can fail comprising the steps of:
estimating a probability of a transaction failure for each server and each transaction class;
estimating an arrival rate of each transaction class at each router and at each server;
determining a queue length of transactions of each class waiting at each server;
determining failure probabilities for each server in the transaction processing system based on the estimates of a probability of a transaction failure and the arrival rate for each transaction class and a number of transactions of each class waiting at the server;
routing transactions to servers that have failure rates below a predetermined threshold; and
detecting a decrease in a failure rate of servers that are ignored in the routing step because they were considered too faulty.
Dependent claims: 3, 4, 5, 6
7. A method for routing transactions in a transaction processing system in which servers can fail comprising the steps of:
estimating a probability of a transaction failure for each server and each transaction class;
estimating an arrival rate of each transaction class at each router and at each server;
determining a queue length of transactions of each class waiting at each server;
determining failure probabilities for each server in the transaction processing system based on the estimates of a probability of a transaction failure and the arrival rate for each transaction class and a number of transactions of each class waiting at the server;
routing transactions to servers that have failure rates below a predetermined threshold; and
monitoring H(i,j), the number of class i transactions at server j.
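The monitoring step of claim 7 could be kept as a simple counter, sketched below. The `InFlightCounter` class and its method names are hypothetical, and the sketch assumes the router is told when each transaction is routed and when it completes; the claim itself does not say how H(i,j) is maintained.

```python
class InFlightCounter:
    """Tracks H(i, j): the number of class i transactions currently at server j."""

    def __init__(self):
        self._h = {}  # (class i, server j) -> current count

    def on_route(self, i, j):
        """Call when a class i transaction is sent to server j."""
        self._h[(i, j)] = self._h.get((i, j), 0) + 1

    def on_complete(self, i, j):
        """Call when a class i transaction at server j finishes or fails."""
        if self._h.get((i, j), 0) > 0:
            self._h[(i, j)] -= 1

    def H(self, i, j):
        """Return the current count; unseen pairs count as zero."""
        return self._h.get((i, j), 0)


counter = InFlightCounter()
counter.on_route("pay", "A")
counter.on_route("pay", "A")
counter.on_route("query", "A")
counter.on_complete("pay", "A")
```

After these calls, `counter.H("pay", "A")` and `counter.H("query", "A")` are both 1, and any pair that never received a transaction reports 0.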
8. Method for routing transactions in a transaction processing system in which servers can fail and wherein transactions of a certain class are not currently being routed to one or more servers because of a high failure rate, comprising the steps of:
estimating a probability of a transaction failure for each server and each transaction class;
estimating an arrival rate of each transaction class at each router and at each server;
determining a queue length of transactions of each class waiting at each server;
determining failure probabilities for each server in the transaction processing system based on the estimates of a probability of a transaction failure and the arrival rate for each transaction class and a number of transactions of each class waiting at the server;
routing transactions to servers that have failure rates below a predetermined threshold;
sending a limited number of transactions to said one or more servers; and
determining if a failure rate of said one or more servers has decreased based on a return of a code indicating a success or failure of transactions sent to the systems.
Dependent claims: 9
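The probing steps of claim 8 might look like the following sketch for one excluded (class, server) pair. The `ProbeTracker` class, the every-Nth-transaction probe schedule, the sliding window of recent outcomes, and the default parameter values are assumptions made for illustration; the claim only requires that a limited number of transactions be sent and that the returned success or failure codes be used to judge whether the failure rate has decreased.

```python
from collections import deque


class ProbeTracker:
    """Probe bookkeeping for one server excluded from routing for one class."""

    def __init__(self, threshold=0.2, probe_every=50, window=10):
        self.threshold = threshold            # same meaning as the routing threshold
        self.probe_every = probe_every        # 1 in N transactions becomes a probe
        self.outcomes = deque(maxlen=window)  # recent probe results, True = success
        self.seen = 0

    def should_probe(self):
        """Call once per arriving transaction; True on every probe_every-th call."""
        self.seen += 1
        return self.seen % self.probe_every == 0

    def record(self, succeeded):
        """Record the success/failure code returned for a probe transaction."""
        self.outcomes.append(bool(succeeded))

    def failure_rate(self):
        """Fraction of recent probes that failed, or None if none were sent yet."""
        if not self.outcomes:
            return None
        return self.outcomes.count(False) / len(self.outcomes)

    def recovered(self):
        """True once a full window of probes shows a failure rate below threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.failure_rate() < self.threshold


tracker = ProbeTracker(threshold=0.2, probe_every=3, window=5)
probe_flags = [tracker.should_probe() for _ in range(6)]
for _ in range(5):
    tracker.record(True)
```

Requiring a full window before readmitting a server is a design choice of this sketch: it keeps a single lucky probe from returning a faulty server to normal routing.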
Specification