SIP server architecture fault tolerance and failover
First Claim
1. A computer implemented method for providing failover and fault tolerance, comprising:
maintaining a replica for storing state information;
maintaining an engine node that writes and reads state information to and from the replica and uses the state information to process messages;
periodically polling the replica by the engine node;
failing to poll by the engine node for a specified period of time; and
determining that the engine node has failed by the replica upon expiration of the specified period of time;
wherein the engine node periodically polls the replica for a set of expired timer objects that need to be processed by the engine node.
Abstract
The SIP server can comprise an engine tier and a state tier distributed on a cluster network. Engine nodes in the engine tier can process SIP messages and can read and write state information from and to the state tier. The state tier can maintain state information in a set of partitions, each comprising one or more replicas that hold duplicate information. The engine nodes can be adapted to detect and report replica failures, and the replicas can in turn be adapted to detect and report engine node failures. A replica can detect a fault with an engine node if the engine node fails to poll the replica for a specified period of time, and can then report the failure. An engine node can detect the failure of a replica when reading or writing state information and can report the failure to another replica, which can be responsible for updating the partition view to exclude dead replicas.
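The two detection paths described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `ReplicaFailureDetector`, `PartitionView`, `record_poll`, `failed_engines`, and `exclude` are hypothetical, and an injectable clock is assumed so that the timeout can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List


class ReplicaFailureDetector:
    """Replica-side record of when each engine node last polled (hypothetical)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s          # the "specified period of time"
        self._clock = clock                 # injectable so tests can control time
        self._last_poll: Dict[str, float] = {}

    def record_poll(self, engine_id: str) -> None:
        # Called whenever an engine node polls the replica.
        self._last_poll[engine_id] = self._clock()

    def failed_engines(self) -> List[str]:
        # An engine is considered failed once the period has expired
        # without a poll from it.
        now = self._clock()
        return sorted(e for e, t in self._last_poll.items()
                      if now - t >= self.timeout_s)


class PartitionView:
    """List of replicas currently considered live in one partition (hypothetical)."""

    def __init__(self, replicas: List[str]):
        self.live = list(replicas)

    def exclude(self, replica_id: str) -> None:
        # Called by a surviving replica after an engine node reports that
        # a read or write against `replica_id` failed.
        if replica_id in self.live:
            self.live.remove(replica_id)
```

In this sketch the replica never contacts the engine node; the absence of polls is the only signal, which matches the passive detection the abstract describes. How the failure is then reported to other nodes is left out.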
11 Claims
4. A system for providing failover and fault tolerance, comprising:
a replica connected to a cluster network and adapted to store state information used for processing messages; and
an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;
wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and
wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects.
5. A system for providing failover and fault tolerance, comprising:
a replica connected to a cluster network and adapted to store state information used for processing messages; and
an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;
wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and
wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects; and
wherein the replica is adapted to notice that the engine node has failed to poll for a specified period of time and is further adapted to notify other engine nodes that the engine node has failed upon expiration of the period of time.
6. A system for providing failover and fault tolerance, comprising:
a replica connected to a cluster network and adapted to store state information used for processing messages; and
an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;
wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and
wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects; and
wherein the replica is adapted to check in the set of timer objects upon expiration of the specified period of time such that the timer objects could be checked out by another engine node for processing.
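The timer check-out and check-in behavior recited in claims 4 to 6 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: `TimerObject`, `ReplicaTimerStore`, and all method names are hypothetical, and ownership is assumed to be tracked with a single `owner` field per timer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimerObject:
    timer_id: str
    expires_at: float
    owner: Optional[str] = None   # engine node that has checked the timer out


class ReplicaTimerStore:
    """Replica-side store of timer objects (hypothetical)."""

    def __init__(self) -> None:
        self._timers: Dict[str, TimerObject] = {}

    def add(self, timer_id: str, expires_at: float) -> None:
        self._timers[timer_id] = TimerObject(timer_id, expires_at)

    def check_out_expired(self, engine_id: str, now: float) -> List[str]:
        # An engine node's periodic poll: hand over expired timers that
        # no other engine node currently holds.
        ready = [t for t in self._timers.values()
                 if t.expires_at <= now and t.owner is None]
        for t in ready:
            t.owner = engine_id
        return sorted(t.timer_id for t in ready)

    def complete(self, timer_id: str) -> None:
        # The engine node finished processing the timer.
        self._timers.pop(timer_id, None)

    def check_in_all(self, engine_id: str) -> int:
        # Called once the engine node is deemed failed: release its timers
        # so that another engine node can check them out.
        released = 0
        for t in self._timers.values():
            if t.owner == engine_id:
                t.owner = None
                released += 1
        return released
```

A replica that has decided an engine node is dead would call `check_in_all` for that node, after which the next poll from any surviving engine node picks up the released timers.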
Specification