SIP server architecture fault tolerance and failover

US 7,661,027 B2
Filed: 10/10/2006
Issued: 02/09/2010
Est. Priority Date: 10/10/2006
Status: Active Grant

Abstract

The SIP server can comprise an engine tier and a state tier distributed on a cluster network. Engine nodes in the engine tier can process SIP messages and can read and write state information from and to the state tier. The state tier can maintain state information in a set of partitions, each made up of one or more replicas that hold duplicate information. The engine nodes can be adapted to detect and report replica failures, and the replicas can in turn be adapted to detect and report engine node failures. A replica can detect a fault in an engine node if the engine node fails to poll the replica for a specified period of time, and can then report the failure. An engine node can detect the failure of a replica when reading or writing state information and can report the failure to another replica, which can be responsible for updating the partition view to exclude dead replicas.
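The replica-failure path described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `Replica` and `EngineNode`, the `report_failure` method, and the use of a shared Python set as the partition view are all assumptions made for illustration. The sketch shows an engine node noticing a dead replica during a write and reporting it to a surviving replica, which then drops the dead replica from the view.

```python
class ReplicaUnavailable(Exception):
    """Raised when a read or write cannot reach a replica (hypothetical)."""


class Replica:
    def __init__(self, name, view):
        self.name = name
        self.view = view      # shared set of replica names considered live
        self.alive = True     # toggled by the test to simulate a crash
        self.state = {}       # duplicate state information
        view.add(name)

    def write(self, key, value):
        if not self.alive:
            raise ReplicaUnavailable(self.name)
        self.state[key] = value

    def report_failure(self, dead_name):
        # The surviving replica updates the partition view to exclude
        # the replica that an engine node reported as dead.
        self.view.discard(dead_name)


class EngineNode:
    def __init__(self, replicas):
        self.replicas = {r.name: r for r in replicas}

    def write(self, view, key, value):
        """Write to every replica in the view; report any that fail."""
        members = sorted(view)  # snapshot, since the view may change below
        failed = []
        for name in members:
            try:
                self.replicas[name].write(key, value)
            except ReplicaUnavailable:
                failed.append(name)
        survivors = [n for n in members if n not in failed]
        for dead in failed:
            if survivors:
                self.replicas[survivors[0]].report_failure(dead)
        return failed
```

In this sketch the engine node detects failures only as a side effect of its normal writes, which matches the abstract's description of detection "when reading or writing state information"; a real cluster would also need to handle the case where no replica survives.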

11 Claims

1. A computer implemented method for providing failover and fault tolerance, comprising:

maintaining a replica for storing state information;

maintaining an engine node that writes and reads state information to and from the replica and uses the state information to process messages;

periodically polling the replica by the engine node;

failing to poll by the engine node for a specified period of time; and

determining that the engine node has failed by the replica upon expiration of the specified period of time;

wherein the engine node periodically polls the replica for a set of expired timer objects that need to be processed by the engine node.

2. The method of claim 1 wherein the engine node checks out the set of expired timer objects in order to process the expired timer objects.

3. The method of claim 2 wherein upon failing to poll by the engine node, the replica checks in the set of expired timer objects such that the expired timer objects could be reassigned to be processed by another engine node.
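The polling scheme of claims 1 to 3 can be sketched as follows. This is an illustrative reading, not the patented implementation: the class name `Replica`, the methods `poll`, `add_timer`, and `detect_failures`, and the explicit `now` arguments (used instead of a real clock so the behavior is deterministic) are all assumptions. Each poll both serves as a liveness signal and checks out the expired timers; when an engine node stays silent longer than the timeout, the replica marks it failed and checks its timers back in.

```python
class Replica:
    def __init__(self, poll_timeout):
        self.poll_timeout = poll_timeout
        self.last_poll = {}    # engine id -> time of its most recent poll
        self.timers = {}       # timer id -> expiry time
        self.checked_out = {}  # timer id -> engine id processing it

    def add_timer(self, timer_id, expires_at):
        self.timers[timer_id] = expires_at

    def poll(self, engine_id, now):
        """Record the poll and check out expired timers to the caller."""
        self.last_poll[engine_id] = now
        expired = [t for t, expiry in self.timers.items()
                   if expiry <= now and t not in self.checked_out]
        for t in expired:
            self.checked_out[t] = engine_id
        return expired

    def detect_failures(self, now):
        """Declare engines failed if they have not polled within the timeout."""
        failed = [e for e, seen in self.last_poll.items()
                  if now - seen > self.poll_timeout]
        for engine_id in failed:
            del self.last_poll[engine_id]
            # Check the failed engine's timers back in so that another
            # engine node can pick them up on its next poll.
            owned = [t for t, owner in self.checked_out.items()
                     if owner == engine_id]
            for t in owned:
                del self.checked_out[t]
        return failed
```

Because the same poll carries both the heartbeat and the timer request, the replica needs no separate health-check channel; the cost is that failure detection is only as fast as the chosen timeout.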

4. A system for providing failover and fault tolerance, comprising:

a replica connected to a cluster network and adapted to store state information used for processing messages; and

an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;

wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and

wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects.

5. A system for providing failover and fault tolerance, comprising:

a replica connected to a cluster network and adapted to store state information used for processing messages; and

an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;

wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and

wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects; and

wherein the replica is adapted to notice that the engine node has failed to poll for a specified period of time and is further adapted to notify other engine nodes that the engine node has failed upon expiration of the period of time.

7. The system of claim 5, further comprising:

a second replica that stores duplicate state information with respect to the replica, wherein the engine node detects a failure of the replica and informs the second replica of the failure.

8. The system of claim 7, wherein upon detection of the failure of the replica, the second replica updates a view of the replicas such that the replica is excluded from the view.

9. The system of claim 5, wherein reading and writing state information by the engine node further includes:

locking and reading the state information from the replica;

processing the messages; and

writing the state information to the replica and unlocking it.

10. The system of claim 9, wherein the replica unlocks a set of locked state information upon determining that the engine node has failed.

11. The system of claim 5, wherein during the reading or writing of the state information to the replica, the engine node detects a change in a view of replicas and wherein upon detecting the change in the view, the engine node discontinues the reading or writing of the state information.
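The lock, read, process, write, unlock cycle of claim 9 and the lock release of claim 10 can be sketched as below. This is a minimal single-process illustration under stated assumptions: the names `Replica`, `LockHeld`, `lock_and_read`, `write_and_unlock`, `engine_failed`, and `handle_message` are hypothetical, and the "processing" step simply appends the message to a per-call history.

```python
class LockHeld(Exception):
    """Raised when another engine node holds the lock (hypothetical)."""


class Replica:
    def __init__(self):
        self.state = {}  # key -> stored state information
        self.locks = {}  # key -> engine id currently holding the lock

    def lock_and_read(self, engine_id, key):
        owner = self.locks.get(key)
        if owner is not None and owner != engine_id:
            raise LockHeld(f"{key} is locked by {owner}")
        self.locks[key] = engine_id
        return self.state.get(key)

    def write_and_unlock(self, engine_id, key, value):
        if self.locks.get(key) != engine_id:
            raise LockHeld(f"{engine_id} does not hold the lock on {key}")
        self.state[key] = value
        del self.locks[key]

    def engine_failed(self, engine_id):
        """Release every lock held by an engine node judged to have failed."""
        released = sorted(k for k, owner in self.locks.items()
                          if owner == engine_id)
        for key in released:
            del self.locks[key]
        return released


def handle_message(replica, engine_id, call_id, message):
    # Lock and read, process the message, then write and unlock.
    history = replica.lock_and_read(engine_id, call_id) or []
    updated = history + [message]
    replica.write_and_unlock(engine_id, call_id, updated)
    return updated
```

Releasing locks on the replica side, rather than relying on the engine node to clean up, is what lets a second engine node take over a call whose original handler died while holding the lock.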

6. A system for providing failover and fault tolerance, comprising:

a replica connected to a cluster network and adapted to store state information used for processing messages; and

an engine node connected to the cluster network and adapted to read and write the state information to and from the replica when processing the messages;

wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster; and

wherein the engine node periodically polls the replica for a set of expired timer objects in order to check out the expired timer objects and to process the expired timer objects; and

wherein the replica is adapted to check in the set of timer objects upon expiration of the specified period of time such that the timer objects could be checked out by another engine node for processing.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: BEA Systems Incorporated (Oracle Corporation)
Inventors: Langen, Anno R.; Kramer, Reto; Beatty, John; Connelly, David; Cheenath, Manoj; Cosmadopoulos, Ioannis; Khan, Rao Nasir
Primary Examiner(s): Iqbal, Nadeem
Application Number: US11/545,648
Publication Number: US 20080155310A1
Time in Patent Office: 1,218 Days
Field of Search: 714/3, 714/16
US Class Current: 714/15

CPC Class Codes

G06F 11/1425   by reconfiguration of node ...
G06F 11/2028   eliminating a faulty proces...
G06F 11/2094   Redundant storage or storag...
H04L 65/1045   Proxies, e.g. for session i...
H04L 65/1104   Session initiation protocol...
H04L 67/1095   Replication or mirroring of...
H04L 69/40   for recovering from a failu...
