SIP server architecture fault tolerance and failover

US 20080155310A1
Filed: 10/10/2006
Published: 06/26/2008
Est. Priority Date: 10/10/2006
Status: Active Grant

Abstract

The SIP server can comprise an engine tier and a state tier distributed on a cluster network. Engine nodes in the engine tier can process SIP messages and can read and write state information from and to the state tier. The state tier can maintain state information in a set of partitions, each made up of one or more replicas that contain duplicate information. The engine nodes can be adapted to detect and report replica failures, and the replicas can in turn be adapted to detect and report engine node failures. A replica can detect a fault with an engine node if the engine node fails to poll the replica for a specified period of time, and can then report the failure. An engine node can detect the failure of a replica when reading or writing state information and can report the failure to another replica, which can be responsible for updating the partition view to exclude dead replicas.

20 Claims

1. A computer implemented method for providing failover and fault tolerance, comprising:
- maintaining a first replica in a partition for storing state information;
  
  maintaining an engine node that writes and reads state information to and from the first replica and uses the state information to process messages;
  
  detecting that the first replica has failed by the engine node;
  
  reporting the failure to a second replica in the partition; and
  
  updating a view of the partition by the second replica in order to reflect the failure of the first replica.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein said updating of the view further includes proposing a new partition view by the second replica wherein the new partition view excludes the first replica.
  - 3. The method of claim 1, further comprising:
    - retrieving the state information by the engine node from the second replica.
  - 4. The method of claim 1, further comprising:
    - informing other engine nodes about an update in the view of the partition such that other engine nodes cease reading and writing state information from the first replica.
  - 5. The method of claim 1, further comprising:
    - detecting a change in view by the engine node during said reading or writing of the state information to the replica; and
      
      discontinuing the reading or writing of state information by the engine node.
  - 6. The method of claim 5, further comprising:
    - waiting a period of time by the engine node; and
      
      retrying the reading or writing of the state information by the engine node with a new view as updated by the change.
  - 7. The method of claim 6 wherein during a writing of state information, the engine node determines if a new replica has joined the partition and writes the state information to all replicas in the partition including the new replica so as to ensure consistent state.
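
The flow in claims 1 through 3 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names (`Replica`, `Partition`, `EngineNode`), the `ReplicaDown` exception, the `alive` flag used to simulate a crash, and the `view_id` counter are all hypothetical, and the view update is a plain list rewrite rather than a real agreement protocol among replicas.

```python
class ReplicaDown(Exception):
    """Hypothetical signal that a replica could not be reached."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True   # toggled by the example to simulate a crash
        self.store = {}

    def _check(self):
        if not self.alive:
            raise ReplicaDown(self.name)

    def write(self, key, value):
        self._check()
        self.store[key] = value

    def read(self, key):
        self._check()
        return self.store[key]

    def report_failure(self, failed_name, partition):
        # Claims 1 and 2: the surviving replica installs a new view
        # that excludes the failed replica.
        self._check()
        partition.view = [n for n in partition.view if n != failed_name]
        partition.view_id += 1


class Partition:
    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.view = list(names)   # replicas currently believed alive
        self.view_id = 0          # assumed counter, bumped on each view change


class EngineNode:
    def __init__(self, partition):
        self.partition = partition

    def _report(self, failed_name):
        # Report to the first other replica in the view that answers.
        for name in list(self.partition.view):
            if name == failed_name:
                continue
            try:
                self.partition.replicas[name].report_failure(
                    failed_name, self.partition)
                return
            except ReplicaDown:
                continue

    def write(self, key, value):
        # Write to every replica in the view so the copies stay identical.
        for name in list(self.partition.view):
            try:
                self.partition.replicas[name].write(key, value)
            except ReplicaDown:
                self._report(name)

    def read(self, key):
        # Claim 3: fall back to another replica when one has failed.
        for name in list(self.partition.view):
            try:
                return self.partition.replicas[name].read(key)
            except ReplicaDown:
                self._report(name)
        raise ReplicaDown("no live replica in view")


partition = Partition(["r1", "r2"])
engine = EngineNode(partition)
engine.write("call-1", "INVITE received")
partition.replicas["r1"].alive = False   # simulate a crash of the first replica
state = engine.read("call-1")            # served by r2 after the failure report
```

Because the engine only learns of a failure while reading or writing, no separate health-check channel is needed in this sketch. The wait-and-retry behavior of claims 5 and 6 and the handling of a newly joined replica in claim 7 are left out for brevity.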

8. A computer implemented method for providing failover and fault tolerance, comprising:
- maintaining a replica for storing state information;
  
  maintaining an engine node that writes and reads state information to and from the replica and uses the state information to process messages;
  
  periodically polling the replica by the engine node;
  
  failing to poll by the engine node for a specified period of time; and
  
  determining that the engine node has failed by the replica upon expiration of the specified period of time.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method of claim 8 further comprising:
    - notifying other engine nodes by the replica that the engine node has failed.
  - 10. The method of claim 8 wherein the engine node periodically polls the replica for a set of expired timer objects that need to be processed by the engine node.
  - 11. The method of claim 10 wherein the engine node checks out the set of expired timer objects in order to process them.
  - 12. The method of claim 11 wherein upon failing to poll by the engine node, the replica checks in the set of expired timer objects such that they could be reassigned to be processed by another engine node.
  - 13. The method of claim 8 wherein reading and writing state information by the engine node further includes:
    - locking and reading the state information from the replica;
      
      processing the messages; and
      
      writing the state information to the replica and unlocking it.
  - 14. The method of claim 13 wherein the replica can unlock the locked state information upon determining that the engine node has failed.
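
The replica-side detection in claims 8 through 14 can be sketched as follows, under stated assumptions. The `Replica` class, its method names, and the explicit `now` argument (used in place of a real clock so the example is deterministic) are hypothetical and not taken from the patent. The list returned by `detect_failures` stands in for the notification to other engine nodes described in claim 9.

```python
class Replica:
    def __init__(self, poll_timeout):
        self.poll_timeout = poll_timeout
        self.last_poll = {}        # engine id -> time of its last poll
        self.expired_timers = []   # timers waiting to be processed
        self.checked_out = {}      # engine id -> timers it has checked out
        self.locks = {}            # state key -> engine id holding the lock
        self.failed_engines = set()

    def poll(self, engine_id, now):
        # Claims 10 and 11: a poll doubles as a heartbeat and checks out
        # all currently expired timers to the polling engine.
        self.last_poll[engine_id] = now
        timers, self.expired_timers = self.expired_timers, []
        self.checked_out.setdefault(engine_id, []).extend(timers)
        return timers

    def lock(self, key, engine_id):
        # Claim 13: state is locked before it is read and processed.
        owner = self.locks.get(key)
        if owner is not None and owner != engine_id:
            raise RuntimeError(f"{key} is locked by {owner}")
        self.locks[key] = engine_id

    def unlock(self, key, engine_id):
        if self.locks.get(key) == engine_id:
            del self.locks[key]

    def detect_failures(self, now):
        # Claim 8: an engine that has not polled within the timeout is
        # considered failed.
        newly_failed = []
        for engine_id, seen in list(self.last_poll.items()):
            if now - seen > self.poll_timeout:
                # Claim 12: check its timers back in for reassignment.
                self.expired_timers.extend(self.checked_out.pop(engine_id, []))
                # Claim 14: release any state it left locked.
                self.locks = {k: o for k, o in self.locks.items()
                              if o != engine_id}
                del self.last_poll[engine_id]
                self.failed_engines.add(engine_id)
                newly_failed.append(engine_id)
        return newly_failed


replica = Replica(poll_timeout=5.0)
replica.expired_timers = ["t1", "t2"]
first = replica.poll("e1", now=0.0)        # e1 checks out t1 and t2
replica.lock("call-1", "e1")
replica.poll("e2", now=4.0)                # e2 is alive, nothing to check out
failed = replica.detect_failures(now=6.0)  # e1 has been silent for 6 seconds
second = replica.poll("e2", now=6.5)       # e2 picks up e1's timers
```

A fuller version would also remove a timer from `checked_out` once its engine finishes processing it; that step is omitted here.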

15. A system for providing failover and fault tolerance, comprising:
- a replica connected to a cluster network and adapted to store state information used for processing messages; and
  
  an engine node connected to the cluster network and adapted to read and write the state information to and from the first replica when processing the messages;
  
  wherein the replica is adapted to detect and report engine node failures in the cluster and the engine node is adapted to detect and report replica failures in the cluster.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The system of claim 15, further comprising:
    - a second replica for storing duplicate state information as the replica wherein the engine node is adapted to detect a failure of the replica and inform the second replica of the failure.
  - 17. The system of claim 16 wherein upon the failure detection of the replica, the second replica updates a view of the replicas such that the replica is excluded from the view.
  - 18. The system of claim 15 wherein the engine node periodically polls the replica for a set of expired timer objects in order to check them out and process them.
  - 19. The system of claim 18 wherein the replica is adapted to notice that the engine node has failed to poll for a specified period of time and is further adapted to notify other engine nodes that the engine node has failed upon expiration of the period of time.
  - 20. The system of claim 18 wherein the replica is adapted to check in the set of timer objects upon expiration of the specified period of time such that they could be checked out by another engine node for processing.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: BEA Systems Incorporated (Oracle Corporation)
Inventors: Langen, Anno R.; Kramer, Reto; Beatty, John; Connelly, David; Cheenath, Manoj; Cosmadopoulos, Ioannis; Khan, Rao Nasir

Granted Patent: US 7,661,027 B2
US Class Current: 714/6
CPC Class Codes

G06F 11/1425   by reconfiguration of node ...

G06F 11/2028   eliminating a faulty proces...

G06F 11/2094   Redundant storage or storag...

H04L 65/1045   Proxies, e.g. for session i...

H04L 65/1104   Session initiation protocol...

H04L 67/1095   Replication or mirroring of...

H04L 69/40   for recovering from a failu...
