Clustering Infrastructure System and Method

US 20090177914A1
Filed: 02/26/2009
Published: 07/09/2009
Est. Priority Date: 02/22/2002
Status: Abandoned Application

Abstract

A system and method for configuring a cluster of computer nodes to save and restore state in the cluster in the event of node failures. The system and method are implemented through an application programming interface (API) that includes a membership application, a locks application and a dataspace application. The membership application maintains a set of nodes in the cluster. The locks application provides a means for service applications running on the nodes to synchronize access to dataspaces. The dataspaces provide cluster-wide shared regions in the memory of the cluster members. The API is configured to monitor the cluster members and to coordinate reallocation of a service application if a node running the service application fails.
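As a rough illustration of how the three components named in the abstract could fit together, the following Python sketch simulates a cluster inside a single process. Every class and method name here (`Membership`, `Dataspace`, `Cluster`, `fail_node`, and so on) is hypothetical and does not come from the application; a real system would replicate the dataspace across machines and detect failures over the network.

```python
import threading


class Membership:
    """Tracks the set of live nodes and notifies listeners when one fails."""

    def __init__(self, nodes):
        self.live = set(nodes)
        self.listeners = []

    def on_failure(self, callback):
        self.listeners.append(callback)

    def fail_node(self, node):
        # Simulated failure detection: drop the node, then notify listeners.
        self.live.discard(node)
        for callback in self.listeners:
            callback(node)


class Dataspace:
    """A shared region; a lock serializes access (stand-in for the locks component)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)


class Cluster:
    """Reallocates a service to a surviving node when its owner fails."""

    def __init__(self, nodes):
        self.membership = Membership(nodes)
        self.dataspace = Dataspace()
        self.owner = {}  # service name -> node currently running it
        self.membership.on_failure(self._reallocate)

    def start_service(self, service, node):
        self.owner[service] = node

    def save_state(self, service, state):
        self.dataspace.write(service, state)

    def _reallocate(self, failed_node):
        for service, node in self.owner.items():
            if node == failed_node and self.membership.live:
                # Pick any surviving node (sorted only to make the demo deterministic).
                self.owner[service] = sorted(self.membership.live)[0]


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.start_service("counter", "node-a")
cluster.save_state("counter", {"count": 41})
cluster.membership.fail_node("node-a")
new_owner = cluster.owner["counter"]
recovered = cluster.dataspace.read("counter")
```

After the simulated failure, the service has a new owner and the saved state is still readable from the dataspace, which is the behavior the abstract describes at a high level.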

20 Claims

1. A system for ensuring a distributed application providing a service survives an arbitrary node failure comprising:
- a cluster of nodes connected via a network;
- an application programming interface running on each node of the cluster, the application programming interface configured to run a distributed application with state wherein each of the nodes maintains the state of the distributed application in a shared application programming interface dataspace so that in the event of a failure of a first node in the cluster running the distributed application a second node in the cluster can recover the distributed application state and continue a service provided by the distributed application.
- Dependent claims:
  - 2. The system of claim 1 wherein the application programming interface directs the second node to reclaim the state of the distributed application upon the failure of the first node.
  - 3. The system of claim 1 wherein the application programming interface comprises a membership component, a lock component and a dataspace component, wherein the application programming interface is configured to utilize the lock component to synchronously update data to the dataspace.
  - 4. The system of claim 3 wherein the membership component of the application programming interface is responsible for detecting a node failure in the cluster of nodes.
  - 5. The system of claim 1 wherein the application programming interface is configured to send a notification upon a failure of a node in the cluster of nodes.
  - 6. The system of claim 1 further comprising a lock server process on each node of the cluster of nodes.
  - 7. The system of claim 6 wherein the lock server process maintains a lock state to enable the lock to be rebuilt upon failure of a node.
  - 8. The system of claim 1 wherein the dataspaces are implemented in a three-phase commit protocol.
  - 9. The system of claim 1 further comprising a dataspace server on each node.
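
Claim 8 refers to a three-phase commit protocol for the dataspaces. The claims do not say how the phases are carried out, so the following Python sketch shows only the general shape of such a protocol (vote, pre-commit, commit) across in-memory replicas. The `Replica` class and `three_phase_write` function are hypothetical names, and the sketch deliberately omits the timeouts and recovery rules that a real three-phase commit depends on.

```python
class Replica:
    """One node's copy of the dataspace (hypothetical, in-memory only)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}
        self.pending = None  # (key, value) staged but not yet committed

    def can_commit(self, key, value):
        # Phase 1: vote. An unhealthy replica votes no.
        return self.healthy

    def pre_commit(self, key, value):
        # Phase 2: stage the update and acknowledge.
        self.pending = (key, value)
        return True

    def do_commit(self):
        # Phase 3: apply the staged update.
        key, value = self.pending
        self.committed[key] = value
        self.pending = None

    def abort(self):
        self.pending = None


def three_phase_write(replicas, key, value):
    """Return True if the update committed on every replica, else False."""
    if not all(r.can_commit(key, value) for r in replicas):
        for r in replicas:
            r.abort()
        return False
    if not all(r.pre_commit(key, value) for r in replicas):
        for r in replicas:
            r.abort()
        return False
    for r in replicas:
        r.do_commit()
    return True


nodes = [Replica("n1"), Replica("n2"), Replica("n3")]
ok = three_phase_write(nodes, "job", "running")

nodes[1].healthy = False  # one replica now refuses to vote yes
rejected = three_phase_write(nodes, "job", "done")
```

The first write reaches every replica; the second is rejected in the voting phase, so no replica ends up with a value the others lack.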

10. A system for implementing an application providing a service distributed over a plurality of nodes of a cluster and recovering the service upon an arbitrary failure of one of the plurality of nodes comprising:
- a cluster of nodes connected via a network;
- an application with state providing a service distributed over a plurality of nodes in the cluster of nodes, the application being configurable into a plurality of work items;
- an application programming interface running on the cluster of nodes, the application programming interface configured to maintain the state of the distributed application in a shared dataspace on each node in the cluster of nodes and to synchronously update data to the dataspace, wherein the dataspace includes a work queue for work items to be processed, a node status array, and a results queue.
- Dependent claims:
  - 11. The system of claim 10 wherein the cluster of nodes includes a first group of nodes configured to take the work items and place them in the work queue.
  - 12. The system of claim 11 wherein the cluster of nodes includes a second group of nodes configured to process the work items in the work queue.
  - 13. The system of claim 12 wherein the cluster of nodes includes a third group of nodes designated to handle recovery operations in the event of a failure of a node in the second group of nodes.
  - 14. The system of claim 10 wherein the application programming interface includes a work generator for generating work items for the distributed application and placing the work items on the work queue.
  - 15. The system of claim 14 wherein the application programming interface includes a work performer for removing a work item from the work queue and processing the removed work item.
  - 16. The system of claim 15 wherein the work performer sets an element in the array upon removal of the work item.
  - 17. The system of claim 15 wherein the work performer places the work item in the results queue when the work item is processed.
  - 18. The system of claim 15 wherein the application programming interface includes a work rescuer configured to monitor nodes of the cluster processing a work item for a failure, and in the event of a failure placing the work item back on the work queue.
  - 19. The system of claim 18 wherein the work rescuer monitors membership events of the nodes in the cluster of nodes to determine if a node in the cluster of nodes fails.
  - 20. The system of claim 10 wherein the application programming interface is configured to add new nodes to the cluster of nodes.
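
Claims 10 through 19 describe a work queue, a node status array and a results queue, operated on by a work generator, a work performer and a work rescuer. One way to picture how these pieces interact is the Python sketch below. It is an assumption-laden illustration, not the applicant's implementation: the names `WorkDataspace`, `generate`, `take`, `finish` and `rescue` are invented here, everything runs in one process, and the locking that the claims call for around dataspace updates is left out for brevity.

```python
from collections import deque

IDLE = None  # status-array value for a node that holds no work item


class WorkDataspace:
    """Holds the three structures named in claim 10."""

    def __init__(self, node_count):
        self.work_queue = deque()
        self.node_status = [IDLE] * node_count  # item each node is processing
        self.results_queue = deque()


def generate(ds, items):
    """Work generator: place work items on the work queue."""
    ds.work_queue.extend(items)


def take(ds, node):
    """Work performer, step 1: remove an item and record it in the status array."""
    item = ds.work_queue.popleft()
    ds.node_status[node] = item
    return item


def finish(ds, node, result):
    """Work performer, step 2: publish the result and clear the status entry."""
    ds.results_queue.append(result)
    ds.node_status[node] = IDLE


def rescue(ds, failed_node):
    """Work rescuer: put a failed node's in-flight item back on the work queue."""
    item = ds.node_status[failed_node]
    if item is not IDLE:
        ds.work_queue.append(item)
        ds.node_status[failed_node] = IDLE


ds = WorkDataspace(node_count=2)
generate(ds, [2, 3])
a = take(ds, 0)        # node 0 takes item 2
b = take(ds, 1)        # node 1 takes item 3
finish(ds, 0, a * a)   # node 0 completes its item
rescue(ds, 1)          # node 1 fails before finishing
c = take(ds, 0)        # node 0 picks up the rescued item
finish(ds, 0, c * c)
```

Because the status array records which item each node holds, the rescuer can requeue exactly the work that was in flight on the failed node, and no item is lost or silently skipped.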

Current Assignee: Richard A. Angell
Original Assignee: Richard A. Angell
Inventors: Winchell, David F.
Application Number: US12/393,171
Publication Number: US 20090177914A1
US Class Current: 714/4
CPC Class Codes:
G06F 11/1425   by reconfiguration of node ...
G06F 11/1492   by run-time replication per...
G06F 11/203   using migration
