Clustering Infrastructure System and Method
Abstract
A system and method for configuring a cluster of computer nodes to save and restore state in the cluster in the event of node failures. The system and method are implemented through an application programming interface (API) that includes a membership application, a locks application and a dataspace application. The membership application maintains the set of nodes in the cluster. The locks application provides a means for service applications running on the nodes to synchronize access to dataspaces. The dataspaces provide cluster-wide shared regions in the memory of the cluster members. The API is configured to monitor the cluster members and to coordinate reallocation of a service application if a node running the service application fails.
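As a rough illustration of how the three parts named in the abstract might fit together, the following single-process Python sketch models a membership set, per-dataspace locks, and shared dataspaces, plus reallocation of a service when its node fails. All class and method names (`Membership`, `LockService`, `Dataspace`, `ClusterAPI`, `node_failed`) are hypothetical and do not come from the patent, and a real cluster would replicate this state over a network rather than hold it in one process.

```python
import threading


class Membership:
    """Tracks the set of nodes currently in the cluster (hypothetical name)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)

    def remove(self, node):
        self.nodes.discard(node)


class LockService:
    """Hands out one lock per dataspace name so writers can synchronize."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())


class Dataspace:
    """Stands in for a cluster-wide shared memory region."""

    def __init__(self):
        self.data = {}


class ClusterAPI:
    """Assumed facade tying membership, locks, and dataspaces together."""

    def __init__(self, nodes):
        self.membership = Membership(nodes)
        self.locks = LockService()
        self.dataspaces = {}
        self.assignments = {}  # service name -> node currently running it

    def dataspace(self, name):
        return self.dataspaces.setdefault(name, Dataspace())

    def write(self, name, key, value):
        # Writers take the dataspace's lock before touching shared state.
        with self.locks.lock_for(name):
            self.dataspace(name).data[key] = value

    def start_service(self, service, node):
        self.assignments[service] = node

    def node_failed(self, node):
        # Drop the node from membership, then move its services to a survivor.
        self.membership.remove(node)
        survivors = sorted(self.membership.nodes)
        if not survivors:
            return
        for service, owner in self.assignments.items():
            if owner == node:
                self.assignments[service] = survivors[0]


api = ClusterAPI(["n1", "n2", "n3"])
api.start_service("billing", "n1")
api.write("billing", "last_invoice", 17)
api.node_failed("n1")
```

Choosing the first surviving node in sorted order is only a placeholder policy; the abstract does not say how the replacement node is selected.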
55 Citations
20 Claims
1. A system for ensuring a distributed application providing a service survives an arbitrary node failure comprising:
- a cluster of nodes connected via a network;
- an application programming interface running on each node of the cluster, the application programming interface configured to run a distributed application with state wherein each of the nodes maintains the state of the distributed application in a shared application programming interface dataspace so that in the event of a failure of a first node in the cluster running the distributed application a second node in the cluster can recover the distributed application state and continue a service provided by the distributed application.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
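The recovery behavior recited in claim 1 can be pictured with a small simulation in which every live node keeps its own copy of the dataspace, so a second node can pick up where a failed first node stopped. This is a minimal sketch under stated assumptions: the `Node` and `Cluster` classes, the request-counting service, and the choice of the first survivor as successor are all invented for illustration and are not taken from the claims.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.replica = {}  # this node's copy of the shared dataspace


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.active = self.nodes[0]  # node currently running the service

    def live_nodes(self):
        return [n for n in self.nodes if n.alive]

    def update(self, key, value):
        # Every live node receives the update before the call returns.
        for node in self.live_nodes():
            node.replica[key] = value

    def serve_request(self):
        # Toy stateful service: count the requests handled so far.
        count = self.active.replica.get("requests", 0) + 1
        self.update("requests", count)
        return count

    def fail(self, node):
        node.alive = False
        node.replica = {}  # the failed node's memory is lost
        if node is self.active:
            survivors = self.live_nodes()
            if not survivors:
                raise RuntimeError("no surviving node to take over")
            # The survivor already holds the state in its own replica.
            self.active = survivors[0]


cluster = Cluster(["a", "b"])
cluster.serve_request()
cluster.serve_request()
cluster.fail(cluster.active)          # first node fails
resumed = cluster.serve_request()     # second node continues the count
```

Because node `b` saw both earlier updates, the count resumes from the replicated value instead of restarting.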
10. A system for implementing an application providing a service distributed over a plurality of nodes of a cluster and recovering the service upon an arbitrary failure of one of the plurality of nodes comprising:
- a cluster of nodes connected via a network;
- an application with state providing a service distributed over a plurality of nodes in the cluster of nodes, the application being configurable into a plurality of work items;
- an application programming interface running on the cluster of nodes, the application programming interface configured to maintain the state of the distributed application in a shared dataspace on each node in the cluster of nodes and to synchronously update data to the dataspace, wherein the dataspace includes a work queue for work items to be processed, a node status array, and a results queue.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
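One way to read the dataspace layout in claim 10 is sketched below in Python: a work queue, a per-node status array, and a results queue held together in one structure. The claim does not state what the node status array contains, so this sketch assumes each slot records the work item its node is currently processing, which lets a failed node's item be returned to the queue. The names `WorkDataspace`, `take`, `finish`, and `recover` are hypothetical, and the synchronous replication of updates to every node is not modeled here.

```python
from collections import deque


class WorkDataspace:
    """Assumed layout of the shared dataspace for a work-item application."""

    def __init__(self, node_count, items):
        self.work_queue = deque(items)
        # Assumption: each slot holds the item a node is working on, or None.
        self.node_status = [None] * node_count
        self.results_queue = deque()


def take(ds, node_id):
    """Node claims the next work item and records it in its status slot."""
    if not ds.work_queue:
        return None
    item = ds.work_queue.popleft()
    ds.node_status[node_id] = item
    return item


def finish(ds, node_id, result):
    """Node publishes a result and marks itself idle."""
    ds.results_queue.append(result)
    ds.node_status[node_id] = None


def recover(ds, failed_node_id):
    """Return a failed node's in-flight item to the front of the work queue."""
    item = ds.node_status[failed_node_id]
    if item is not None:
        ds.work_queue.appendleft(item)
        ds.node_status[failed_node_id] = None


ds = WorkDataspace(node_count=2, items=[1, 2, 3])
first = take(ds, 0)    # node 0 starts item 1
second = take(ds, 1)   # node 1 starts item 2
recover(ds, 0)         # node 0 fails; item 1 goes back on the queue
finish(ds, 1, second * second)
while (item := take(ds, 1)) is not None:
    finish(ds, 1, item * item)
```

In this run node 1 completes every item, including the one node 0 had claimed before failing, so no work is lost.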
Specification