System for live-migration and automated recovery of applications in a distributed system
First Claim
1. A method of managing a plurality of applications hosted by a cluster of servers which each have an interface connectable to at least one client by a network, each application delivering a service at the client, the method comprising:
- electing a server of the cluster as a master server, the master server hosting at least one live application; and
- while the master server is hosting the live application, replicating changes in application data of the live application to a configurable number of servers in the cluster elected as slave servers, whereby each elected slave server maintains a version of the application data of the live application,
wherein, responsive to an event in the cluster, hosting of the application is transferred from the master server to one of the elected slave servers determined without intervention by a user when the event is detected, the elected slave server using its version of the current application data to mount the application and become a new master server, and wherein said event is at least one of:
  - detection of a preferred alternate master server in the cluster based on the loads of servers in the cluster;
  - detection of a preferred alternate master server based on the locality of servers in the cluster;
  - detection of a preferred alternate master server in the cluster based on a predefined user preference;
  - detected by exchanging messages with other servers of the cluster;
  - addition of a server to the cluster;
  - removal of a server from the cluster, wherein the removal was anticipated;
  - failure of a server in the cluster, wherein the data for live applications hosted by the failed server is recovered using versions of the current application on servers in the cluster which are continuing to operate; and
  - a partition of the cluster, wherein after recovery from the partition a preferred alternate master server is selected from a number of potentially competing master servers as the server with the version of the current application data which is more valuable.
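The hand-over described in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Server` and `Application` classes, the `data_version` counter, and the promotion policy (newest data first, then lowest load) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare servers by identity, not by field values
class Server:
    name: str
    load: float = 0.0
    alive: bool = True
    # Hypothetical counter standing in for "a version of the application data".
    data_version: int = 0


@dataclass
class Application:
    name: str
    master: Server
    slaves: list = field(default_factory=list)

    def replicate(self) -> None:
        """Record one change on the master and copy it to every live slave."""
        self.master.data_version += 1
        for slave in self.slaves:
            if slave.alive:
                slave.data_version = self.master.data_version

    def handle_event(self) -> Server:
        """Promote a slave to master without user intervention.

        Assumed policy: prefer the live slave with the newest data,
        breaking ties by the lowest load.
        """
        candidates = [s for s in self.slaves if s.alive]
        if not candidates:
            raise RuntimeError("no live slave holds a copy of the data")
        new_master = max(candidates, key=lambda s: (s.data_version, -s.load))
        self.slaves.remove(new_master)
        if self.master.alive:
            # Anticipated hand-over: the old master stays on as a slave.
            self.slaves.append(self.master)
        self.master = new_master
        return new_master
```

In this sketch a failed master is simply dropped from the replica set, while a live one (for example after a load-based migration) is kept as a slave.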
Abstract
A method and apparatus for distributing applications among a number of servers, ensuring that changes to application data on the master for an application are asynchronously replicated to a number of slaves for that application. Servers may be located in geographically diverse locations; the invention permits data replication over high-latency and lossy network connections and tolerates hardware and network failures. Access to applications is mediated by a distributed protocol handler which allows any request for any application to be addressed to any server and which, working in tandem with the replication system, pauses connections momentarily to allow seamless, consistent live-migration of applications and their state between servers. Also described is a system which controls this live-migration based on dynamic measurement of the load generated by each application and the topological preferences of each application, in order to automatically keep servers at an optimum utilization level.
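As a rough illustration of how a protocol handler might pause connections during a live-migration, the following Python sketch holds requests for an application while it is moving and releases them to the new master afterwards. The `ProtocolHandler` class, its method names, and the in-memory queue are hypothetical; the abstract does not specify how the pause is implemented.

```python
from collections import defaultdict, deque


class ProtocolHandler:
    """Routes a request for any application to that application's master."""

    def __init__(self, masters):
        self.masters = dict(masters)    # application name -> server name
        self.paused = set()             # applications currently migrating
        self.held = defaultdict(deque)  # application name -> held requests
        self.delivered = []             # (server, request) log for the demo

    def request(self, app, payload):
        """Deliver the request, or hold it if the application is migrating."""
        if app in self.paused:
            self.held[app].append(payload)  # held, not dropped
            return None
        server = self.masters[app]
        self.delivered.append((server, payload))
        return server

    def begin_migration(self, app):
        """Start holding new requests for the application."""
        self.paused.add(app)

    def finish_migration(self, app, new_master):
        """Switch the route, then release held requests in arrival order."""
        self.masters[app] = new_master
        self.paused.discard(app)
        queue = self.held.pop(app, deque())
        while queue:
            self.delivered.append((new_master, queue.popleft()))
```

A request arriving between `begin_migration` and `finish_migration` is delayed rather than refused, which is one way to read "pauses connections momentarily".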
11 Claims
1. (as set out under First Claim) Dependent claims: 2, 3, 4, 5, 6
7. Computer software which, when executed by appropriate processing means, causes the processing means to implement a system for replicating a filesystem between a first server and a second server prior to and following a partition between the first server and the second server, the system comprising:
- snapshotting means for taking snapshots of a current state of the filesystem on the first server at predetermined points in time following modification of the filesystem, each snapshot recording differences between the current state of the filesystem on the server and a state of the filesystem on the server at the time point of a previous snapshot;
- replicator means for continually replicating the snapshots taken on the first server to the second server as soon as they are taken;
- detection means configured such that upon detection of a partition, both the first and the second server become masters for the filesystem and accept new modifications to the filesystems; and
- updating means configured to perform an update process to update the filesystem after recovery of the partition, the update process comprising:
  - identifying which of the first server and the second server contains the most current version of the filesystem;
  - nominating the server so identified as containing the most current version of the filesystem to be the master server and the other server as the slave server;
  - identifying a snapshot that is common to both the master server and the slave server; and
  - replicating subsequent snapshots from the master server to the slave server.
Dependent claims: 8, 9, 10, 11
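The update process in claim 7 can be sketched in Python as a planning function over two snapshot histories. This is an illustration under stated assumptions, not the claimed software: the `Snapshot` type, the `plan_recovery` name, and the rule that "most current" means the history whose last snapshot has the newest timestamp are all hypothetical, and both histories are assumed to be non-empty.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    ident: str
    taken_at: float  # assumed to be comparable across servers


def plan_recovery(first, second):
    """Plan the post-partition update for two snapshot histories.

    `first` and `second` are (server_name, [Snapshot, ...]) pairs, with
    snapshots listed in the order they were taken.
    Returns (master_name, slave_name, common_snapshot, snapshots_to_send).
    """
    # Assumption: the server whose final snapshot is newest is "most current".
    master, slave = sorted(
        [first, second], key=lambda s: s[1][-1].taken_at, reverse=True
    )
    slave_ids = {snap.ident for snap in slave[1]}
    # Walk the master's history backwards to find the latest shared snapshot.
    for index in range(len(master[1]) - 1, -1, -1):
        if master[1][index].ident in slave_ids:
            common = master[1][index]
            to_send = master[1][index + 1:]
            return master[0], slave[0], common, to_send
    raise ValueError("no common snapshot; a full copy would be needed")
```

The sketch only decides what to replicate. What happens to snapshots the slave took during the partition is not covered by this part of the claim and is left out here.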
Specification