Scalable, highly available cluster membership architecture
First Claim
1. A system connecting multiple hosts in a computing network by at least one heartbeat link to each host comprising:
a) a first set of heartbeat links between said multiple hosts such that each host is connected by the heartbeat link to a first adjoining neighbor host and a second adjoining neighbor host; and
b) a second set of heartbeat links between said multiple hosts such that each host is connected by a heartbeat link to the first adjoining neighbor host's adjoining neighbor host and to the second adjoining neighbor host's adjoining neighbor host; and
c) wherein a failure of one of said multiple hosts comprises a resulting action of;
d) the failed host's first adjoining neighbor host remaining connected by a heartbeat link to said failed host's second adjoining neighbor host; and
e) the failed host's second adjoining neighbor host remaining connected by a heart beat link to said failed host's second adjoining neighbor host's second adjoining neighbor; and
f) the failed host's first adjoining neighbor host, using said remaining connected heartbeat link from said first adjoining neighbor host to failed host's second adjoining neighbor host, and said remaining connected heartbeat link from said second adjoining neighbor host to said second adjoining neighbor host's adjoining neighbor host, to connect a new heartbeat link to said failed host's second adjoining neighbor host's adjoining neighbor host; and
g) the failed host's first adjoining neighbor host remaining connected by a heartbeat to said failed host's first adjoining neighbor host's first adjoining neighbor; and
h) the failed host's second adjoining neighbor host connecting a new heartbeat link to said failed host's first adjoining neighbor host's first adjoining neighbor host by using said remaining connected heartbeat link from failed host's first adjoining neighbor host to the failed host's second adjoining neighbor host and said remaining heartbeat link from the first adjoining neighbor host to the first adjoining neighbor's first adjoining neighbor host;
i) thereby establishing a heartbeat linking structure similar to a heartbeat linking structure prior to the failure of the failed host, with an exception that the failed host is no longer in the structure.
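The link structure recited in steps a) through i) can be illustrated with a small sketch. It is not the patented implementation: it assumes the hosts form a logical ring, and the names `heartbeat_links` and `links_after_failure` are hypothetical. It computes the nearest-neighbor and overlapping links, removes one host, and reports which links survive and which two must be newly connected.

```python
def heartbeat_links(ring):
    """Heartbeat links for hosts in a logical ring (the ring is an assumption).

    Each host is linked to its adjoining neighbors (distance 1) and to
    its neighbors' neighbors (distance 2), as in steps a) and b).
    """
    n = len(ring)
    links = set()
    for i, host in enumerate(ring):
        for step in (1, 2):
            peer = ring[(i + step) % n]
            if peer != host:
                links.add(frozenset((host, peer)))
    return links


def links_after_failure(ring, failed):
    """Return (survivors, surviving_links, new_links) after one host fails."""
    survivors = [h for h in ring if h != failed]
    # Links that did not touch the failed host stay up (steps d, e, g).
    surviving = {link for link in heartbeat_links(ring) if failed not in link}
    # The restored structure is the same pattern over the survivors (step i).
    target = heartbeat_links(survivors)
    # Links that must be newly connected (steps f and h).
    return survivors, surviving, target - surviving


ring = list("ABCDEFG")
survivors, surviving, new_links = links_after_failure(ring, "D")
print(sorted("-".join(sorted(link)) for link in new_links))  # ['B-E', 'C-F']
```

In this example host C plays the role of the failed host's first adjoining neighbor and host E the second. The surviving C-E link is the overlapping link of step d), and the two new links, C-F and B-E, correspond to steps f) and h).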
Abstract
The invention comprises a software-based communications architecture and associated software methods for establishing and maintaining a common membership among a cluster of multiple, cooperating computers (called hosts). The invention incorporates the use of nearest neighbor and overlapping heartbeat connections between clustered computers that are logically organized in a linear or multi-dimensional array. This arrangement of heartbeat connections has two principal advantages. First, it keeps the cluster membership highly available after host failures because hosts can quickly detect and recover from another host's failure without partitioning the membership. Second, it enables the cluster membership to scale to large numbers (e.g., hundreds) of computers because the computational and message passing overhead per host to maintain the specified heartbeat connections is fixed and the underlying physical network is allowed to scale. This membership architecture is well suited to distributed applications (such as a partitioned database) in which changes to the workload are made and propagated cluster-wide by neighboring hosts for purposes of load-balancing.
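The scaling claim in the abstract (fixed per-host overhead) can be checked with a short sketch. It assumes a one-dimensional ring of hosts identified by index; the function name `heartbeat_peers` and the `reach` parameter are hypothetical and do not come from the patent.

```python
def heartbeat_peers(i, n, reach=2):
    """Indices of the hosts that host i exchanges heartbeats with.

    Assumes n hosts in a logical ring. reach=2 gives the nearest-neighbor
    links (distance 1) plus the overlapping links (distance 2).
    """
    offsets = [d for k in range(1, reach + 1) for d in (-k, k)]
    return sorted({(i + d) % n for d in offsets} - {i})


for n in (8, 64, 512):
    per_host = max(len(heartbeat_peers(i, n)) for i in range(n))
    print(f"{n} hosts: {per_host} links per host, "
          f"versus {n - 1} if every host monitored every other host")
```

Under these assumptions each host maintains four heartbeat connections whether the cluster has 8 hosts or 512, whereas an all-to-all scheme would grow linearly with cluster size.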
4 Claims
4. A method for connecting heartbeat links between multiple hosts in a networked cluster of hosts comprising:
a. connecting a first set of heartbeat links between said multiple hosts such that each host is connected by said heartbeat link to said host's adjoining neighbor host; and
b. connecting a second set of heartbeat links between said multiple hosts such that each host is connected to said host's adjoining host's adjoining host; and
c. whereupon the failure of a single host in said networked cluster of hosts leaves a set of surviving hosts and surviving heartbeat links, the surviving hosts using the surviving heartbeat links to re-establish a set of heartbeat links to each adjoining host and to re-establish a set of heartbeat links to each adjoining host's adjoining host;
d. thereby returning the networked cluster of hosts to said networked cluster of host's original state of heartbeat links minus the failed host.
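Step c of the method can also be viewed from the perspective of each surviving host. The sketch below is one possible reading, not the patent's code. It assumes a logical ring in which every host keeps four explicit peer references, and that a host can learn a peer's adjoining neighbor over a surviving link. The `Host` class, its field names, and `repair_after_failure` are hypothetical.

```python
class Host:
    """A cluster member holding its four heartbeat peers (assumed layout)."""

    def __init__(self, name):
        self.name = name
        self.prev = self.next = None    # adjoining neighbors
        self.prev2 = self.next2 = None  # adjoining neighbors' neighbors

    def peers(self):
        return {h.name for h in (self.prev2, self.prev, self.next, self.next2)}


def make_ring(names):
    """Connect both sets of heartbeat links (steps a and b) on a ring."""
    hosts = [Host(name) for name in names]
    n = len(hosts)
    for i, h in enumerate(hosts):
        h.prev, h.next = hosts[(i - 1) % n], hosts[(i + 1) % n]
        h.prev2, h.next2 = hosts[(i - 2) % n], hosts[(i + 2) % n]
    return {h.name: h for h in hosts}


def repair_after_failure(failed):
    """Re-establish links around a failed host using surviving links (step c)."""
    p, s = failed.prev, failed.next
    # The surviving overlapping link between p and s becomes an adjoining link.
    p.next, s.prev = s, p
    # Over that link, p learns s's other neighbor and links to it.
    p.next2 = s.next
    s.next.prev2 = p
    # Symmetrically, s learns p's other neighbor and links to it.
    s.prev2 = p.prev
    p.prev.next2 = s


hosts = make_ring("ABCDEFG")
repair_after_failure(hosts["D"])
print(sorted(hosts["C"].peers()))  # ['A', 'B', 'E', 'F']
```

After the repair, every surviving host holds the same peers it would have in a ring built from the survivors alone, which matches the outcome described in step d.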
Specification