Method and device for processing failure in at least one distributed cluster, and system
1 Assignment
0 Petitions
Abstract
A method and a device for processing a failure in at least one distributed cluster, and a system, where the at least one distributed cluster includes a first distributed cluster. The first distributed cluster includes a first Master node, a first Slave node, a first reference node, and a first secondary node that serves as a backup of the first Master node. The first secondary node receives a heartbeat message that includes first indication information. The first secondary node determines, according to the first indication information, that the first reference node is disconnected from the first Master node. The first secondary node determines that the first secondary node is also disconnected from the first Master node when it is detected that a heartbeat message from the first Master node to the first secondary node is interrupted. The first secondary node determines the first Master node is faulty.
62 Citations
16 Claims
1. A method for processing a failure in at least one distributed cluster, comprising:
receiving, by a first secondary node, a first heartbeat message from a first reference node, wherein the first heartbeat message comprises first indication information indicating that the first reference node is disconnected from a first Master node;
determining, by the first secondary node according to the first indication information, that the first reference node is disconnected from the first Master node;
detecting, by the first secondary node, whether a second heartbeat message from the first Master node to the first secondary node is interrupted;
determining, by the first secondary node, that the first secondary node is also disconnected from the first Master node when the second heartbeat message from the first Master node to the first secondary node is interrupted;
determining, by the first secondary node, that the first Master node disconnected from both the first secondary node and the first reference node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, and wherein the first distributed cluster comprises the first Master node, a first Slave node, the first reference node, and the first secondary node that serves as a backup of the first Master node; and
sending, by the first secondary node, a broadcast message to all nodes in the first distributed cluster in response to the first Master node being faulty, wherein the broadcast message indicates that the first secondary node is upgraded to a new first Master node.
Dependent claims: 2, 3, 4
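The steps of claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class `SecondaryNode`, its method names, the `NEW_MASTER` message layout, and the in-memory `sent` list standing in for a network transport are all hypothetical. The sketch also assumes the broadcast goes to every node other than the sender.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryNode:
    """Hypothetical model of the first secondary node (backup of the Master)."""
    name: str
    cluster: list  # names of all nodes in the first distributed cluster
    reference_reports_master_lost: bool = False
    master_heartbeat_interrupted: bool = False
    sent: list = field(default_factory=list)  # stand-in for a real transport

    def on_reference_heartbeat(self, master_disconnected: bool) -> None:
        # First indication information carried in the reference node's heartbeat.
        self.reference_reports_master_lost = master_disconnected

    def on_master_heartbeat_status(self, interrupted: bool) -> None:
        # Result of the secondary node's own check on the Master's heartbeat.
        self.master_heartbeat_interrupted = interrupted

    def master_is_faulty(self) -> bool:
        # The Master is judged faulty only when BOTH observers have lost it.
        return self.reference_reports_master_lost and self.master_heartbeat_interrupted

    def maybe_promote(self) -> bool:
        """Broadcast the upgrade to new Master if the old Master is faulty."""
        if not self.master_is_faulty():
            return False
        for node in self.cluster:
            if node != self.name:  # assumption: no message to self
                self.sent.append((node, {"type": "NEW_MASTER", "master": self.name}))
        return True
```

Requiring both observations before promotion is the point of the reference node: a secondary node that merely loses its own link to the Master does not announce itself as the new Master.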
5. A method for processing a failure in at least one distributed cluster, comprising:
receiving, by a first Master node, a first heartbeat message from a first secondary node, wherein the first heartbeat message comprises third indication information indicating that the first secondary node is disconnected from a first Slave node;
determining, by the first Master node according to the third indication information, that the first secondary node is disconnected from the first Slave node;
detecting, by the first Master node, whether a second heartbeat message from the first Slave node to the first Master node is interrupted based on whether the second heartbeat message from the first Slave node is received between a third moment and a fourth moment, wherein the third moment is a moment at which the first Master node receives the first heartbeat message from the first secondary node comprising the third indication information, wherein the fourth moment is earlier than the third moment, wherein a time interval between the third moment and the fourth moment is N times a heartbeat period of sending the second heartbeat message by the first Slave node to the first Master node, and wherein N is a positive integer;
determining, by the first Master node, that the first Master node is also disconnected from the first Slave node when it is detected that the second heartbeat message from the first Slave node to the first Master node is interrupted; and
determining, by the first Master node, that the first Slave node disconnected from both the first Master node and the first secondary node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, and wherein the first distributed cluster comprises the first Master node, the first Slave node, a first reference node, and the first secondary node that serves as a backup of the first Master node.
Dependent claims: 6, 7, 8
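The interruption test in claim 5 is a look-back window: the fourth moment lies N heartbeat periods before the third moment, and the Slave's heartbeat counts as interrupted if none arrived in that window. The sketch below is illustrative only. The function names, the use of plain numeric timestamps, and the choice to treat both window boundaries as inclusive are assumptions that the claim text does not specify.

```python
def heartbeat_interrupted(receive_times, third_moment, heartbeat_period, n):
    """Return True if no Slave heartbeat arrived in the look-back window.

    receive_times: timestamps at which the Master received Slave heartbeats.
    third_moment: time the Master received the secondary node's heartbeat
                  carrying the third indication information.
    heartbeat_period: period of the Slave's heartbeat, in the same time unit.
    n: positive integer number of periods to look back.
    """
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    # Fourth moment: N heartbeat periods earlier than the third moment.
    fourth_moment = third_moment - n * heartbeat_period
    # Assumption: boundaries are inclusive.
    return not any(fourth_moment <= t <= third_moment for t in receive_times)


def slave_is_faulty(secondary_reports_slave_lost, receive_times,
                    third_moment, heartbeat_period, n):
    # The Slave is judged faulty only if the secondary node reports it lost
    # AND the Master's own look-back window contains no Slave heartbeat.
    return secondary_reports_slave_lost and heartbeat_interrupted(
        receive_times, third_moment, heartbeat_period, n
    )
```

For example, with a heartbeat period of 1 second and N = 3, a report arriving at t = 10 makes the Master inspect the interval from t = 7 to t = 10. A larger N tolerates more lost heartbeats before declaring an interruption, at the cost of slower detection.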
9. A device for processing a failure in at least one distributed cluster, comprising:
a first receiver configured to receive a first heartbeat message comprising first indication information from a first reference node;
a processor coupled to the first receiver and configured to:
determine, according to the first indication information received by the first receiver, that the first reference node is disconnected from a first Master node;
detect whether a second heartbeat message from the first Master node to the device is interrupted;
determine that the device is also disconnected from the first Master node when the heartbeat message from the first Master node to the device is interrupted; and
determine that the first Master node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, wherein the first distributed cluster comprises the first Master node, a first Slave node, the first reference node, and a first secondary node that serves as a backup of the first Master node, and wherein the device is the first secondary node; and
a transmitter coupled to the processor and the first receiver and configured to send a broadcast message to all nodes in the first distributed cluster in response to the processor determining that the first Master node is faulty, wherein the broadcast message indicates that the device is upgraded to a new first Master node.
Dependent claims: 10, 11, 12
13. A device for processing a failure in at least one distributed cluster, comprising:
a first receiver configured to receive a first heartbeat message from a first secondary node comprising third indication information; and
a processor coupled to the first receiver and configured to:
determine, according to the third indication information received by the first receiver, that the first secondary node is disconnected from a first Slave node;
detect whether a second heartbeat message from the first Slave node to the device is interrupted based on whether the second heartbeat message from the first Slave node is received between a third moment and a fourth moment, wherein the third moment is a moment at which the device receives the heartbeat message from the first secondary node comprising the third indication information, wherein the fourth moment is earlier than the third moment, wherein a time interval between the third moment and the fourth moment is N times a heartbeat period of sending the second heartbeat message by the first Slave node to the device, and wherein N is a positive integer;
determine that the device is also disconnected from the first Slave node when the second heartbeat message from the first Slave node to the device is interrupted; and
determine that the first Slave node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, wherein the first distributed cluster comprises a first Master node, the first Slave node, a first reference node, and the first secondary node that serves as a backup of the first Master node, and wherein the device is the first Master node.
Dependent claims: 14, 15, 16
Specification