Method and device for processing failure in at least one distributed cluster, and system

US 10,560,315 B2
Filed: 08/10/2017
Issued: 02/11/2020
Est. Priority Date: 02/10/2015
Status: Active Grant

Abstract

A method and a device for processing a failure in at least one distributed cluster, and a system, where the at least one distributed cluster includes a first distributed cluster. The first distributed cluster includes a first Master node, a first Slave node, a first reference node, and a first secondary node that serves as a backup of the first Master node. The first secondary node receives a heartbeat message that includes first indication information. The first secondary node determines, according to the first indication information, that the first reference node is disconnected from the first Master node. The first secondary node determines that the first secondary node is also disconnected from the first Master node when it is detected that a heartbeat message from the first Master node to the first secondary node is interrupted. The first secondary node determines the first Master node is faulty.
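
The decision rule in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the boolean `sender_lost_master` field (standing in for the "first indication information"), and the in-memory `sent` list (standing in for a network broadcast) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SECONDARY = "secondary"
    REFERENCE = "reference"
    SLAVE = "slave"


@dataclass
class Heartbeat:
    sender: str
    # Hypothetical encoding of the "first indication information":
    # True means the sender has lost contact with the Master node.
    sender_lost_master: bool = False


class SecondaryNode:
    """Backup of the Master node; promotes itself only when two views agree."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = list(peers)
        self.role = Role.SECONDARY
        self.reference_lost_master = False
        self.master_heartbeat_interrupted = False
        self.sent = []  # (recipient, message) pairs; stand-in for a broadcast

    def on_reference_heartbeat(self, hb):
        # Record what the reference node reports about its link to the Master.
        self.reference_lost_master = hb.sender_lost_master
        self._evaluate()

    def on_master_heartbeat_status(self, interrupted):
        # Record whether our own heartbeat from the Master is interrupted.
        self.master_heartbeat_interrupted = interrupted
        self._evaluate()

    def master_is_faulty(self):
        # The Master is judged faulty only if BOTH the reference node and
        # this secondary node are disconnected from it.
        return self.reference_lost_master and self.master_heartbeat_interrupted

    def _evaluate(self):
        if self.role is Role.SECONDARY and self.master_is_faulty():
            self.role = Role.MASTER
            message = {"type": "promotion", "new_master": self.name}
            for peer in self.peers:
                self.sent.append((peer, message))
```

Requiring agreement between the secondary node's own observation and the reference node's report means a single broken link between the secondary node and the Master does not by itself trigger a promotion.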

16 Claims

1. A method for processing a failure in at least one distributed cluster, comprising:
- receiving, by a first secondary node, a first heartbeat message from a first reference node, wherein the first heartbeat message comprises first indication information indicating that the first reference node is disconnected from a first Master node;
  
  determining, by the first secondary node according to the first indication information, that the first reference node is disconnected from the first Master node;
  
  detecting, by the first secondary node, whether a second heartbeat message from the first Master node to the first secondary node is interrupted;
  
  determining, by the first secondary node, that the first secondary node is also disconnected from the first Master node when the second heartbeat message from the first Master node to the first secondary node is interrupted;
  
  determining, by the first secondary node, that the first Master node disconnected from both the first secondary node and the first reference node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, and wherein the first distributed cluster comprises the first Master node, a first Slave node, the first reference node, and the first secondary node that serves as a backup of the first Master node; and
  
  sending, by the first secondary node, a broadcast message to all nodes in the first distributed cluster in response to the first Master node being faulty, wherein the broadcast message indicates that the first secondary node is upgraded to a new first Master node.
- - 2. The method of claim 1, wherein detecting whether the second heartbeat message from the first Master node to the first secondary node is interrupted comprises detecting, by the first secondary node, whether the second heartbeat message from the first Master node is received between a first moment and a second moment, wherein the first moment is a moment at which the first secondary node receives the first heartbeat message from the first reference node comprising the first indication information, wherein the second moment is earlier than the first moment, wherein a time interval between the first moment and the second moment is N times a heartbeat period of sending the second heartbeat message by the first Master node to the first secondary node, and wherein N is a positive integer.
  - 3. The method of claim 1, wherein the at least one distributed cluster further comprises a second distributed cluster, and wherein the method further comprises:
    - receiving, by the first secondary node, a heartbeat message from a second reference node, wherein the heartbeat message comprises second indication information indicating that a node attribute of a receive end of the heartbeat message is a Slave node, wherein the second reference node is configured to separately send the heartbeat message comprising the second indication information to a second Slave node and all the nodes in the first distributed cluster, and wherein the second distributed cluster and the first distributed cluster have a same cluster identifier;
      
      determining, by the first secondary node according to the second indication information, that the node attribute of the receive end of the heartbeat message indicated in the heartbeat message from the second reference node does not match a node attribute of the first secondary node;
      
      determining, by the first secondary node, that the second distributed cluster in which the second reference node is located and the first distributed cluster in which the first secondary node is located are two sub-clusters formed after one network distributed cluster is split; and
      
      negotiating, by the first secondary node, with a second secondary node in the second distributed cluster to integrate the first distributed cluster and the second distributed cluster, and wherein the second distributed cluster comprises a second Master node, a second Slave node, the second reference node, and the second secondary node that serves as a backup of the second Master node.
  - 4. The method of claim 3, wherein negotiating with the second secondary node in the second distributed cluster comprises:
    - sending, by the first secondary node to all nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the node attribute of the first secondary node is a secondary node;
      
      receiving, by the first secondary node, a negotiation message from the second secondary node, wherein the negotiation message comprises information indicating a weight of the second secondary node, and wherein the negotiation message is sent by the second secondary node to the first secondary node when it is detected that the node attribute indicated in the broadcast message is the same as a node attribute of the second secondary node;
      
      sending, by the first secondary node to the second secondary node, a negotiation response message instructing to downgrade the second secondary node to a Slave node when a weight of the first secondary node is greater than or equal to the weight of the second secondary node; and
      
      sending, by the first secondary node to all the nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the first secondary node is downgraded to a Slave node when the weight of the first secondary node is less than the weight of the second secondary node.
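
The split detection of claim 3 and the weight comparison of claim 4 can be sketched as below. The `Node` fields, the string attribute values, and the dictionary returned by `negotiate` are hypothetical names chosen for illustration; the claims do not prescribe a message format.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cluster_id: str
    attribute: str  # "master", "secondary", "reference" or "slave"
    weight: int


def indicates_split(receiver, sender_cluster_id, expected_receiver_attribute):
    """Return True when a heartbeat suggests the cluster has split.

    The sender shares our cluster identifier but addressed the heartbeat
    to a receiver with a different node attribute than ours (for example,
    it expected a Slave node while we are a secondary node).
    """
    return (
        sender_cluster_id == receiver.cluster_id
        and expected_receiver_attribute != receiver.attribute
    )


def negotiate(local, peer_weight):
    """Resolve two same-attribute nodes after a split by comparing weights.

    A tie favours the local node, mirroring the "greater than or equal to"
    wording of the claim.
    """
    if local.weight >= peer_weight:
        # Instruct the peer to become a Slave node.
        return {"action": "downgrade_peer", "to": "slave"}
    # Otherwise the local node steps down and would announce it by broadcast.
    local.attribute = "slave"
    return {"action": "downgrade_self", "to": "slave"}
```

For example, a secondary node with weight 5 that receives a negotiation message carrying weight 9 downgrades itself, while the same node facing a peer of weight 5 or less asks the peer to downgrade.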

5. A method for processing a failure in at least one distributed cluster, comprising:
- receiving, by a first Master node, a first heartbeat message from a first secondary node, wherein the first heartbeat message comprises third indication information indicating that the first secondary node is disconnected from a first Slave node;
  
  determining, by the first Master node according to the third indication information, that the first secondary node is disconnected from the first Slave node;
  
  detecting, by the first Master node, whether a second heartbeat message from the first Slave node to the first Master node is interrupted based on whether the second heartbeat message from the first Slave node is received between a third moment and a fourth moment, wherein the third moment is a moment at which the first Master node receives the first heartbeat message from the first secondary node comprising the third indication information, wherein the fourth moment is earlier than the third moment, wherein a time interval between the third moment and the fourth moment is N times a heartbeat period of sending the second heartbeat message by the first Slave node to the first Master node, and wherein N is a positive integer;
  
  determining, by the first Master node, that the first Master node is also disconnected from the first Slave node when it is detected that the second heartbeat message from the first Slave node to the first Master node is interrupted; and
  
  determining, by the first Master node, that the first Slave node disconnected from both the first Master node and the first secondary node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, and wherein the first distributed cluster comprises the first Master node, the first Slave node, a first reference node, and the first secondary node that serves as a backup of the first Master node.
- - 6. The method of claim 5, further comprising:
    - detecting, by the first Master node, whether the heartbeat message from the first secondary node and a heartbeat message from the first reference node are received within a preset detection period, wherein the preset detection period is M times the heartbeat period of sending a heartbeat message, and wherein M is a positive integer; and
      
      determining, by the first Master node, that both the first secondary node and the first reference node are faulty when neither the heartbeat message from the first secondary node nor the heartbeat message from the first reference node is received within the preset detection period.
  - 7. The method of claim 5, wherein the at least one distributed cluster further comprises a second distributed cluster, and wherein the method further comprises:
    - receiving, by the first Master node, a heartbeat message from a second reference node, wherein the heartbeat message from the second reference node comprises fourth indication information indicating that a node attribute of a receive end of the heartbeat message is a Slave node, wherein the second reference node is configured to separately send the heartbeat message comprising the fourth indication information to a second Slave node and all nodes in the first distributed cluster, and wherein the second distributed cluster and the first distributed cluster have a same cluster identifier;
      
      determining, by the first Master node according to the fourth indication information, that the node attribute of the receive end of the heartbeat message indicated in the heartbeat message from the second reference node does not match a node attribute of the first Master node;
      
      determining, by the first Master node, that the second distributed cluster in which the second reference node is located and the first distributed cluster in which the first Master node is located are two sub-clusters formed after one network distributed cluster is split; and
      
      negotiating, by the first Master node, with a second Master node in the second distributed cluster to integrate the first distributed cluster and the second distributed cluster, and wherein the second distributed cluster comprises the second Master node, the second Slave node, the second reference node, and a second secondary node that serves as a backup of the second Master node.
  - 8. The method of claim 7, wherein negotiating with the second Master node in the second distributed cluster comprises:
    - sending, by the first Master node to all nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the node attribute of the first Master node is a Master node;
      
      receiving, by the first Master node, a negotiation message from the second Master node, wherein the negotiation message comprises information indicating a weight of the second Master node, and wherein the negotiation message is sent by the second Master node to the first Master node when it is detected that the node attribute indicated in the broadcast message is the same as a node attribute of the second Master node;
      
      sending, by the first Master node to the second Master node, a negotiation response message instructing to downgrade the second Master node to a Slave node when a weight of the first Master node is greater than or equal to the weight of the second Master node; and
      
      sending, by the first Master node to all the nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the first Master node is downgraded to a Slave node when the weight of the first Master node is less than the weight of the second Master node.
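
The timing checks of claims 5 and 6 can be sketched as simple window tests over recorded heartbeat arrival times. The function names, the default values of `n` and `m`, and the choice to end the claim 6 detection period at the current time are assumptions for this example; the claims only require N and M to be positive integers.

```python
def heartbeat_interrupted(arrival_times, report_time, period, n=3):
    """Claim 5 style check: was any heartbeat seen in the last n periods?

    report_time is the moment the report of a disconnection arrives
    (the "third moment"); the window starts n * period earlier
    (the "fourth moment").
    """
    window_start = report_time - n * period
    return not any(window_start <= t <= report_time for t in arrival_times)


def slave_is_faulty(secondary_lost_slave, slave_arrivals, report_time, period, n=3):
    """The Slave node is faulty only if the secondary node reports losing it
    AND the Master's own heartbeat from the Slave node is interrupted."""
    return secondary_lost_slave and heartbeat_interrupted(
        slave_arrivals, report_time, period, n
    )


def both_witnesses_faulty(secondary_arrivals, reference_arrivals, now, period, m=3):
    """Claim 6 style check: neither the secondary node nor the reference
    node was heard from within a detection period of m heartbeat periods."""
    start = now - m * period

    def seen(times):
        return any(start <= t <= now for t in times)

    return not seen(secondary_arrivals) and not seen(reference_arrivals)
```

With a one-second heartbeat period and `n=3`, a report arriving at time 10.0 looks back to time 7.0, so a last Slave heartbeat at 2.0 counts as interrupted while one at 8.5 does not.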

9. A device for processing a failure in at least one distributed cluster, comprising:
- a first receiver configured to receive a first heartbeat message comprising first indication information from a first reference node;
  
  a processor coupled to the first receiver and configured to:
  
  determine, according to the first indication information received by the first receiver, that the first reference node is disconnected from a first Master node;
  
  detect whether a second heartbeat message from the first Master node to the device is interrupted;
  
  determine that the device is also disconnected from the first Master node when the heartbeat message from the first Master node to the device is interrupted; and
  
  determine that the first Master node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, wherein the first distributed cluster comprises the first Master node, a first Slave node, the first reference node, and a first secondary node that serves as a backup of the first Master node, and wherein the device is the first secondary node; and
  
  a transmitter coupled to the processor and the first receiver and configured to send a broadcast message to all nodes in the first distributed cluster in response to the processor determining that the first Master node is faulty, wherein the broadcast message indicates that the device is upgraded to a new first Master node.
- - 10. The device of claim 9, wherein the processor is further configured to detect whether the second heartbeat message from the first Master node is received between a first moment and a second moment, wherein the first moment is a moment at which the device receives the first heartbeat message from the first reference node comprising the first indication information, wherein the second moment is earlier than the first moment, wherein a time interval between the first moment and the second moment is N times a heartbeat period of sending a second heartbeat message by the first Master node to the device, and wherein N is a positive integer.
  - 11. The device of claim 9, wherein the at least one distributed cluster further comprises a second distributed cluster, wherein the device further comprises a second receiver coupled to the processor, the first receiver, and the transmitter and configured to receive a heartbeat message from a second reference node, wherein the heartbeat message comprises second indication information indicating that a node attribute of a receive end of the heartbeat message is a Slave node, wherein the second reference node is configured to separately send the heartbeat message comprising the second indication information to a second Slave node and all nodes in the first distributed cluster, wherein the second distributed cluster and the first distributed cluster have a same cluster identifier, and wherein the processor is further configured to:
    - determine, according to the second indication information received by the second receiver, that the node attribute of the receive end of the heartbeat message indicated in the heartbeat message from the second reference node does not match a node attribute of the device;
      
      determine that the second distributed cluster in which the second reference node is located and the first distributed cluster in which the device is located are two sub-clusters formed after one network distributed cluster is split; and
      
      negotiate with a second secondary node in the second distributed cluster to integrate the first distributed cluster and the second distributed cluster, and wherein the second distributed cluster comprises a second Master node, the second Slave node, the second reference node, and the second secondary node that serves as a backup of the second Master node.
  - 12. The device of claim 11, wherein the processor is further configured to:
    - send, to all nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the node attribute of the device is a secondary node;
      
      receive a negotiation message from the second secondary node, wherein the negotiation message comprises information indicating a weight of the second secondary node, and wherein the negotiation message is sent by the second secondary node to the device when it is detected that the node attribute indicated in the broadcast message is the same as a node attribute of the second secondary node;
      
      send, to the second secondary node, a negotiation response message instructing to downgrade the second secondary node to a Slave node when a weight of the device is greater than or equal to the weight of the second secondary node; and
      
      send, to all the nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the device is downgraded to a Slave node when the weight of the device is less than the weight of the second secondary node.

13. A device for processing a failure in at least one distributed cluster, comprising:
- a first receiver configured to receive a first heartbeat message from a first secondary node comprising third indication information; and
  
  a processor coupled to the first receiver and configured to:
  
  determine, according to the third indication information received by the first receiver, that the first secondary node is disconnected from a first Slave node;
  
  detect whether a second heartbeat message from the first Slave node to the device is interrupted based on whether the second heartbeat message from the first Slave node is received between a third moment and a fourth moment, wherein the third moment is a moment at which the device receives the heartbeat message from the first secondary node comprising the third indication information, wherein the fourth moment is earlier than the third moment, wherein a time interval between the third moment and the fourth moment is N times a heartbeat period of sending the second heartbeat message by the first Slave node to the device, and wherein N is a positive integer;
  
  determine that the device is also disconnected from the first Slave node when the second heartbeat message from the first Slave node to the device is interrupted; and
  
  determine that the first Slave node is faulty, wherein the at least one distributed cluster comprises a first distributed cluster, wherein the first distributed cluster comprises a first Master node, the first Slave node, a first reference node, and the first secondary node that serves as a backup of the first Master node, and wherein the device is the first Master node.
- - 14. The device of claim 13, wherein the processor is further configured to:
    - detect, within a preset detection period, whether the heartbeat message from the first secondary node and a heartbeat message from the first reference node are received, wherein the preset detection period is M times the heartbeat period of sending a heartbeat message, and wherein M is a positive integer; and
      
      determine that both the first secondary node and the first reference node are faulty when neither the heartbeat message from the first secondary node nor the heartbeat message from the first reference node is received by the processor within the preset detection period.
  - 15. The device of claim 13, wherein the at least one distributed cluster further comprises a second distributed cluster, wherein the second distributed cluster comprises a second Master node, a second Slave node, a second reference node, and a second secondary node that serves as a backup of the second Master node, wherein the device further comprises a second receiver coupled to the first receiver and the processor and configured to receive a heartbeat message from the second reference node, wherein the heartbeat message from the second reference node comprises fourth indication information indicating that a node attribute of a receive end of the heartbeat message is a Slave node, wherein the second reference node is configured to separately send, to the second Slave node and all nodes in the first distributed cluster, the heartbeat message comprising the fourth indication information, wherein the second distributed cluster and the first distributed cluster have a same cluster identifier, and wherein the processor is further configured to:
    - determine, according to the fourth indication information received by the second receiver, that the node attribute of the receive end of the heartbeat message indicated in the heartbeat message from the second reference node does not match a node attribute of the device;
      
      determine that the second distributed cluster in which the second reference node is located and the first distributed cluster in which the device is located are two sub-clusters formed after one network distributed cluster is split; and
      
      negotiate with the second Master node in the second distributed cluster to integrate the first distributed cluster and the second distributed cluster.
  - 16. The device of claim 15, wherein the processor is further configured to:
    - send, to all nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the node attribute of the device is a Master node;
      
      receive a negotiation message from the second Master node, wherein the negotiation message comprises information indicating a weight of the second Master node, and wherein the negotiation message is sent by the second Master node to the device when it is detected that the node attribute indicated in the broadcast message is the same as a node attribute of the second Master node;
      
      send, to the second Master node, a negotiation response message that is used to instruct to downgrade the second Master node to a Slave node when a weight of the device is greater than or equal to the weight of the second Master node; and
      
      send, to all the nodes in the first distributed cluster and the second distributed cluster, a broadcast message indicating that the device is downgraded to a Slave node when the weight of the device is less than the weight of the second Master node.

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Yuan, Jianqing; Ni, Shaoji
Primary Examiner(s): Nowlin, Eric
Application Number: US15/674,159
Publication Number: US 20170339005A1
Time in Patent Office: 915 Days
CPC Class Codes

H04L 41/0668   by dynamic selection of rec...

H04L 41/0677   Localisation of faults

H04L 43/00   Arrangements for monitoring...

H04L 43/0817   by checking functioning

H04L 43/10   Active monitoring, e.g. hea...

H04L 65/40   Support for services or app...

H04L 67/145   avoiding end of session, e....

H04L 69/40   for recovering from a failu...
