Distributed database management system with node failure detection
1 Assignment
0 Petitions
Abstract
A node failure detector for use in a distributed database that is accessed through a plurality of interconnected transactional and archival nodes. Each node acts as an informer node that tests communications with every other node. Each informer node generates a list of suspicious nodes, and these lists are held by one node designated as the leader node. The leader node analyzes the data from all of the informer nodes to identify each node that should be removed, with appropriate failover procedures.
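The informer arrangement amounts to an all-pairs test: with n nodes, each of the n informers checks the other n - 1, so a node that has actually failed appears on nearly every informer's list. The Python sketch below illustrates this under stated assumptions: `ping_ok(src, dst)` is a hypothetical predicate standing in for the real ping/acknowledgement exchange, and a plain count of listings is a simplified stand-in for the leader's analysis, not the patented procedure.

```python
from collections import Counter
from typing import Callable, Dict, List


def informer_round(nodes: List[int],
                   ping_ok: Callable[[int, int], bool]) -> Dict[int, List[int]]:
    """Every node acts as an informer and tests communications with every other node.

    ping_ok(src, dst) is a hypothetical stand-in for "src pinged dst and got a
    valid acknowledgement"; it is not an API described in the patent.
    """
    return {
        informer: [peer for peer in nodes
                   if peer != informer and not ping_ok(informer, peer)]
        for informer in nodes
    }


def suspicion_counts(lists: Dict[int, List[int]]) -> Counter:
    """Simplified leader-side view: how many informers list each node as suspicious."""
    return Counter(suspect for suspects in lists.values() for suspect in suspects)


if __name__ == "__main__":
    nodes = [1, 2, 3, 4]
    # Simulate node 4 being down: every test that involves node 4 fails.
    lists = informer_round(nodes, lambda src, dst: 4 not in (src, dst))
    print(lists)                    # {1: [4], 2: [4], 3: [4], 4: [1, 2, 3]}
    print(suspicion_counts(lists))  # Counter({4: 3, 1: 1, 2: 1, 3: 1})
```

In a real system the failed node's own list would never reach the leader; even in this simulation, only node 4 is listed by a majority of informers, which is why the leader's decision rests on agreement across informers rather than on any single list.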
80 Citations
16 Claims
1. In a distributed database management processing system including a plurality of nodes and means at each node for establishing a communications path with every other node in the system and each node includes means for transmitting a ping message and for transmitting a ping acknowledgement message in response to a received ping message, a method for detecting failures of individual nodes in the system comprising the steps of:
A) designating one node as a leader node for analyzing information,
B) at each node as an I-node node, transmitting a ping message to each other node as a receiving node, monitoring at the I-node the corresponding communications path for a valid response from each receiving node, and responding to an invalid response by designating the corresponding receiving node as a suspicious node,
C) generating a message for transmittal to the leader node with an identification of the I-node and the suspicious node,
D) responding to the message in the leader node by identifying other instances for which communications problems have been recorded with the identified suspicious node, determining a number of I-nodes included in the other instances, and where fewer than a majority of the I-nodes are included in other instances, sending an acknowledgement message to all the I-nodes, and
E) selectively designating suspicious nodes as failed where the majority of the I-nodes identify the suspicious nodes in a generated message or the majority of the I-nodes identify the suspicious nodes in a response to the acknowledgment message.
Dependent claims: 2, 3, 4, 5, 6.
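Steps D and E of claim 1 can be read as a majority vote held by the leader. The Python sketch below is one possible interpretation, not the patented implementation: it assumes every node is an I-node, takes "majority" to mean more than half of `num_inodes`, and replaces the acknowledgement-message round trip with a hypothetical `poll` callback that returns the I-nodes that also report the suspect. `Leader`, `on_suspicion`, and `poll` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Leader:
    """Hypothetical leader-side state for claim 1, steps D and E."""
    num_inodes: int  # assumed: every node in the system acts as an I-node
    reports: Dict[int, Set[int]] = field(default_factory=dict)  # suspect -> reporting I-nodes
    failed: Set[int] = field(default_factory=set)

    @property
    def majority(self) -> int:
        # Smallest count that is strictly more than half of the I-nodes.
        return self.num_inodes // 2 + 1

    def on_suspicion(self, inode: int, suspect: int,
                     poll: Callable[[int], Set[int]]) -> bool:
        """Handle one (I-node, suspicious node) message; return True if the suspect is failed."""
        if suspect in self.failed:
            return True
        # Step D: merge this report with earlier instances recorded for the same suspect.
        reporters = self.reports.setdefault(suspect, set())
        reporters.add(inode)
        if len(reporters) < self.majority:
            # Fewer than a majority so far: "send the acknowledgement message to all
            # I-nodes" (modelled by the hypothetical poll callback) and merge the answers.
            reporters |= poll(suspect)
        # Step E: designate the suspect as failed only with majority agreement.
        if len(reporters) >= self.majority:
            self.failed.add(suspect)
            return True
        return False


if __name__ == "__main__":
    leader = Leader(num_inodes=5)
    # I-node 1 reports node 4; I-nodes 2 and 3 confirm when polled: 3 of 5 agree.
    print(leader.on_suspicion(1, 4, poll=lambda s: {2, 3}))  # True
    # I-node 1 reports node 2; nobody confirms: 1 of 5 is not a majority.
    print(leader.on_suspicion(1, 2, poll=lambda s: set()))   # False
```

Requiring a majority rather than a single report presumably keeps one node with a bad link from evicting a healthy peer, while the confirmation round lets the leader reach a decision without waiting for every I-node's next ping cycle.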
7. In a distributed database management processing system including a plurality of nodes and means at each node for establishing a communications path with every other node in the system and each node includes means for transmitting a ping message and for transmitting a ping acknowledgement message in response to a received ping message, means for detecting failures of individual nodes in the system comprising:
A) means for designating one node as a leader node for analyzing information,
B) at each node as an I-node node, means for transmitting a ping message to each other node as a receiving node, means for monitoring at the I-node the corresponding communications path for a valid response from each receiving node, and means for responding to an invalid response by designating the corresponding receiving node as a suspicious node,
C) means for generating a message for transmittal to the leader node with an identification of the I-node and the suspicious node,
D) means for responding to the message in the leader node by identifying other instances for which communications problems have been recorded with the identified suspicious node, means for determining a number of I-nodes included in the other instances, and means for sending an acknowledgement message to all the I-nodes where fewer than a majority of the I-nodes are included in other instances, and
E) means for selectively designating suspicious nodes as failed where the majority of the I-nodes identify the suspicious nodes in a generated message or the majority of the I-nodes identify the suspicious nodes in a response to the acknowledgment message.
Dependent claims: 8, 9, 10, 11, 12.
13. A distributed database management processing system comprising:
a plurality of nodes comprising a leader node and one or more I-nodes, each node of the plurality of nodes comprising a network interface coupled to a database system network and a processing system coupled to the network interface,
each I-node of the one or more I-nodes being configured to:
transmit a ping message to each other node of the plurality of nodes;
monitor the database system network for a valid response from each other node of the plurality of nodes;
respond to an invalid response from a responding node by designating the responding node as a suspicious node; and
generate a message for transmittal to the leader node, the message including an identification of the I-node and the suspicious node;
the leader node being configured to:
respond to the message by identifying other invalid responses from the suspicious node, determining a number of the one or more I-nodes that received invalid responses from the suspicious node, and where fewer than a majority of the one or more I-nodes received invalid responses from the suspicious node, sending an acknowledgement message to all of the one or more I-nodes; and
selectively designate suspicious nodes as failed where the majority of the one or more I-nodes identify the suspicious nodes in a generated message or the majority of the one or more I-nodes identify the suspicious nodes in a response to the acknowledgment message.
Dependent claim: 14.
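The I-node behaviour recited in claim 13 (transmit a ping, monitor for a valid response, turn an invalid response into a message for the leader) maps naturally onto a ping with a deadline. A minimal asyncio sketch, assuming a hypothetical `ping(peer)` coroutine as the transport, the literal string `"PING_ACK"` as a valid acknowledgement, and a timeout or connection error as an invalid response; `SuspicionMessage`, `check_peer`, and `inode_round` are illustrative names, not from the patent.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class SuspicionMessage:
    """Hypothetical message for the leader: which I-node reported which suspicious node."""
    inode: int
    suspect: int


# Hypothetical transport: sends a ping to `peer` and returns the reply payload.
Pinger = Callable[[int], Awaitable[str]]


async def check_peer(ping: Pinger, peer: int, timeout: float) -> bool:
    """Return True only for a valid acknowledgement received before the deadline."""
    try:
        reply = await asyncio.wait_for(ping(peer), timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return False  # no reply in time or a broken path: an invalid response
    return reply == "PING_ACK"  # assumed format of a valid ping acknowledgement


async def inode_round(inode: int, peers: List[int], ping: Pinger,
                      timeout: float = 0.5) -> List[SuspicionMessage]:
    """Ping every peer concurrently and build one message per suspicious node."""
    results = await asyncio.gather(*(check_peer(ping, p, timeout) for p in peers))
    return [SuspicionMessage(inode, p) for p, ok in zip(peers, results) if not ok]


if __name__ == "__main__":
    async def fake_ping(peer: int) -> str:
        if peer == 3:
            await asyncio.sleep(1)  # simulates a node that never answers in time
        return "garbage" if peer == 4 else "PING_ACK"

    msgs = asyncio.run(inode_round(1, [2, 3, 4], fake_ping, timeout=0.05))
    print(msgs)  # [SuspicionMessage(inode=1, suspect=3), SuspicionMessage(inode=1, suspect=4)]
```

Pinging all peers concurrently with `asyncio.gather` keeps one unresponsive node from delaying the checks on the others; each test is bounded by the same timeout.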
15. A distributed database management processing system comprising:
a plurality of nodes comprising a leader node and one or more I-nodes, each node of the plurality of nodes comprising a network interface coupled to a database system network and a processing system coupled to the network interface,
each I-node of the one or more I-nodes being configured to:
transmit a ping message to each other node of the plurality of nodes;
monitor the database system network for a valid response from each other node of the plurality of nodes;
respond to an invalid response from a responding node by designating the responding node as a suspicious node; and
generate a message for transmittal to the leader node, the message including an identification of the I-node and the suspicious node;
the leader node being configured to:
respond to the message by identifying that the suspicious node responded with an invalid response to only the I-node that generated the message;
selectively designate the suspicious node as failed where the suspicious node has a higher node identification than the I-node that generated the message; and
selectively designate the I-node that generated the message as failed where the I-node that generated the message has a higher node identification than the suspicious node.
Dependent claim: 16.
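Claim 15 covers the case where only one I-node reports the suspect, which points to a problem on a single link rather than a dead node; the leader resolves it by node identification. A small Python sketch of that rule, assuming a hypothetical `reporters_of` map (suspect to the set of reporting I-nodes) as a stand-in for the leader's records:

```python
from typing import Dict, Optional, Set


def resolve_isolated_pair(inode: int, suspect: int,
                          reporters_of: Dict[int, Set[int]]) -> Optional[int]:
    """Return the node to designate as failed when only `inode` reported `suspect`.

    reporters_of maps each suspect to the I-nodes that reported it (hypothetical
    leader bookkeeping). Returns None if other I-nodes also reported the suspect,
    in which case this one-to-one rule does not apply.
    """
    if reporters_of.get(suspect, set()) != {inode}:
        return None
    # Only this pair has a problem: fail whichever node has the higher identification.
    return max(inode, suspect)


if __name__ == "__main__":
    print(resolve_isolated_pair(2, 5, {5: {2}}))     # 5: the suspect has the higher id
    print(resolve_isolated_pair(5, 2, {2: {5}}))     # 5: the reporting I-node has the higher id
    print(resolve_isolated_pair(2, 5, {5: {2, 3}}))  # None: more than one I-node reported node 5
```

Because the rule depends only on the two identifiers, it selects the same node regardless of which end of the broken link reports first, and it needs no further messages.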
Specification