Failover and recovery for replicated data instances
First Claim
1. A system, comprising:
a plurality of computing nodes, respectively comprising at least one processor and a memory that together implement a control plane for a data store;
the control plane configured to:
obtain, by a monitoring component, data generation information for each of a primary instance replica and a secondary instance replica, wherein the primary instance replica and the secondary instance replica are located in different data zones;
detect, by the monitoring component, a loss of communication in at least one direction between the monitoring component and the primary instance replica or the secondary instance replica, or in at least one direction between the primary instance replica and the secondary instance replica; and
in response to the detection of the loss of communication in at least one direction, perform a particular type of failover operation or recovery process, wherein the particular type of failover operation or recovery process is determined based at least in part on a nature of the detected loss of communication and the respective data generation information for the primary instance replica and the secondary instance replica.
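The claim keys the choice of failover or recovery to "a nature of the detected loss of communication," which can be one-way rather than total. The following is a minimal sketch, not the patented method, of one way a monitor might record that nature. It assumes that reachability is observed per direction (did a message sent by X arrive at Y?). The names `link_state`, `classify_loss` and `LinkState` are hypothetical and illustrative only.

```python
from enum import Enum
from typing import Dict, Tuple

# Hypothetical labels for the three parties named in the claim.
MONITOR, PRIMARY, SECONDARY = "monitor", "primary", "secondary"


class LinkState(Enum):
    OK = "ok"            # messages observed in both directions
    ONE_WAY = "one_way"  # messages observed in exactly one direction
    DOWN = "down"        # no messages observed in either direction


def link_state(reach: Dict[Tuple[str, str], bool], a: str, b: str) -> LinkState:
    """Classify the link between a and b from directed observations.

    reach[(x, y)] is True if a message sent by x was seen to arrive at y.
    A missing observation is treated as a lost message (an assumption).
    """
    forward = reach.get((a, b), False)
    backward = reach.get((b, a), False)
    if forward and backward:
        return LinkState.OK
    if forward or backward:
        return LinkState.ONE_WAY
    return LinkState.DOWN


def classify_loss(reach: Dict[Tuple[str, str], bool]) -> Dict[Tuple[str, str], LinkState]:
    """Return the state of each of the three links the claim mentions."""
    return {
        (MONITOR, PRIMARY): link_state(reach, MONITOR, PRIMARY),
        (MONITOR, SECONDARY): link_state(reach, MONITOR, SECONDARY),
        (PRIMARY, SECONDARY): link_state(reach, PRIMARY, SECONDARY),
    }


# Example: the monitor's probe reaches the primary, but the reply is lost.
observed = {
    (MONITOR, PRIMARY): True,
    (PRIMARY, MONITOR): False,
    (MONITOR, SECONDARY): True,
    (SECONDARY, MONITOR): True,
    (PRIMARY, SECONDARY): True,
    (SECONDARY, PRIMARY): True,
}
print(classify_loss(observed)[(MONITOR, PRIMARY)])  # LinkState.ONE_WAY
```

Keeping each direction separate lets a later decision step distinguish a primary that is down from one that merely cannot answer the monitor.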
Abstract
Replicated instances in a database environment provide for automatic failover and recovery. A monitoring component can periodically communicate with a primary and a secondary replica for an instance, each of which can reside in a separate data zone or geographic location to provide a level of reliability and availability. A database running on the primary replica can have its data synchronously replicated to the secondary replica at the block level, so that the primary and secondary replicas stay in sync. If the monitoring component cannot communicate with one of the replicas, it can attempt to determine whether the replicas can still communicate with each other, and whether they have the same data generation version. Depending on this state information, the monitoring component can automatically perform a recovery operation, such as failing over to the secondary replica or performing secondary replica recovery.
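The abstract lists the inputs to the recovery decision: which replicas the monitor can reach, whether the replicas can reach each other, and whether their data generation versions match. The sketch below shows one plausible mapping from those inputs to an action. The mapping itself is an assumption for illustration, not the decision table of the patent. The `Action` and `ReplicaView` types, their fields, and `choose_action` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_ACTION = "no_action"
    FAILOVER_TO_SECONDARY = "failover_to_secondary"
    RECOVER_SECONDARY = "recover_secondary"
    WAIT_FOR_OPERATOR = "wait_for_operator"


@dataclass(frozen=True)
class ReplicaView:
    """State the monitoring component has gathered (hypothetical fields)."""
    monitor_reaches_primary: bool
    monitor_reaches_secondary: bool
    replicas_in_contact: bool   # as reported by whichever replica answers
    primary_generation: int     # last data generation known for the primary
    secondary_generation: int   # data generation reported by the secondary


def choose_action(v: ReplicaView) -> Action:
    if v.monitor_reaches_primary and v.monitor_reaches_secondary:
        # Both replicas answer; act only if they have lost each other.
        return Action.NO_ACTION if v.replicas_in_contact else Action.RECOVER_SECONDARY
    if v.monitor_reaches_primary:
        # Secondary unreachable while the primary is healthy: resync the secondary.
        return Action.RECOVER_SECONDARY
    if v.monitor_reaches_secondary:
        if v.replicas_in_contact:
            # The secondary still hears the primary, so only the monitor's path
            # to the primary failed; promoting now would risk two primaries.
            return Action.NO_ACTION
        if v.secondary_generation == v.primary_generation:
            # Primary isolated and the secondary is current: promote it.
            return Action.FAILOVER_TO_SECONDARY
        # The secondary may be missing writes: do not promote automatically.
        return Action.WAIT_FOR_OPERATOR
    # Neither replica answers: the monitor itself may be partitioned.
    return Action.NO_ACTION


print(choose_action(ReplicaView(False, True, False, 7, 7)))  # Action.FAILOVER_TO_SECONDARY
```

The generation check is the safety gate in this sketch. Failover happens only when the secondary provably holds the same data generation as the primary, which matches the abstract's point that the replicas are kept in sync.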
20 Claims
1. A system, comprising the elements set out under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method, comprising:
performing, by one or more computers:
obtaining, by a monitoring component, data generation information for each of a primary instance replica and a secondary instance replica, wherein the primary instance replica and the secondary instance replica are located in different data zones;
detecting, by the monitoring component, a loss of communication in at least one direction between the monitoring component and the primary instance replica or the secondary instance replica, or in at least one direction between the primary instance replica and the secondary instance replica; and
in response to detecting the loss of communication in at least one direction, performing a particular type of failover operation or recovery process, wherein the particular type of failover operation or recovery process is determined based at least in part on a nature of the detected loss of communication and the respective data generation information for the primary instance replica and the secondary instance replica.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory, computer-readable storage medium comprising program instructions that, when executed by one or more computing devices, cause the one or more computing devices to implement:
obtaining, by a monitoring component, data generation information for each of a primary instance replica and a secondary instance replica, wherein the primary instance replica and the secondary instance replica are located in different data zones;
detecting, by the monitoring component, a loss of communication in at least one direction between the monitoring component and the primary instance replica or the secondary instance replica, or in at least one direction between the primary instance replica and the secondary instance replica; and
in response to detecting the loss of communication in at least one direction, performing a particular type of failover operation or recovery process, wherein the particular type of failover operation or recovery process is determined based at least in part on a nature of the detected loss of communication and the respective data generation information for the primary instance replica and the secondary instance replica.
Dependent claims: 16, 17, 18, 19, 20.
Specification