Failover and recovery for replicated data instances
First Claim
1. A computer-implemented method of managing recovery of a replicated instance for a relational database instance from a control environment, comprising:
under control of one or more computer systems configured with executable instructions, periodically communicating with a primary instance replica and a secondary instance replica in a database environment using a monitoring component of a separate control environment, each response received by the at least one monitoring component including status information and data generation information for a respective one of the first and second instance replicas, data updates for the primary instance replica being synchronously replicated to the secondary instance replica for a single data generation;
in response to the at least one monitoring component being unable to communicate with one of the first and second instance replicas, determining whether the first and second instance replicas are able to communicate with each other and whether the first and second instance replicas have common data generation information;
when the monitoring component is unable to communicate with the primary replica for a minimum period of time, the secondary instance replica is unable to communicate with the primary replica, and the second instance replica has the same data generation information as a last known state of the primary replica, causing the secondary instance replica to perform a failover operation to become a new primary replica for the relational database instance;
when the monitoring component is unable to communicate with the secondary replica for a minimum period of time, and the primary instance replica is unable to communicate with the secondary replica, causing a secondary instance replica recovery process to be executed that generates a new secondary instance replica for the relational database instance; and
when the monitoring component is unable to communicate with either of the primary replica and the secondary replica for a minimum period of time, the primary and secondary instance replicas are able to communicate with each other, and the primary and secondary instance replicas have the same data generation information, no failover or recovery operation is performed for the primary and secondary instance replicas.
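The three conditional branches of the claim can be read as a small decision procedure. The following is a minimal sketch of that reading, not the patented implementation: the names `Observation`, `Action`, and `decide` are hypothetical, the `UNDETERMINED` result marks combinations the claim does not address, and treating "both replicas reachable" as requiring no action is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FAILOVER_TO_SECONDARY = "failover_to_secondary"
    RECOVER_SECONDARY = "recover_secondary"
    NO_ACTION = "no_action"
    UNDETERMINED = "undetermined"  # combination not covered by the claim text


@dataclass(frozen=True)
class Observation:
    # True when the monitor has failed to reach the replica for at least
    # the minimum period of time (the threshold itself is not modeled here).
    primary_unreachable: bool
    secondary_unreachable: bool
    # True when the two replicas can communicate with each other.
    replicas_connected: bool
    # True when the data generation information matches; for the failover
    # branch this means the secondary matches the primary's last known state.
    same_generation: bool


def decide(obs: Observation) -> Action:
    """Map one observation to an action, following the claim's three branches."""
    if obs.primary_unreachable and obs.secondary_unreachable:
        # Monitor is cut off from both replicas, but they still see each other
        # and agree on the generation: leave them alone.
        if obs.replicas_connected and obs.same_generation:
            return Action.NO_ACTION
        return Action.UNDETERMINED
    if obs.primary_unreachable:
        # Secondary also cannot reach the primary and is on the same generation.
        if not obs.replicas_connected and obs.same_generation:
            return Action.FAILOVER_TO_SECONDARY
        return Action.UNDETERMINED
    if obs.secondary_unreachable:
        # Primary also cannot reach the secondary: build a new secondary.
        if not obs.replicas_connected:
            return Action.RECOVER_SECONDARY
        return Action.UNDETERMINED
    # Assumption: nothing is unreachable, so nothing needs to be done.
    return Action.NO_ACTION
```

For example, `decide(Observation(True, False, False, True))` selects the failover branch, while the same observation with a generation mismatch falls outside the claimed cases and is reported as undetermined.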
Abstract
Replicated instances in a database environment provide for automatic failover and recovery. A monitoring component can periodically communicate with a primary and a secondary replica for an instance, with each capable of residing in a separate data zone or geographic location to provide a level of reliability and availability. A database running on the primary instance can have information synchronously replicated to the secondary replica at a block level, such that the primary and secondary replicas are in sync. In the event that the monitoring component is not able to communicate with one of the replicas, the monitoring component can attempt to determine whether those replicas can communicate with each other, as well as whether the replicas have the same data generation version. Depending on the state information, the monitoring component can automatically perform a recovery operation, such as to failover to the secondary replica or perform secondary replica recovery.
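The periodic communication described above implies that the monitoring component keeps, per replica, the time of the last successful response and the last reported data generation. A minimal sketch of such bookkeeping follows; the class name `ReplicaHealth`, the `min_outage_seconds` threshold, and the integer generation value are all assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaHealth:
    """Tracks the last successful heartbeat from one replica (hypothetical)."""

    min_outage_seconds: float              # assumed threshold for "unreachable"
    last_success: Optional[float] = None   # timestamp of last good response
    last_generation: Optional[int] = None  # last reported data generation

    def record_success(self, now: float, generation: int) -> None:
        # Called whenever the replica answers a periodic status request.
        self.last_success = now
        self.last_generation = generation

    def unreachable(self, now: float) -> bool:
        # Assumption: a replica that has never answered has no baseline yet,
        # so it is not reported as failed.
        if self.last_success is None:
            return False
        return now - self.last_success >= self.min_outage_seconds


def same_generation(a: ReplicaHealth, b: ReplicaHealth) -> bool:
    """True when both replicas last reported the same known generation."""
    return a.last_generation is not None and a.last_generation == b.last_generation
```

Timestamps are passed in explicitly rather than read from the system clock, which keeps the sketch deterministic and easy to exercise with fixed values.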
27 Claims
1. A computer-implemented method of managing recovery of a replicated instance for a relational database instance from a control environment, comprising:
under control of one or more computer systems configured with executable instructions, periodically communicating with a primary instance replica and a secondary instance replica in a database environment using a monitoring component of a separate control environment, each response received by the at least one monitoring component including status information and data generation information for a respective one of the first and second instance replicas, data updates for the primary instance replica being synchronously replicated to the secondary instance replica for a single data generation;
in response to the at least one monitoring component being unable to communicate with one of the first and second instance replicas, determining whether the first and second instance replicas are able to communicate with each other and whether the first and second instance replicas have common data generation information;
when the monitoring component is unable to communicate with the primary replica for a minimum period of time, the secondary instance replica is unable to communicate with the primary replica, and the second instance replica has the same data generation information as a last known state of the primary replica, causing the secondary instance replica to perform a failover operation to become a new primary replica for the relational database instance;
when the monitoring component is unable to communicate with the secondary replica for a minimum period of time, and the primary instance replica is unable to communicate with the secondary replica, causing a secondary instance replica recovery process to be executed that generates a new secondary instance replica for the relational database instance; and
when the monitoring component is unable to communicate with either of the primary replica and the secondary replica for a minimum period of time, the primary and secondary instance replicas are able to communicate with each other, and the primary and secondary instance replicas have the same data generation information, no failover or recovery operation is performed for the primary and secondary instance replicas.
- View Dependent Claims (2, 3, 4)
5. A computer-implemented method of managing a replicated database instance in a database environment using a separate control environment, comprising:
under control of one or more computer systems configured with executable instructions, monitoring state information for each of a primary instance replica and a secondary instance replica in a database environment using a monitoring component of a separate control environment; and
in response to the monitoring component being unable to communicate with one of the first and second instance replicas;
determining failure information including whether the first and second instance replicas are able to communicate with each other and whether the first and second instance replicas have a common data generation identifier;
based at least in part upon the failure information, determining a workflow to be executed in the control environment, the workflow including one or more tasks to be executed in the database environment in response to the monitoring component being unable to communicate with one of the first and second instance replicas; and
executing the workflow in the control environment.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
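Claim 5 separates three steps: gathering failure information, determining a workflow from it, and executing that workflow's tasks. The sketch below illustrates one way these steps could fit together. It is an assumption-laden illustration: the workflow names, the task names (`promote_secondary`, `redirect_endpoint`, `provision_new_secondary`, `resync_data`), and the selection rules are hypothetical and are not specified by the claim.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FailureInfo:
    unreachable: str          # "primary" or "secondary", as seen by the monitor
    replicas_connected: bool  # can the replicas communicate with each other?
    common_generation: bool   # do they share a data generation identifier?


# A task appends its name to a log; a real task would act on the database
# environment. The log stands in for those side effects.
Task = Callable[[List[str]], None]


def make_task(name: str) -> Task:
    def task(log: List[str]) -> None:
        log.append(name)
    return task


# Hypothetical workflows, each an ordered list of hypothetical tasks.
WORKFLOWS: Dict[str, List[Task]] = {
    "failover": [make_task("promote_secondary"), make_task("redirect_endpoint")],
    "secondary_recovery": [make_task("provision_new_secondary"), make_task("resync_data")],
    "no_op": [],
}


def determine_workflow(info: FailureInfo) -> str:
    """Pick a workflow name based on the failure information (assumed rules)."""
    if info.unreachable == "primary" and not info.replicas_connected and info.common_generation:
        return "failover"
    if info.unreachable == "secondary" and not info.replicas_connected:
        return "secondary_recovery"
    return "no_op"


def execute_workflow(name: str, log: List[str]) -> None:
    """Run the tasks of the named workflow in order."""
    for task in WORKFLOWS[name]:
        task(log)
```

Keeping the selection step separate from execution mirrors the claim's structure: the decision is made in the control environment, and only the resulting tasks touch the database environment.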
17. A system for managing a replicated database instance in a database environment using a separate control environment, comprising:
a processor; and
a memory device including instructions that, when executed by the processor, cause the processor to;
monitor state information for each of a primary instance replica and a secondary instance replica in a database environment using at least one monitoring component of a separate control environment; and
in response to the at least one monitoring component being unable to communicate with one of the first and second instance replicas;
determine failure information including whether the first and second instance replicas are able to communicate with each other and whether the first and second instance replicas have a common data generation identifier;
based at least in part upon the failure information, determine a workflow to be executed in the control environment, the workflow including one or more tasks to be executed in the database environment in response to the monitoring component being unable to communicate with one of the first and second instance replicas; and
execute the workflow in the control environment.
- View Dependent Claims (18, 19, 20, 21, 22)
23. A non-transitory computer-readable storage medium storing instructions for managing a replicated database instance in a database environment using a separate control environment, the instructions when executed by a processor causing the processor to:
monitor state information for each of a primary instance replica and a secondary instance replica in a database environment using at least one monitoring component of a separate control environment; and
in response to the at least one monitoring component being unable to communicate with one of the first and second instance replicas;
determine failure information including whether the first and second instance replicas are able to communicate with each other and whether the first and second instance replicas have a common data generation identifier;
based at least in part upon the failure information, determine a workflow to be executed in the control environment, the workflow including one or more tasks to be executed in the database environment in response to the monitoring component being unable to communicate with one of the first and second instance replicas; and
execute the workflow in the control environment.
- View Dependent Claims (24, 25, 26, 27)
Specification