Monitoring and automated recovery of data instances
Abstract
The monitoring and recovery of data instances, data stores, and other such components in a data environment can be performed automatically using a separate control environment. A monitoring component of the control plane can include a set of event processors for monitoring a workload of the data environment, where an event processor detecting a problem in the data plane can cause a recovery workflow to be generated in order to recover from the detected problem. The event processors can communicate with each other such that if one of the event processors becomes unavailable, the other event processors in a set are able to automatically redistribute responsibility for the workload.
21 Claims
1. A computer-implemented method of recovering from a failure in a data environment, comprising:
under control of one or more computer systems configured with executable instructions, monitoring a plurality of host managers by a set of event processors in a control environment, the plurality of host managers each being associated with an identifier from a range of consecutively ordered identifiers and responsible for monitoring a status of at least one data instance in a data environment, wherein a respective first substantially equivalent portion of the range of consecutively ordered identifiers is allocated to each event processor of the set of event processors; causing a heartbeat message to be sent from each event processor in an active state to each other event processor in the set to indicate that the event processor sending the heartbeat message is in the active state; identifying an event processor of the set being in an inactive state based at least in part upon the heartbeat message not being received from the event processor in the inactive state; and reallocating the range of identifiers to the event processors in the active state from which heartbeat messages were received, wherein a respective second substantially equivalent portion of the range of identifiers is reallocated to each of the event processors in the active state.
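The allocation step in claim 1 can be pictured as splitting one consecutive identifier range into contiguous, near-equal portions and re-splitting the whole range over the survivors when a processor stops sending heartbeats. The following is a minimal sketch under stated assumptions: the function names `partition_range` and `reallocate` are hypothetical, identifiers are assumed to be integers in an inclusive range, and giving the remainder to the first processors in sorted order is one illustrative way to keep the portions "substantially equivalent"; the claim does not specify it.

```python
from typing import Dict, Iterable, List, Set, Tuple

Portion = Tuple[int, int]  # inclusive (first_id, last_id); empty when first > last


def partition_range(lo: int, hi: int, processors: Iterable[str]) -> Dict[str, Portion]:
    """Split the inclusive identifier range [lo, hi] into contiguous portions
    whose sizes differ by at most one, one portion per event processor."""
    procs: List[str] = sorted(processors)  # same order on every node
    if not procs:
        raise ValueError("at least one event processor is required")
    base, extra = divmod(hi - lo + 1, len(procs))
    allocation: Dict[str, Portion] = {}
    start = lo
    for i, proc in enumerate(procs):
        size = base + (1 if i < extra else 0)  # first `extra` processors take one more
        allocation[proc] = (start, start + size - 1)
        start += size
    return allocation


def reallocate(lo: int, hi: int, processors: Iterable[str], inactive: Set[str]) -> Dict[str, Portion]:
    """Re-split the whole range over the processors still sending heartbeats."""
    return partition_range(lo, hi, [p for p in processors if p not in inactive])


if __name__ == "__main__":
    members = ["ep-a", "ep-b", "ep-c", "ep-d"]
    print(partition_range(0, 99, members))
    # {'ep-a': (0, 24), 'ep-b': (25, 49), 'ep-c': (50, 74), 'ep-d': (75, 99)}
    print(reallocate(0, 99, members, inactive={"ep-c"}))
    # {'ep-a': (0, 33), 'ep-b': (34, 66), 'ep-d': (67, 99)}
```

Re-splitting the entire range, rather than handing only the failed processor's portion to one neighbour, is what keeps the second allocation balanced as well.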
8. A system for recovering from a failure in a data environment, comprising:
at least one event processor; and memory including instructions that, when executed by the at least one event processor, cause the system to: monitor, by a set of event processors in a control environment, a plurality of host managers each having an identifier over a range of consecutively ordered identifiers and being responsible for monitoring a status of at least one data instance in a data environment, wherein a respective first substantially equivalent portion of the range of consecutively ordered identifiers is allocated to each event processor of the set; cause a heartbeat message to be sent from each event processor in an active state to each other event processor in the set to indicate that the event processor sending the heartbeat message is in the active state; identify an event processor of the set being in an inactive state in response to the heartbeat message not being received from the event processor in the inactive state; and reallocate the range of identifiers to the event processors in the active state from which status messages were received, wherein a respective second substantially equivalent consecutively ordered portion of the range of identifiers is reallocated to each of the event processors in the active state.
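The "heartbeat message not being received" condition in claim 8 amounts to each event processor remembering when it last heard from every peer and treating prolonged silence as inactivity. Below is a minimal sketch of the receiving side only, assuming a fixed timeout in seconds and an injectable clock so it can be exercised without real waiting; the class name `HeartbeatTracker`, the timeout value, and the message transport are all hypothetical and not taken from the claims.

```python
import time
from typing import Callable, Dict, Iterable, Set


class HeartbeatTracker:
    """Receiving side of the heartbeat exchange, kept by each event processor."""

    def __init__(self, self_id: str, members: Iterable[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.self_id = self_id
        self.timeout = timeout  # seconds of silence before a peer counts as inactive
        self.clock = clock
        now = clock()
        # Start every peer as "just seen" so nobody is declared inactive
        # before it has had a chance to send its first heartbeat.
        self.last_seen: Dict[str, float] = {m: now for m in members if m != self_id}

    def record_heartbeat(self, sender: str) -> None:
        """Call whenever a heartbeat message arrives from a peer."""
        self.last_seen[sender] = self.clock()

    def inactive_peers(self) -> Set[str]:
        now = self.clock()
        return {p for p, seen in self.last_seen.items() if now - seen > self.timeout}

    def active_members(self) -> Set[str]:
        """This processor plus every peer heard from within the timeout."""
        return ({self.self_id} | set(self.last_seen)) - self.inactive_peers()


if __name__ == "__main__":
    fake_time = [0.0]  # hypothetical manual clock for the demo
    tracker = HeartbeatTracker("ep-a", ["ep-a", "ep-b", "ep-c"], timeout=3.0,
                               clock=lambda: fake_time[0])
    fake_time[0] = 2.0
    tracker.record_heartbeat("ep-b")
    fake_time[0] = 4.0
    print(sorted(tracker.inactive_peers()))  # ['ep-c']
    print(sorted(tracker.active_members()))  # ['ep-a', 'ep-b']
```

The set returned by `active_members()` is the natural input to the reallocation step: only processors from which heartbeats were actually received share the range.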
11. A computer-implemented method of monitoring components in a data environment, comprising:
under control of one or more computer systems configured with executable instructions, allocating a plurality of components in a data environment to be monitored by a set of event processors in a control environment, the plurality of components each having an identifier over a range of consecutively ordered identifiers, each event processor being allocated a respective first substantially equivalent portion of the range of identifiers for monitoring and operable to periodically send status messages to each component in the respective portion of the range of identifiers allocated to a respective event processor; causing a status message to be sent from each event processor in an active state to be received by other event processors in the set in order to indicate that the event processor sending the status message is in the active state; and upon identifying an event processor as being in an inactive state, reallocating the range of identifiers to the event processors in the active state from which status messages were received, wherein a respective second substantially equivalent consecutively ordered portion of the range of identifiers is reallocated to each of the event processors in the active state.
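Claim 11 also has each event processor periodically send status messages to every component in its own portion. A minimal sketch of that polling loop follows, assuming a caller-supplied `probe(host_id) -> bool` stands in for the actual status request and that an exception from the probe counts as a failed check; `poll_portion`, `monitor_loop`, and `on_problem` are hypothetical names, and the polling interval is an assumption.

```python
import threading
from typing import Callable, List, Tuple


def poll_portion(portion: Tuple[int, int], probe: Callable[[int], bool]) -> List[int]:
    """Send one status request to every component in the inclusive portion
    and return the identifiers that did not report healthy."""
    first, last = portion
    unhealthy: List[int] = []
    for host_id in range(first, last + 1):
        try:
            healthy = probe(host_id)
        except Exception:
            healthy = False  # a probe error or timeout counts as a failed check
        if not healthy:
            unhealthy.append(host_id)
    return unhealthy


def monitor_loop(portion: Tuple[int, int], probe: Callable[[int], bool],
                 on_problem: Callable[[int], None], interval: float,
                 stop: threading.Event) -> None:
    """Repeat the poll every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        for host_id in poll_portion(portion, probe):
            on_problem(host_id)  # e.g. hand off to recovery-workflow creation
        stop.wait(interval)  # sleeps, but wakes early if stop is set


if __name__ == "__main__":
    def flaky_probe(host_id: int) -> bool:
        if host_id == 12:
            raise TimeoutError("no response")
        return host_id != 14

    print(poll_portion((10, 14), flaky_probe))  # [12, 14]
```

Because the loop only reads its `portion` at call time, a reallocation can be applied by stopping the loop and restarting it with the new portion.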
16. A computer-implemented method of monitoring components in a data environment, comprising:
under control of one or more computer systems configured with executable instructions, allocating a plurality of components in a data environment to be monitored by a set of event processors in a control environment, the plurality of components each having an identifier over a range of identifiers, each event processor being allocated a respective first substantially equivalent consecutively ordered portion of the range of identifiers for monitoring; causing a status message to be sent from each event processor in an active state to be received by other event processors in the set to indicate that the event processor sending the status message is in the active state; storing information for an event processor from which a status message was not received to a job queue in a control environment; using the stored information to generate a workflow to restart the event processor from which the status message was not received or add a new event processor to the set of event processors; and reallocating the range of identifiers to each of the set of event processors when a newly started event processor is activated, wherein a respective second substantially equivalent consecutively ordered portion of the range of identifiers is reallocated to each of the event processors in the active state.
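Claim 16 routes a silent processor through a job queue and turns the stored record into a workflow that either restarts it or adds a replacement, with reallocation deferred until the new or restarted processor is active. The sketch below models that hand-off with an in-memory `deque`; the `RecoveryJob` fields, the function names, the step strings, and especially the restart-versus-replace rule based on a count of missed heartbeats are all hypothetical, chosen only to show stored information driving the workflow.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class RecoveryJob:
    """What gets stored in the job queue about a silent event processor."""
    processor_id: str
    missed_heartbeats: int


def record_silent_processor(queue: Deque[RecoveryJob], processor_id: str, missed: int) -> None:
    queue.append(RecoveryJob(processor_id, missed))


def generate_workflow(queue: Deque[RecoveryJob], max_restart_misses: int = 5) -> List[str]:
    """Pop the oldest job and turn it into ordered workflow steps.

    Assumed policy: a processor silent only briefly is restarted in place;
    one silent for longer is replaced by a newly launched processor."""
    job = queue.popleft()
    if job.missed_heartbeats <= max_restart_misses:
        target = job.processor_id
        first_step = f"restart {target}"
    else:
        target = f"{job.processor_id}-replacement"
        first_step = f"launch {target}"
    return [
        first_step,
        f"wait for first heartbeat from {target}",
        # The range is re-split only after the new or restarted processor is active.
        "reallocate identifier range across all active event processors",
    ]


if __name__ == "__main__":
    jobs: Deque[RecoveryJob] = deque()
    record_silent_processor(jobs, "ep-c", missed=2)
    record_silent_processor(jobs, "ep-d", missed=9)
    print(generate_workflow(jobs)[0])  # restart ep-c
    print(generate_workflow(jobs)[0])  # launch ep-d-replacement
```

In a real control environment the queue would be a durable job store shared by the control plane rather than a process-local `deque`; the sketch keeps it local only to stay self-contained.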
19. A computer-implemented method of monitoring components in a data environment, comprising:
under control of one or more computer systems configured with executable instructions, allocating a plurality of components in a data environment to be monitored by a set of event processors in a control environment, the plurality of components each having an identifier over a range of identifiers, and each event processor being allocated a respective first substantially equivalent consecutively ordered portion of the range of identifiers for monitoring; sorting the identifiers and allocating the sorted identifiers substantially uniformly across the set of event processors; causing a status message to be sent from each event processor in an active state to be received by other event processors in the set to indicate that the event processor sending the status message is in the active state; and upon identifying an event processor as being in an inactive state, reallocating the range of identifiers to the event processors in the active state from which status messages were received, wherein a respective second substantially equivalent consecutively ordered portion of the range of identifiers is reallocated to each of the event processors in the active state.
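Claim 19 adds an explicit sorting step, which matters when identifiers are not a dense integer range. A minimal sketch, assuming string identifiers and string processor names: sorting both lists means any processor that sees the same membership computes the same allocation on its own, which is one plausible benefit of sorting, though the claim does not state a reason. The function name `allocate_sorted` and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, List


def allocate_sorted(identifiers: Iterable[str], processors: Iterable[str]) -> Dict[str, List[str]]:
    """Sort the identifiers, then give each processor one contiguous slice of
    the sorted list, with slice lengths differing by at most one."""
    ids = sorted(identifiers)
    procs = sorted(processors)  # deterministic order regardless of input order
    if not procs:
        raise ValueError("at least one event processor is required")
    base, extra = divmod(len(ids), len(procs))
    allocation: Dict[str, List[str]] = {}
    start = 0
    for i, proc in enumerate(procs):
        size = base + (1 if i < extra else 0)
        allocation[proc] = ids[start:start + size]  # consecutively ordered slice
        start += size
    return allocation


if __name__ == "__main__":
    hosts = ["h-07", "h-02", "h-11", "h-05", "h-01"]
    print(allocate_sorted(hosts, ["ep-b", "ep-a"]))
    # {'ep-a': ['h-01', 'h-02', 'h-05'], 'ep-b': ['h-07', 'h-11']}
    print(allocate_sorted(hosts, ["ep-a"]))  # after ep-b is found inactive
    # {'ep-a': ['h-01', 'h-02', 'h-05', 'h-07', 'h-11']}
```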
Specification