Monitoring and control engine for multi-tiered service-level management of distributed web-application servers
Abstract
A web site provides services to users over the Internet. End-user service-level objectives (SLOs) such as availability and performance are measured and reported to a SLO agent. User requests pass through several tiers at the web site, such as a firewall tier, a web-server tier, an application-server tier, and a database-server tier. Each tier has several redundant service components that can process requests for that tier. Local agents, operating with any local resource managers, monitor running service components and report to a service agent. Node monitors also monitor network-node status and report to the service agent. When a node or service component fails, the service agent attempts to restart it using the local agent, or replicates the service component to other nodes. When the SLO agent determines that a SLO is not being met, it instructs the service agent to replicate more of the constraining service components or increase resources.
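The failure handling described in the abstract (restart through the local agent, otherwise replicate to another node) might be sketched roughly as follows. This is only an illustrative sketch: the class names `LocalAgent` and `ServiceAgent`, the `can_restart` flag, and the node and component names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class LocalAgent:
    """Hypothetical stand-in for a local agent running on one node."""
    node: str
    can_restart: bool = True  # assumption: restart outcome is preset for the demo
    running: set = field(default_factory=set)

    def restart(self, component: str) -> bool:
        # Mark the component as running again only if the restart succeeds.
        if self.can_restart:
            self.running.add(component)
        return self.can_restart

    def start_replica(self, component: str) -> None:
        self.running.add(component)


@dataclass
class ServiceAgent:
    """Hypothetical service agent responsible for one tier."""
    tier: str
    agents: dict  # node name -> LocalAgent

    def handle_failure(self, node: str, component: str) -> str:
        """Try a local restart first; otherwise replicate to another node."""
        agent = self.agents[node]
        agent.running.discard(component)
        if agent.restart(component):
            return f"restarted {component} on {node}"
        for other_name, other in self.agents.items():
            if other_name != node and component not in other.running:
                other.start_replica(component)
                return f"replicated {component} to {other_name}"
        return f"no target node available for {component}"


agents = {
    "node-a": LocalAgent("node-a", can_restart=False, running={"web-server"}),
    "node-b": LocalAgent("node-b"),
}
web_tier = ServiceAgent("web-server tier", agents)
result = web_tier.handle_failure("node-a", "web-server")
print(result)
```

In this example the restart on `node-a` is configured to fail, so the sketch falls back to starting a replica on `node-b`.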
20 Claims
1. A distributed monitor and control engine comprising:
a service-level-objective (SLO) agent, receiving measurements of an SLO objective for a web service to a web user, the measurements of the SLO objective indicating service quality for the web user accessing the web service at a web site, the SLO agent for adjusting resources at the web site to improve the measurements of the SLO objective;
a service agent, coupled to the SLO agent, for monitoring and controlling one or more tiers at the web site, wherein a request from the web user passes through a plurality of tiers, each tier having a plurality of service components each capable of performing a tier service for the request, the request being processed by performing a series of tier services of different tiers; and
local agents, running on nodes containing the service components, each local agent for monitoring status of a service component and for adjusting local computing resources available to the service component in response to commands from the service agent, each local agent reporting status to the service agent, wherein the SLO agent uses the service agent and local agents to adjust resources at the web site to improve measurements of the SLO objective.
10. A computer-implemented method for monitoring and controlling a web site to meet a service-level objective (SLO) of a service having multiple tiers of service components, the method comprising:
when a SLO agent determines that an availability SLO is not being met;
commanding a service agent for a failing tier to replicate a service component for the failing tier that is below a tier-performance baseline and causing the SLO to not be met to increase a number of service components for the failing tier; and
sending an alarm from the service agent to the SLO agent indicating an action taken;
when a SLO agent determines that a performance SLO is not being met;
sending a message from the SLO agent to a service agent for a low-performing tier;
sending a command from the service agent to a local agent running a service component for the low-performing tier;
the local agent attempting to shift resources to the service component for the low-performing tier from lower-priority services running on a local node controlled by the local agent;
when the local agent is not able to shift resources, replicating the service component to a target node to increase a number of service components for the low-performing tier; and
sending an alarm signal from the service agent to the SLO agent to report an action taken, whereby availability and performance SLO violations are acted on by the SLO agent instructing the service and local agents to shift resources or replicate service components of a tier causing the violation.
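The two branches of the method in claim 10 can be illustrated with a small sketch. It is an interpretation, not the claimed implementation: the function names, the dictionary layout for a node, the fixed 10% resource step, and the tuple format of the alarm are all assumptions made for illustration.

```python
def shift_resources(node):
    """Move resource share from lower-priority services to the service component.

    `node` is a hypothetical dict with integer percentage shares:
    {"component_share": ..., "low_priority_share": ...}.
    Returns True when some share was moved, False when none was available.
    """
    step = 10  # assumption: resources are shifted in fixed 10% steps
    if node["low_priority_share"] >= step:
        node["low_priority_share"] -= step
        node["component_share"] += step
        return True
    return False


def handle_violation(kind, tier, node, replicas, alarms):
    """Act on an SLO violation for one tier and record an alarm with the action taken."""
    if kind == "availability":
        # Availability violation: replicate a service component for the failing tier.
        replicas[tier] += 1
        action = "replicated"
    elif kind == "performance":
        # Performance violation: try to shift local resources first,
        # and replicate only when the local agent cannot shift any.
        if shift_resources(node):
            action = "shifted resources"
        else:
            replicas[tier] += 1
            action = "replicated"
    else:
        raise ValueError(f"unknown SLO kind: {kind}")
    alarms.append((tier, kind, action))  # alarm reported back to the SLO agent
    return action


replicas = {"application-server": 2}
alarms = []
node = {"component_share": 60, "low_priority_share": 10}

first = handle_violation("performance", "application-server", node, replicas, alarms)
second = handle_violation("performance", "application-server", node, replicas, alarms)
third = handle_violation("availability", "application-server", node, replicas, alarms)
print(first, second, third, replicas)
```

The first performance violation is absorbed by shifting resources on the node; the second finds nothing left to shift and replicates instead, which mirrors the ordering of steps in the claim.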
16. A computer-program product comprising:
a computer-usable medium having computer-readable program code means embodied therein for controlling and monitoring service-level objectives, the computer-readable program code means in the computer-program product comprising;
network connection means for transmitting and receiving external requests for a service;
first tier means for receiving and partially processing external requests for the service having a service-level objective (SLO), the first tier means having a plurality of first service components each able to partially process a request when other first service components are not operational;
second tier means for receiving and partially processing requests from the first tier means, the second tier means having a plurality of second service components each able to partially process a request when other second service components are not operational;
third tier means for receiving and partially processing requests from the second tier means, the third tier means having a plurality of third service components each able to partially process a request when other third service components are not operational;
first local agent means, running on nodes for running the first service components, for monitoring and controlling the first service components of the first tier means;
second local agent means, running on nodes for running the second service components, for monitoring and controlling the second service components of the second tier means;
third local agent means, running on nodes for running the third service components, for monitoring and controlling the third service components of the third tier means;
SLO agent means, coupled to receive SLO measurements, for comparing an SLO measurement to a goal for a service and signaling a SLO violation when the goal is not met by the SLO measurement;
first service agent means, coupled to the first local agent means, for instructing the first local agent means to adjust resources to increase performance of the first service components in response to a message from the SLO agent means signaling the SLO violation when the SLO violation is caused by the first service components of the first tier means;
second service agent means, coupled to the second local agent means, for instructing the second local agent means to adjust resources to increase performance of the second service components in response to a message from the SLO agent means signaling the SLO violation when the SLO violation is caused by the second service components of the second tier means; and
third service agent means, coupled to the third local agent means, for instructing the third local agent means to adjust resources to increase performance of the third service components in response to a message from the SLO agent means signaling the SLO violation when the SLO violation is caused by the third service components of the third tier means, whereby multiple tiers of service components are controlled.
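As a rough illustration of how the SLO agent means in claim 16 could compare a measurement to a goal and route the violation to the service agent of the responsible tier, consider the sketch below. The claim does not say how the responsible tier is identified; the per-tier latency baselines used here, the choice of response time in milliseconds as the SLO measurement, and all function names are assumptions for this example only.

```python
def slo_violated(measured_ms, goal_ms):
    """Assumption: the SLO is a response-time goal, so larger measurements are worse."""
    return measured_ms > goal_ms


def find_constraining_tier(tier_latency_ms, tier_baseline_ms):
    """Return the tier that exceeds its baseline by the largest margin, or None."""
    worst, worst_excess = None, 0.0
    for tier, latency in tier_latency_ms.items():
        excess = latency - tier_baseline_ms[tier]
        if excess > worst_excess:
            worst, worst_excess = tier, excess
    return worst


def dispatch(measured_ms, goal_ms, tier_latency_ms, tier_baseline_ms):
    """Return the tier whose service agent should be messaged, or None if the goal is met."""
    if not slo_violated(measured_ms, goal_ms):
        return None
    return find_constraining_tier(tier_latency_ms, tier_baseline_ms)


latencies = {"first": 40, "second": 310, "third": 90}   # hypothetical per-tier measurements
baselines = {"first": 50, "second": 100, "third": 80}   # hypothetical per-tier baselines

target = dispatch(450, 300, latencies, baselines)
no_action = dispatch(250, 300, latencies, baselines)
print(target, no_action)
```

Here the second tier exceeds its baseline by the widest margin, so its service agent would receive the violation message; when the end-to-end measurement meets the goal, no tier is selected.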
Specification