Coordinating the monitoring, management, and prediction of unintended changes within a grid environment
First Claim
1. A computer-implemented method for coordinating recovery from unintended change within a grid environment, comprising:
enabling a grid environment comprising a plurality of resources from a plurality of computing systems each comprising at least one resource and communicatively connected over a network layer through a grid management system to share each said at least one resource through at least one web service layer atop at least one grid service layer implemented within an open grid services architecture, wherein said at least one grid service layer comprises a grid change controller, wherein a plurality of applications execute in an application layer atop said grid service layer;
monitoring, by said grid change controller, for a plurality of potential change indicators from a plurality of resource managers of said grid management system for one of multiple types of errors indicating a first error in at least one from among said network layer, said web service layer, and said application layer, a second error in a particular configuration of said plurality of resources within said grid environment, and a third error in a grid job executing within said grid environment, wherein each of said plurality of resource managers manages one from among a plurality of selections of said plurality of resources within said grid environment;
detecting, at said grid change controller within a grid environment, a particular potential change indicator from among said plurality of potential change indicators indicating an unintended change within said grid environment;
determining, by said grid change controller, a necessary response to said unintended change within said grid environment;
communicating, by said grid change controller, with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, such that said grid change controller facilitates recovery from said unintended change within said grid environment to maintain performance requirements within said grid environment;
gathering, by said grid change controller, a plurality of indicators of actual performance by a plurality of grid jobs executing within said grid environment, said plurality of potential change indicators, and a plurality of recovery results managed by said grid change controller in resolving at least a selection of said plurality of potential change indicators including resolving said unintended change from said particular potential change indicator;
comparing said plurality of indicators of actual performance with at least one grid policy specifying an expected performance of said plurality of grid jobs executing within said grid environment and with said plurality of recovery results;
calculating at least one current reliability factor for said grid environment based on said plurality of indicators of actual performance achieved by said grid change controller in recovering from said plurality of potential change indicators based on said plurality of recovery results in comparison with said expected performance; and
updating said grid policy specifying said expected performance to reflect said at least one current reliability factor.
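As an illustration only, the gathering, comparing, calculating, and updating steps of the claim can be sketched as follows. The claim does not give a formula for the reliability factor, so the one used here (the share of grid jobs meeting the expected performance multiplied by the share of successful recoveries) is an assumption, as are the names `GridPolicy`, `expected_completion_s`, `reliability_factor`, and `update_policy`.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class GridPolicy:
    """Hypothetical grid policy: expected job performance plus a reliability factor."""
    expected_completion_s: float      # expected completion time per grid job (assumed metric)
    reliability_factor: float = 1.0   # most recently calculated reliability factor


def reliability_factor(actual_completion_s: Sequence[float],
                       recovery_succeeded: Sequence[bool],
                       policy: GridPolicy) -> float:
    """Compare actual performance and recovery results with the policy.

    Assumed formula (not from the claim):
    (jobs meeting expected performance / all jobs) *
    (successful recoveries / all recovery attempts).
    """
    if not actual_completion_s:
        raise ValueError("at least one performance indicator is required")
    met = sum(1 for t in actual_completion_s if t <= policy.expected_completion_s)
    job_ratio = met / len(actual_completion_s)
    # With no recovery attempts recorded, recoveries do not lower the factor.
    if recovery_succeeded:
        recovery_ratio = sum(recovery_succeeded) / len(recovery_succeeded)
    else:
        recovery_ratio = 1.0
    return job_ratio * recovery_ratio


def update_policy(policy: GridPolicy, factor: float) -> GridPolicy:
    """Return a copy of the policy that reflects the current reliability factor."""
    return replace(policy, reliability_factor=factor)


policy = GridPolicy(expected_completion_s=100.0)
factor = reliability_factor([90.0, 110.0, 80.0, 95.0], [True, True, False, True], policy)
policy = update_policy(policy, factor)
```

In this sketch three of four jobs meet the expected completion time and three of four recoveries succeed, so the updated policy carries a factor of 0.75 × 0.75.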
Abstract
A grid change controller within a particular grid environment detects an unintended change within that grid environment. In particular, the grid change controller monitors potential change indicators received from multiple disparate resource managers across the grid environment, where each resource manager manages a selection of resources within the grid environment. The grid change controller then determines a necessary response to the unintended change and communicates with at least one independent manager within the grid environment to resolve it, such that the grid change controller facilitates recovery from the unintended change to maintain performance requirements within the grid environment.
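The flow described in the abstract (receive a potential change indicator, determine a response, hand it to an independent manager, record the result) might look like the minimal sketch below. Every name in it (`ChangeIndicator`, `GridChangeController`, `register_manager`, `handle`) and the three error-kind strings are hypothetical, and routing by error type is an assumed way of "determining a necessary response", not the patent's stated mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChangeIndicator:
    """Hypothetical potential change indicator reported by a resource manager."""
    source: str   # name of the reporting resource manager
    kind: str     # assumed kinds: "layer_error", "configuration_error", "job_error"
    detail: str


# An independent manager is modelled as a callable that returns True on successful recovery.
Manager = Callable[[ChangeIndicator], bool]


@dataclass
class GridChangeController:
    managers: Dict[str, Manager] = field(default_factory=dict)
    recovery_results: List[Tuple[ChangeIndicator, bool]] = field(default_factory=list)

    def register_manager(self, kind: str, manager: Manager) -> None:
        """Associate an independent manager with one kind of error."""
        self.managers[kind] = manager

    def handle(self, indicator: ChangeIndicator) -> bool:
        """Pick the manager for this indicator, ask it to resolve the change, record the result."""
        manager = self.managers.get(indicator.kind)
        resolved = manager(indicator) if manager is not None else False
        self.recovery_results.append((indicator, resolved))
        return resolved


controller = GridChangeController()
controller.register_manager("job_error", lambda ind: True)  # stand-in for a job manager
ok = controller.handle(ChangeIndicator("rm-1", "job_error", "job stalled"))
unhandled = controller.handle(ChangeIndicator("rm-2", "layer_error", "link down"))
```

The recorded `recovery_results` are the kind of data a later reliability calculation could draw on; an indicator with no registered manager is recorded as unresolved rather than dropped.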
85 Citations
8 Claims
Specification