
Coordinating the monitoring, management, and prediction of unintended changes within a grid environment

  • US 7,533,170 B2
  • Filed: 01/06/2005
  • Issued: 05/12/2009
  • Est. Priority Date: 01/06/2005
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for coordinating recovery from unintended change within a grid environment, comprising:

    enabling a grid environment comprising a plurality of resources from a plurality of computing systems each comprising at least one resource and communicatively connected over a network layer through a grid management system to share each said at least one resource through at least one web service layer atop at least one grid service layer implemented within an open grid services architecture, wherein said at least one grid service layer comprises a grid change controller, wherein a plurality of applications execute in an application layer atop said grid service layer;

    monitoring, by said grid change controller, for a plurality of potential change indicators from a plurality of resource managers of said grid management system for one of multiple types of errors indicating a first error in at least one from among said network layer, said web service layer, and said application layer, a second error in a particular configuration of said plurality of resources within said grid environment, and a third error in a grid job executing within said grid environment, wherein each of said plurality of resource managers manages one from among a plurality of selections of said plurality of resources within said grid environment;

    detecting, at said grid change controller within a grid environment, a particular potential change indicator from among said plurality of potential change indicators indicating an unintended change within said grid environment;

    determining, by said grid change controller, a necessary response to said unintended change within said grid environment;

    communicating, by said grid change controller, with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, such that said grid change controller facilitates recovery from said unintended change within said grid environment to maintain performance requirements within said grid environment;

    gathering, by said grid change controller, a plurality of indicators of actual performance by a plurality of grid jobs executing within said grid environment, said plurality of potential change indicators, and a plurality of recovery results managed by said grid change controller in resolving at least a selection of said plurality of potential change indicators including resolving said unintended change from said particular potential change indicator;

    comparing said plurality of indicators of actual performance with at least one grid policy specifying an expected performance of said plurality of grid jobs executing within said grid environment and with said plurality of recovery results;

    calculating at least one current reliability factor for said grid environment based on said plurality of indicators of actual performance achieved by said grid change controller in recovering from said plurality of potential change indicators based on said plurality of recovery results in comparison with said expected performance; and

    updating said grid policy specifying said expected performance to reflect said at least one current reliability factor.
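
As an informal illustration of how the claimed steps fit together, the following Python sketch models the detect, respond, gather, compare, calculate, and update sequence. It is not the patent's implementation. Every name in it (`GridChangeController`, `ChangeIndicator`, `GridPolicy`, the manager callables) is hypothetical, and the reliability formula is an assumption, since the claim does not state how the reliability factor is calculated.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class ErrorType(Enum):
    """The three error categories named in the monitoring step."""
    LAYER = "network, web service, or application layer"
    CONFIGURATION = "resource configuration"
    GRID_JOB = "grid job"


@dataclass
class ChangeIndicator:
    resource_manager: str   # which resource manager reported the indicator
    error_type: ErrorType
    unintended: bool        # True if the change was not planned


@dataclass
class GridPolicy:
    expected_performance: float      # assumed: expected fraction of jobs on target (> 0)
    reliability_factor: float = 1.0


@dataclass
class GridChangeController:
    policy: GridPolicy
    # Hypothetical: one callable per error type stands in for an independent manager.
    managers: Dict[ErrorType, Callable[[ChangeIndicator], bool]]
    recovery_results: List[bool] = field(default_factory=list)

    def handle(self, indicator: ChangeIndicator) -> Optional[bool]:
        """Detect an unintended change, ask a manager to resolve it, record the result."""
        if not indicator.unintended:
            return None
        recovered = self.managers[indicator.error_type](indicator)
        self.recovery_results.append(recovered)
        return recovered

    def update_policy(self, actual_performance: float) -> float:
        """Compare actual with expected performance and store a reliability factor.

        Assumed formula: recovery success rate multiplied by the ratio of
        actual to expected performance, capped at 1.0.
        """
        if self.recovery_results:
            recovery_rate = sum(self.recovery_results) / len(self.recovery_results)
        else:
            recovery_rate = 1.0
        ratio = min(1.0, actual_performance / self.policy.expected_performance)
        self.policy.reliability_factor = recovery_rate * ratio
        return self.policy.reliability_factor


# Usage: two unintended changes, one resolved and one not.
controller = GridChangeController(
    policy=GridPolicy(expected_performance=0.9),
    managers={
        ErrorType.LAYER: lambda ind: True,
        ErrorType.CONFIGURATION: lambda ind: True,
        ErrorType.GRID_JOB: lambda ind: False,
    },
)
controller.handle(ChangeIndicator("rm-1", ErrorType.LAYER, unintended=True))
controller.handle(ChangeIndicator("rm-2", ErrorType.GRID_JOB, unintended=True))
controller.update_policy(actual_performance=0.9)
```

Storing the factor on the policy object mirrors the final claimed step, in which the grid policy is updated to reflect the current reliability factor. A real system would likely track far richer performance indicators than the single number used here.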
