Coordinating the monitoring, management, and prediction of unintended changes within a grid environment

US 7,533,170 B2
Filed: 01/06/2005
Issued: 05/12/2009
Est. Priority Date: 01/06/2005
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A grid change controller within a particular grid environment detects an unintended change within that grid environment. In particular, the grid change controller monitors potential change indicators received from multiple disparate resource managers across the grid environment, where each resource manager manages a selection of resources within the grid environment. The grid change controller then determines a necessary response to the unintended change and communicates with at least one independent manager within the grid environment to resolve it, such that the grid change controller facilitates recovery from the unintended change to maintain performance requirements within the grid environment.
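The flow described in the abstract (monitor indicators, determine a response, hand the change to an independent manager) might be sketched roughly as below. This is only an illustration under stated assumptions, not the patented implementation: the class names, the `handles` set, the `unintended` flag, and the fallback to an administrator when no manager matches are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorType(Enum):
    LAYER = auto()          # error in the network, web service, or application layer
    CONFIGURATION = auto()  # error in a configuration of grid resources
    GRID_JOB = auto()       # error in an executing grid job


@dataclass(frozen=True)
class ChangeIndicator:
    source_manager: str    # reporting resource manager (hypothetical field)
    error_type: ErrorType
    unintended: bool       # True when the change was not planned (assumed flag)


class IndependentManager:
    """Stand-in for a manager able to resolve certain error types."""

    def __init__(self, name, handles):
        self.name = name
        self.handles = handles  # set of ErrorType values this manager accepts
        self.resolved = []

    def resolve(self, indicator):
        # A real manager would run a recovery workflow; this one only records the call.
        self.resolved.append(indicator)
        return True


class GridChangeController:
    def __init__(self, managers):
        self.managers = managers
        self.recovery_results = []

    def determine_response(self, indicator):
        """Return the first manager that handles this error type, or None."""
        for manager in self.managers:
            if indicator.error_type in manager.handles:
                return manager
        return None  # assumption: no match means escalating to an administrator

    def process(self, indicators):
        """Handle each unintended change and record (indicator, handler, success)."""
        for indicator in indicators:
            if not indicator.unintended:
                continue  # intended changes need no recovery
            manager = self.determine_response(indicator)
            succeeded = manager.resolve(indicator) if manager else False
            handler = manager.name if manager else "administrator"
            self.recovery_results.append((indicator, handler, succeeded))
        return self.recovery_results
```

In this sketch, an indicator that no registered manager handles is recorded as an unresolved result attributed to "administrator", loosely mirroring the option of requesting that a system administrator attempt recovery.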

85 Citations

8 Claims

1. A computer-implemented method for coordinating recovery from unintended change within a grid environment, comprising:
- enabling a grid environment comprising a plurality of resources from a plurality of computing systems each comprising at least one resource and communicatively connected over a network layer through a grid management system to share each said at least one resource through at least one web service layer atop at least one grid service layer implemented within an open grid services architecture, wherein said at least one grid service layer comprises a grid change controller, wherein a plurality of applications execute in an application layer atop said grid service layer;
- monitoring, by said grid change controller, for a plurality of potential change indicators from a plurality of resource managers of said grid management system for one of multiple types of errors indicating a first error in at least one from among said network layer, said web service layer, and said application layer, a second error in a particular configuration of said plurality of resources within said grid environment, and a third error in a grid job executing within said grid environment, wherein each of said plurality of resource managers manages one from among a plurality of selections of said plurality of resources within said grid environment;
- detecting, at said grid change controller within a grid environment, a particular potential change indicator from among said plurality of potential change indicators indicating an unintended change within said grid environment;
- determining, by said grid change controller, a necessary response to said unintended change within said grid environment;
- communicating, by said grid change controller, with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, such that said grid change controller facilitates recovery from said unintended change within said grid environment to maintain performance requirements within said grid environment;
- gathering, by said grid change controller, a plurality of indicators of actual performance by a plurality of grid jobs executing within said grid environment, said plurality of potential change indicators, and a plurality of recovery results managed by said grid change controller in resolving at least a selection of said plurality of potential change indicators including resolving said unintended change from said particular potential change indicator;
- comparing said plurality of indicators of actual performance with at least one grid policy specifying an expected performance of said plurality of grid jobs executing within said grid environment and with said plurality of recovery results;
- calculating at least one current reliability factor for said grid environment based on said plurality of indicators of actual performance achieved by said grid change controller in recovering from said plurality of potential change indicators based on said plurality of recovery results in comparison with said expected performance; and
- updating said grid policy specifying said expected performance to reflect said at least one current reliability factor.

2. The computer-implemented method according to claim 1, wherein determining a necessary response to said unintended change within said grid environment further comprises:
- determining whether a selection from among said plurality of independent managers is enabled to control recovery of said grid environment from said unintended change.

3. The computer-implemented method according to claim 1, wherein determining a necessary response to said unintended change within said grid environment further comprises:
- determining whether said unintended change will affect a particular grid job, from among a plurality of grid jobs executing within said grid environment, executing within a specific execution environment within said grid environment.

4. The computer-implemented method according to claim 1, wherein determining a necessary response to said unintended change within said grid environment further comprises:
- determining whether to request that a system administrator attempt to recover said grid environment from said unintended change.

5. The computer-implemented method according to claim 1, wherein determining a necessary response to said unintended change within said grid environment further comprises:
- determining whether to request a change of resources to which a particular grid job affected by said unintended change is routed.

6. The computer-implemented method according to claim 1, wherein communicating with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, further comprises:
- initiating a recovery workflow for recovery from said unintended change within said at least one independent manager from among said plurality of independent managers, wherein said at least one independent manager automatically controls said recovery within a particular execution environment within said grid environment.

7. The computer-implemented method according to claim 1, wherein communicating with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, further comprises:
- coordinating with said at least one independent manager to direct recovery from said unintended change by said at least one independent manager.

8. The computer-implemented method according to claim 1, wherein communicating with at least one from among a plurality of independent managers available within said grid environment to resolve said unintended change, further comprises:
- coordinating with a grid job router to renegotiate for an allocation of a new selection of resources for processing a particular grid job affected by said unintended change.
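
The closing steps of claim 1 (gathering performance indicators and recovery results, comparing them with the grid policy, calculating a reliability factor, and updating the policy) could be illustrated along the following lines. The claim does not state a formula, so the one used here is purely an assumption for the example: the share of grid jobs that met the policy's expected completion time, multiplied by the share of recovery attempts that succeeded. `GridPolicy`, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GridPolicy:
    # Assumed form of "expected performance": a maximum job completion time in seconds.
    expected_completion_s: float
    reliability_factor: float = 1.0


def reliability_factor(actual_completion_s, recovery_succeeded, policy):
    """Illustrative formula only, not taken from the patent:
    (share of jobs meeting the policy) x (share of successful recoveries)."""
    if not actual_completion_s:
        raise ValueError("need at least one performance indicator")
    met = sum(1 for t in actual_completion_s if t <= policy.expected_completion_s)
    job_share = met / len(actual_completion_s)
    # With no recovery attempts recorded, recoveries do not lower the factor.
    if recovery_succeeded:
        recovery_share = sum(recovery_succeeded) / len(recovery_succeeded)
    else:
        recovery_share = 1.0
    return job_share * recovery_share


def update_policy(policy, actual_completion_s, recovery_succeeded):
    """Write the freshly calculated factor back into the policy and return it."""
    policy.reliability_factor = reliability_factor(
        actual_completion_s, recovery_succeeded, policy
    )
    return policy
```

For instance, with an expected completion time of 100 seconds, observed times of 80, 95, 120, and 100 seconds, and three successful recoveries out of four, this sketch yields a factor of 0.75 × 0.75 = 0.5625. A real controller would presumably weight job classes and error types differently.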

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Hamilton, Rick Allen II; Seaman, James W.; Joseph, Joshy; Fellenstein, Craig William
Primary Examiner(s): Won, Michael Young
Application Number: US 11/031,541
Publication Number: US 20060150159A1
Time in Patent Office: 1,587 Days
Field of Search: 709/224, 709/203, 709/220
US Class Current: 709/224
CPC Class Codes: G06F 9/5072 (Grid computing)
