Method and system for a network management framework with redundant failover methodology

US 8,032,625 B2
Filed: 06/29/2001
Issued: 10/04/2011
Est. Priority Date: 06/29/2001
Status: Expired due to Fees

Abstract

A method, system, apparatus, and computer program product are presented for management of a distributed data processing system. Resources within the distributed data processing system are dynamically discovered, and the discovered resources are adaptively monitored using the network management framework. When the network management framework detects that certain components within the network management framework may have failed, new instances of these components are started. If duplicate components are later determined to be active concurrently, then a duplicate component is shut down, thereby ensuring that at least one instance of these components is active at any given time. After certain failover events, a resource rediscovery process may occur, and a topology database containing previously stored information about discovered resources is resynchronized with resource information about rediscovered resources.
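
The lifecycle summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Controller` and `Framework` classes, the heartbeat-based staleness check, the 30-second timeout, and the rule of keeping the oldest responsive instance are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in for one monitoring component instance."""
    name: str
    last_heartbeat: float  # seconds, on whatever clock the caller uses
    running: bool = True


@dataclass
class Framework:
    """Toy supervisor; the timeout value is an assumption, not from the patent."""
    timeout: float = 30.0
    instances: list = field(default_factory=list)

    def check(self, now: float) -> None:
        """Start a replacement when every running instance looks stale."""
        live = [c for c in self.instances if c.running]
        if all(now - c.last_heartbeat > self.timeout for c in live):
            name = f"instance-{len(self.instances) + 1}"
            self.instances.append(Controller(name, now))

    def reconcile(self, now: float) -> None:
        """If several instances respond at once, keep the oldest and stop the rest."""
        fresh = [c for c in self.instances
                 if c.running and now - c.last_heartbeat <= self.timeout]
        for duplicate in fresh[1:]:
            duplicate.running = False


fw = Framework(instances=[Controller("instance-1", 0.0)])
fw.check(now=45.0)                      # instance-1 looks stale, so a second one starts
fw.instances[0].last_heartbeat = 50.0   # instance-1 turns out to be alive after all
fw.reconcile(now=55.0)                  # the duplicate is shut down
```

A stale heartbeat only signals a potential failure, so the sketch starts the replacement first and resolves duplicates later, which keeps at least one instance running throughout.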

27 Claims

1. A method for management of a distributed data processing system, the method comprising:
- representing the distributed data processing system as a set of scopes, wherein a scope comprises a logical organization of network-related objects;
  
  monitoring, by a computer, resources within the distributed data processing system using a set of distributed monitor controllers, wherein each distributed monitor controller is uniquely responsible for monitoring resources within different scopes;
  
  in response to monitoring a set of resources, generating topology information associated with the set of resources by a first instance of a distributed monitor controller in the set of distributed monitor controllers;
  
  in response to detecting a potential failure of the first instance of the distributed monitor controller, starting a second instance of the distributed monitor controller;
  
  in response to monitoring the set of resources, generating topology information associated with the set of resources by the second instance of the distributed monitor controller; and
  
  in response to a determination that generated topology information indicates assignment of overlapping scopes between the first instance of the distributed monitor controller and the second instance of the distributed monitor controller, determining a failure of the first instance of the distributed monitor controller based on a communication test.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 further comprising:
    - attempting the communication test with the first instance of the distributed monitor controller;
      
      in response to detecting a communication failure with the first instance of the distributed monitor controller, determining that the first instance of the distributed monitor controller is inactive; and
      
      in response to detecting a communication success with the first instance of the distributed monitor controller, determining that the first instance of the distributed monitor controller is active.
  - 3. The method of claim 2 further comprising:
    - in response to a determination that the first instance of the distributed monitor controller is active, requesting a shutdown of the second instance of the distributed monitor controller.
  - 4. The method of claim 3 further comprising:
    - updating the topology information that was generated by the second instance of the distributed monitor controller.
  - 5. The method of claim 2 further comprising:
    - in response to a determination that the first instance of the distributed monitor controller is inactive, updating the topology information that was generated by the first instance of the distributed monitor controller.
  - 6. The method of claim 5 further comprising:
    - discovering a status associated with each resource in the set of resources via the second distributed monitor controller; and
      
      rewriting topology information associated with each resource in the set of resources in accordance with the discovered status associated with each resource in the set of resources.
  - 7. The method of claim 5 further comprising:
    - resynchronizing a resource status database with the topology information using the second distributed monitor controller.
  - 8. The method of claim 7 further comprising:
    - determining a portion of the resource status database that is necessary for resynchronizing the topology information; and
      
      retrieving only the determined portion of the resource status database.
  - 9. The method of claim 1 further comprising:
    - attempting the communication test with an object request broker (ORB) that supports the first instance of the distributed monitor controller;
      
      in response to detecting a communication failure with the ORB that supports the first instance of the distributed monitor controller, determining that the first instance of the distributed monitor controller is inactive; and
      
      requesting a shutdown of the first instance of the distributed monitor controller.
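
The overlap check in claim 1 and the communication test in claims 2, 3, and 5 can be sketched as follows. This is a hedged illustration only: representing topology information as a mapping from controller instance to a set of scope names, the function names, the action labels, and the choice to treat a transport error as a failed test are all assumptions, not details from the claims.

```python
from typing import Callable, Dict, Set


def overlapping_scopes(topology: Dict[str, Set[str]], first: str, second: str) -> Set[str]:
    """Return scopes that the topology information assigns to both instances."""
    return topology.get(first, set()) & topology.get(second, set())


def resolve_overlap(
    topology: Dict[str, Set[str]],
    first: str,
    second: str,
    communication_test: Callable[[str], bool],
) -> str:
    """Decide what to do when two instances may hold the same scopes.

    Returns one of three hypothetical action labels:
      "no-overlap"       - nothing to resolve
      "shutdown-second"  - first instance answered, so it is active (claims 2-3)
      "first-failed"     - first instance did not answer (claims 2 and 5)
    """
    if not overlapping_scopes(topology, first, second):
        return "no-overlap"
    try:
        reachable = communication_test(first)
    except OSError:
        # Assumption: a transport error counts as a communication failure.
        reachable = False
    return "shutdown-second" if reachable else "first-failed"


topology = {"dmc-1": {"scope-a", "scope-b"}, "dmc-2": {"scope-b"}}
action = resolve_overlap(topology, "dmc-1", "dmc-2", lambda instance: True)
```

The communication test is passed in as a callable so that the same decision logic could target either the controller instance itself or the ORB that supports it, as claim 9 describes.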

10. An apparatus for management of a distributed data processing system, the apparatus comprising:
- means for representing the distributed data processing system as a set of scopes, wherein a scope comprises a logical organization of network-related objects;
  
  means for monitoring resources within the distributed data processing system using a set of distributed monitor controllers, wherein each distributed monitor controller is uniquely responsible for monitoring resources within different scopes;
  
  means for generating topology information associated with a set of resources by a first instance of a distributed monitor controller in the set of distributed monitor controllers in response to monitoring the set of resources;
  
  means for starting a second instance of the distributed monitor controller in response to detecting a potential failure of the first instance of the distributed monitor controller;
  
  means for generating topology information associated with the set of resources by the second instance of the distributed monitor controller in response to monitoring the set of resources; and
  
  means for determining a failure of the first instance of the distributed monitor controller based on a communication test in response to a determination that generated topology information indicates assignment of overlapping scopes between the first instance of the distributed monitor controller and the second instance of the distributed monitor controller.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The apparatus of claim 10 further comprising:
    - means for attempting the communication test with the first instance of the distributed monitor controller;
      
      means for determining that the first instance of the distributed monitor controller is inactive in response to detecting a communication failure with the first instance of the distributed monitor controller; and
      
      means for determining that the first instance of the distributed monitor controller is active in response to detecting a communication success with the first instance of the distributed monitor controller.
  - 12. The apparatus of claim 11 further comprising:
    - means for requesting a shutdown of the second instance of the distributed monitor controller in response to a determination that the first instance of the distributed monitor controller is active.
  - 13. The apparatus of claim 12 further comprising:
    - means for updating the topology information that was generated by the second instance of the distributed monitor controller.
  - 14. The apparatus of claim 11 further comprising:
    - means for updating the topology information that was generated by the first instance of the distributed monitor controller in response to a determination that the first instance of the distributed monitor controller is inactive.
  - 15. The apparatus of claim 14 further comprising:
    - means for discovering a status associated with each resource in the set of resources via the second distributed monitor controller; and
      
      means for rewriting topology information associated with each resource in the set of resources in accordance with the discovered status associated with each resource in the set of resources.
  - 16. The apparatus of claim 14 further comprising:
    - means for resynchronizing a resource status database with the topology information using the second distributed monitor controller.
  - 17. The apparatus of claim 16 further comprising:
    - means for determining a portion of the resource status database that is necessary for resynchronizing the topology information; and
      
      means for retrieving only the determined portion of the resource status database.
  - 18. The apparatus of claim 10 further comprising:
    - means for attempting the communication test with an object request broker (ORB) that supports the first instance of the distributed monitor controller; and
      
      means for determining that the first instance of the distributed monitor controller is inactive in response to detecting a communication failure with the ORB that supports the first instance of the distributed monitor controller.

19. A computer program product on a non-transitory computer readable medium for use in managing a distributed data processing system, the computer program product comprising:
- instructions for representing the distributed data processing system as a set of scopes, wherein a scope comprises a logical organization of network-related objects;
  
  instructions for monitoring resources within the distributed data processing system using a set of distributed monitor controllers, wherein each distributed monitor controller is uniquely responsible for monitoring resources within different scopes;
  
  instructions for generating topology information associated with a set of resources by a first instance of a distributed monitor controller in the set of distributed monitor controllers in response to monitoring the set of resources;
  
  instructions for starting a second instance of the distributed monitor controller in response to detecting a potential failure of the first instance of the distributed monitor controller;
  
  instructions for generating topology information associated with the set of resources by the second instance of the distributed monitor controller in response to monitoring the set of resources; and
  
  instructions for determining a failure of the first instance of the distributed monitor controller based on a communication test in response to a determination that generated topology information indicates assignment of overlapping scopes between the first instance of the distributed monitor controller and the second instance of the distributed monitor controller.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27):
  - 20. The computer program product of claim 19 further comprising:
    - instructions for attempting the communication test with the first instance of the distributed monitor controller;
      
      instructions for determining that the first instance of the distributed monitor controller is inactive in response to detecting a communication failure with the first instance of the distributed monitor controller; and
      
      instructions for determining that the first instance of the distributed monitor controller is active in response to detecting a communication success with the first instance of the distributed monitor controller.
  - 21. The computer program product of claim 20 further comprising:
    - instructions for requesting a shutdown of the second instance of the distributed monitor controller in response to a determination that the first instance of the distributed monitor controller is active.
  - 22. The computer program product of claim 21 further comprising:
    - instructions for updating the topology information that was generated by the second instance of the distributed monitor controller.
  - 23. The computer program product of claim 22 further comprising:
    - instructions for discovering a status associated with each resource in the set of resources via the second distributed monitor controller; and
      
      instructions for rewriting topology information associated with each resource in the set of resources in accordance with the discovered status associated with each resource in the set of resources.
  - 24. The computer program product of claim 22 further comprising:
    - instructions for resynchronizing a resource status database with the topology information using the second distributed monitor controller.
  - 25. The computer program product of claim 24 further comprising:
    - instructions for determining a portion of the resource status database that is necessary for resynchronizing the topology information; and
      
      instructions for retrieving only the determined portion of the resource status database.
  - 26. The computer program product of claim 20 further comprising:
    - instructions for updating the topology information that was generated by the first instance of the distributed monitor controller in response to a determination that the first instance of the distributed monitor controller is inactive.
  - 27. The computer program product of claim 26 further comprising:
    - instructions for attempting the communication test with an object request broker (ORB) that supports the first instance of the distributed monitor controller; and
      
      instructions for determining that the first instance of the distributed monitor controller is inactive in response to detecting a communication failure with the ORB that supports the first instance of the distributed monitor controller.
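
The partial resynchronization recited in claims 7-8, 16-17, and 24-25 (determine the necessary portion of the resource status database, then retrieve only that portion) might look like the sketch below. The dictionary-based data model, the function names, and the use of a scope name to select the necessary portion are assumptions made for illustration; the claims do not say how the portion is determined.

```python
from typing import Dict, List


def needed_portion(topology: Dict[str, dict], scope: str) -> List[str]:
    """Pick the resource ids whose topology entries belong to the given scope."""
    return sorted(rid for rid, entry in topology.items() if entry["scope"] == scope)


def resynchronize(topology: Dict[str, dict], status_db: Dict[str, str], scope: str) -> Dict[str, str]:
    """Refresh topology entries for one scope from the status database.

    Only the keys returned by needed_portion() are read from status_db.
    """
    wanted = needed_portion(topology, scope)
    fetched = {rid: status_db[rid] for rid in wanted if rid in status_db}
    for rid, status in fetched.items():
        topology[rid]["status"] = status
    return fetched


topology = {
    "r1": {"scope": "a", "status": "unknown"},
    "r2": {"scope": "b", "status": "unknown"},
    "r3": {"scope": "a", "status": "unknown"},
}
status_db = {"r1": "up", "r2": "down"}
fetched = resynchronize(topology, status_db, "a")  # reads only the entry for r1
```

Restricting the read to one scope avoids touching entries owned by controllers that did not fail, so a single failover does not require loading the whole database.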

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Benfield, Jason; Hsu, Oliver Yehung; Yarsa, Julianne; Ullmann, Lorin Evan
Primary Examiner(s): OSMAN, RAMY M
Application Number: US09/895,085
Publication Number: US 20030009551A1
Time in Patent Office: 3,749 Days
Field of Search: 709/201, 709/223, 709/224, 709/230, 718/1, 718/100, 718/2, 714/4.1
US Class Current: 709/224
CPC Class Codes:
H04L 41/0213   Standardised network manage...
H04L 41/046   comprising network manageme...
H04L 41/06   Management of faults, event...
H04L 41/0661   by reconfiguring faulty ent...
H04L 41/12   Discovery or management of ...
