Fault-tolerance and fault-containment models for zoning clustered application silos into continuous availability and high availability zones in clustered systems during recovery and maintenance

  • US 8,286,026 B2
  • Filed: 02/13/2012
  • Issued: 10/09/2012
  • Est. Priority Date: 06/29/2005
  • Status: Expired due to Fees
First Claim

1. A computer program product recorded on one or more data storage media for use in a server cluster having plural nodes, comprising:

  • said one or more data storage media;

    program logic recorded on said data storage media for programming a data processing platform to operate as by:

    maintaining a set of active nodes that each run a software stack that includes a cluster management tier and a cluster application tier, said cluster application tier of said active nodes actively providing services on behalf of client applications;

    maintaining a set of spare nodes that each run a software stack that includes said cluster management tier and said cluster application tier, said cluster application tier of said spare nodes being continuously operational during steady-state cluster application transaction processing, but not actively providing transaction services on behalf of client applications prior to assuming an application workload from another node;

    dynamically logically defining first and second zones in said cluster in response to an active node membership change involving one or more active nodes departing from or being added to said cluster as a result of an active node failing or becoming unreachable or as a result of a maintenance operation involving an active node;

    said first zone being a fault tolerant zone comprising all of said active nodes that are operational;

    said second zone being a fault containment zone comprising all active nodes participating in said membership change and some number of said spare nodes in the event that said membership change involves a node departure;

    implementing fast recovery/maintenance and high cluster application availability in said fault containment zone during cluster recovery or maintenance by initiating application failover and application recovery protocols that are implemented by said cluster application and cluster management tiers of nodes in said fault containment zone following said active node membership change;

    maintaining continuous application cluster availability in said fault tolerant zone during cluster recovery or maintenance by continuing without interruption normal transactional application and related intra-cluster messaging protocols that were being implemented by said cluster application and cluster management tiers of nodes in said fault tolerant zone prior to said active node membership change; and

    said cluster management tier of nodes in said fault tolerant zone and said fault containment zone initiating cluster recovery protocols following said active node membership change, said cluster recovery protocols being transparent to said cluster application tier of nodes in said fault tolerant zone so as not to interfere with said normal transactional application and related intra-cluster messaging protocols implemented by nodes in said fault tolerant zone;

    whereby group integrity is maintained and transactional application communication messaging continues without interruption in nodes of said fault tolerant zone as cluster recovery is performed.
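
The zoning step recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names `define_zones` and `protocols_for`, the protocol labels, and the rule of picking one spare per departed node are all assumptions made for illustration (the claim only says "some number of said spare nodes"). The sketch also assumes that newly added nodes sit in the fault containment zone only until recovery completes.

```python
def define_zones(active, spares, departed=(), joined=()):
    """Split the cluster into (fault_tolerant, fault_containment) zones
    in response to an active node membership change."""
    active, spares = set(active), set(spares)
    departed = set(departed) & active   # only active nodes can depart
    joined = set(joined) - active       # nodes being added as active nodes

    # Fault tolerant zone: active nodes that remain operational.
    fault_tolerant = active - departed

    # Assumption: one spare per departed node, taken in sorted order.
    replacements = set(sorted(spares)[:len(departed)])

    # Fault containment zone: nodes participating in the membership change,
    # plus spares when the change involves a departure.
    fault_containment = departed | joined | replacements
    return fault_tolerant, fault_containment


def protocols_for(node, fault_tolerant, fault_containment):
    """Return the (hypothetical) protocol labels a node runs during recovery."""
    if node in fault_containment:
        # Both tiers take part: cluster recovery plus failover and recovery.
        return ["cluster_recovery", "application_failover", "application_recovery"]
    if node in fault_tolerant:
        # The management tier recovers transparently while the application
        # tier keeps processing transactions without interruption.
        return ["cluster_recovery", "normal_transaction_processing"]
    return []
```

For example, with active nodes n1, n2, n3 and spares s1, s2, a failure of n2 leaves n1 and n3 in the fault tolerant zone, while n2 and the spare s1 form the fault containment zone.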
