Apparatus and method for building distributed fault-tolerant/high-availability computer applications
Abstract
Software architecture for developing distributed fault-tolerant systems independent of the underlying hardware architecture and operating system. Systems built using architecture components are scalable and allow a set of computer applications to operate in fault-tolerant/high-availability mode, distributed processing mode, or many possible combinations of distributed and fault-tolerant modes in the same system without any modification to the architecture components. The software architecture defines system components that are modular and address problems in present systems. The architecture uses a System Controller, which controls system activation, initial load distribution, fault recovery, load redistribution, and system topology, and implements system maintenance procedures. An Application Distributed Fault-Tolerant/High-Availability Support Module (ADSM) enables an application(s) to operate in various distributed fault-tolerant modes. The System Controller uses ADSM's well-defined API to control the state of the application in these modes. The Router architecture component provides transparent communication between applications during fault recovery and topology changes. An Application Load Distribution Module (ALDM) component distributes incoming external events towards the distributed application. The architecture allows for a Load Manager, which monitors load on various copies of the application and maximizes the hardware usage by providing dynamic load balancing. The architecture also allows for a Fault Manager, which performs fault detection, fault location, and fault isolation, and uses the System Controller's API to initiate fault recovery. These architecture components can be used to achieve a variety of distributed processing high-availability system configurations, which results in a reduction of cost and development time.
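As a rough illustration of how the ALDM and Router described in the abstract might cooperate, the following Python sketch maps an incoming event to a resource set and then looks up which process currently hosts that resource set. The class names (`Router`, `Aldm`), the methods (`bind`, `locate`, `dispatch`), the process labels, and the CRC-based mapping rule are all assumptions made for illustration; the abstract does not specify any of them.

```python
import zlib


class Router:
    """Maps a resource-set id to the process hosting its active copy (hypothetical)."""

    def __init__(self):
        self._location = {}

    def bind(self, resource_set, process):
        # Would be called, for example, at activation or after a topology change.
        self._location[resource_set] = process

    def locate(self, resource_set):
        return self._location[resource_set]


class Aldm:
    """Distributes incoming events across resource sets (assumed hashing rule)."""

    def __init__(self, router, resource_sets):
        self._router = router
        self._resource_sets = list(resource_sets)

    def dispatch(self, event_key):
        # A stable hash sends the same key to the same resource set every time.
        index = zlib.crc32(event_key.encode("utf-8")) % len(self._resource_sets)
        resource_set = self._resource_sets[index]
        return resource_set, self._router.locate(resource_set)


router = Router()
router.bind("rs0", "process-A")
router.bind("rs1", "process-B")
aldm = Aldm(router, ["rs0", "rs1"])

rs_before, proc_before = aldm.dispatch("call-42")
# Simulate a topology change: the same resource set moves to another process.
router.bind(rs_before, "process-C")
rs_after, proc_after = aldm.dispatch("call-42")
```

In this sketch the sender never names a process: the event still reaches the same resource set after the move, which is one way to read "transparent communication ... during fault recovery and topology changes".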
99 Claims
-
1. A distributed processing computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a pure distributed mode where said application is distributed in an active condition among more than one of said processes on said processors;
a system controller for controlling system activation and initial load distribution;
a router for providing communications between at least one said application and other applications independent of application locations;
an ADSM for providing distributed functionality in said application; and
an ALDM for distributing incoming events to said application. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
-
-
28. A fault tolerant computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a pure fault tolerant mode where said application is in an active condition on one said process and in a standby condition on another said process on said processors;
a system controller for controlling system activation and failure recovery;
a router for providing communications between at least one said application and other applications independent of application locations; and
an ADSM for providing fault tolerant functionality in said application and wherein said application is represented by a single resource set. - View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47)
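The active/standby arrangement and the failure-recovery role of the system controller in claim 28 can be pictured with a small Python sketch. This is only a toy model under stated assumptions: the `SystemController` class, its `on_failure` method, the `routes` dictionary standing in for the router's location table, and the process labels are hypothetical and do not come from the claim.

```python
class SystemController:
    """Tracks which process is active and which is standby (hypothetical model)."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        # Stands in for the router's location table (assumption).
        self.routes = {"app": active}

    def on_failure(self, failed_process):
        # Only the loss of the active copy requires a switchover.
        if failed_process == self.active and self.standby is not None:
            self.active, self.standby = self.standby, None
            self.routes["app"] = self.active
        elif failed_process == self.standby:
            # Losing the standby leaves the active copy running unprotected.
            self.standby = None


controller = SystemController(active="process-A", standby="process-B")
controller.on_failure("process-A")
```

After the simulated failure, the former standby is the active copy and the route points at it, so other applications would keep addressing "app" without knowing its location changed.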
-
-
48. A distributed processing, fault tolerant computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a distributed fault tolerant mode where said application is in an active condition on more than one of said processes and is in a standby condition on at least one of said processes on said processors;
a system controller for controlling system activation, failure recovery and initial load distribution;
a router for providing communications between at least one said application and other applications independent of application locations;
an ADSM for providing distributed fault tolerant functionality in said application; and
an ALDM for distributing incoming events to said application. - View Dependent Claims (49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74)
-
-
75. A distributed processing, computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a pure distributed mode where said application is distributed in an active condition among more than one of said processes on said processors;
a system controller for controlling system activation and initial load distribution;
a router for providing communications between at least one said application and other applications independent of application locations;
an update module for providing distributed functionality in said application; and
a load distributor for distributing incoming events to said application.
-
-
76. A fault tolerant computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a pure fault tolerant mode where said application is in an active condition on one said process and in a standby condition on another said process on said processors;
a system controller for controlling system activation and failure recovery;
a router for providing communications between at least one said application and other applications independent of application locations; and
an update module for providing fault tolerant functionality in said application and wherein said application is represented by a single reserved resource set.
-
-
77. A distributed processing, fault tolerant computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes executing on at least one processor;
at least one application executing in a distributed fault tolerant mode where said application is in an active condition on more than one of said processes and is in a standby condition on at least one of said processes on said processors;
a system controller for controlling system activation, failure recovery and initial load distribution;
a router for providing communications between at least one said application and other applications independent of application locations;
an update module for providing distributed fault tolerant functionality in said application; and
a load distributor for distributing incoming events to said application.
-
-
78. A fault tolerant, distributed processing, computer apparatus for use in systems, the apparatus comprising:
-
a plurality of processes, executing on at least one processor;
said processes executing an application in the same mode as at least one other application or in a mode different from said one other application, said same and different modes being:
a) a pure distributed mode where an application is distributed among said processes in an active condition;
b) a pure fault-tolerant mode where an application executes in at least one process in an active condition and in at least one process in a standby condition; and
c) a distributed fault-tolerant mode where an application is distributed on multiple processes in an active condition and on at least one process in a standby condition.
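The three modes listed in claim 78 differ only in how many processes hold an active copy and how many hold a standby copy. A minimal Python sketch of that distinction follows. The `Mode` enum and `classify` function are hypothetical names, and the rule for mode (b) is simplified to exactly one active process, which is narrower than the claim's "at least one" wording.

```python
from enum import Enum


class Mode(Enum):
    PURE_DISTRIBUTED = "pure distributed"
    PURE_FAULT_TOLERANT = "pure fault-tolerant"
    DISTRIBUTED_FAULT_TOLERANT = "distributed fault-tolerant"


def classify(active_count, standby_count):
    """Return the mode implied by the number of active and standby processes."""
    if active_count > 1 and standby_count == 0:
        return Mode.PURE_DISTRIBUTED
    if active_count == 1 and standby_count >= 1:
        # Simplifying assumption: a single active copy backed by standby copies.
        return Mode.PURE_FAULT_TOLERANT
    if active_count > 1 and standby_count >= 1:
        return Mode.DISTRIBUTED_FAULT_TOLERANT
    raise ValueError("combination not covered by modes (a) to (c)")
```

For example, `classify(3, 0)` yields the pure distributed mode, while `classify(3, 1)` yields the distributed fault-tolerant mode.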
-
-
79. A method in a computer apparatus for fault tolerant and distributed processing of at least one application in a plurality of processes running on at least one processor, the method comprising the steps of:
-
executing said application in a distributed fault tolerant mode wherein said application is distributed in an active condition among more than one process and is in standby condition on at least one said process on said processors;
providing a plurality of resource sets as units of distribution of said application; and
a master critical resource set modifying shared data in said application and updating to a shadow resource set of said application on said processes and an active non-critical resource set modifying private data in said application and updating to a standby resource set of said application on another said process. - View Dependent Claims (80, 81, 82, 83, 84, 85, 86)
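The update flow in claim 79 (a master critical resource set updating shadow copies of shared data, and an active non-critical resource set updating a standby copy of private data) might be modeled as below. This is a sketch under assumptions: the `ResourceSet` class, its `attach` and `modify` methods, the synchronous copy-on-write update, and the sample keys are invented for illustration and are not taken from the claim.

```python
class ResourceSet:
    """A unit of distribution holding data and pushing updates to its replicas."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.replicas = []

    def attach(self, replica):
        # A replica plays the shadow role (critical) or standby role (non-critical).
        self.replicas.append(replica)

    def modify(self, key, value):
        # Apply the change locally, then propagate it to every attached replica.
        self.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value


# Shared data: one master critical resource set, shadows on other processes.
master = ResourceSet("critical-master")
shadow_b = ResourceSet("critical-shadow-on-B")
shadow_c = ResourceSet("critical-shadow-on-C")
master.attach(shadow_b)
master.attach(shadow_c)
master.modify("route_table_version", 7)

# Private data: an active non-critical resource set with a standby elsewhere.
active = ResourceSet("rs1-active")
standby = ResourceSet("rs1-standby")
active.attach(standby)
active.modify("call-42", "connected")
```

The private update reaches only the standby copy, while the shared update reaches every shadow, which mirrors the split between shared and private data in the claim.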
-
-
87. A method in a computer apparatus for distributed processing of at least one application in a plurality of processes running on at least one processor, the method comprising the steps of:
-
executing said application in a pure distributed mode wherein said application is distributed in an active condition among more than one process;
providing a plurality of resource sets as units of distribution of said application;
a master critical resource set modifying shared data in said application and updating to a shadow resource set of said application on said processes and an active non-critical resource set modifying private data in said application. - View Dependent Claims (88, 89, 90, 91, 92, 93)
-
94. A method in a computer apparatus for fault tolerant processing of at least one application in a plurality of processes running on at least one processor, the method comprising the steps of:
-
executing said application in a fault tolerant mode wherein said application is in an active condition on one process and is in standby condition on another said process on said processors;
representing said application by a single resource set; and
an active single resource set modifying private data in said application and updating to a standby resource set of said application on another said process. - View Dependent Claims (95, 96, 97, 98, 99)
Specification