Apparatus and method for building distributed fault-tolerant/high-availability computer applications
Abstract
Software architecture for developing distributed fault-tolerant systems independent of the underlying hardware architecture and operating system. Systems built using architecture components are scalable and allow a set of computer applications to operate in fault-tolerant/high-availability mode, distributed processing mode, or many possible combinations of distributed and fault-tolerant modes in the same system without any modification to the architecture components. The software architecture defines system components that are modular and address problems in present systems. The architecture uses a System Controller, which controls system activation, initial load distribution, fault recovery, load redistribution, and system topology, and implements system maintenance procedures. An Application Distributed Fault-Tolerant/High-Availability Support Module (ADSM) enables an application(s) to operate in various distributed fault-tolerant modes. The System Controller uses ADSM's well-defined API to control the state of the application in these modes. The Router architecture component provides transparent communication between applications during fault recovery and topology changes. An Application Load Distribution Module (ALDM) component distributes incoming external events towards the distributed application. The architecture allows for a Load Manager, which monitors load on various copies of the application and maximizes the hardware usage by providing dynamic load balancing. The architecture also allows for a Fault Manager, which performs fault detection, fault location, and fault isolation, and uses the System Controller's API to initiate fault recovery. These architecture components can be used to achieve a variety of distributed processing high-availability system configurations, which results in a reduction of cost and development time.
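As an illustration only, the relationship between the System Controller, the ADSM state-control API, and fault recovery might be sketched as follows. Every class and method name here (SupportModule, go_active, recover, and so on) is hypothetical and is not taken from the patent; the sketch assumes a simple active/standby arrangement in which the first available standby copy is promoted when the active copy fails.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    ACTIVE = "active"
    FAILED = "failed"


class SupportModule:
    """Hypothetical stand-in for the ADSM wrapped around one application copy."""

    def __init__(self, name):
        self.name = name
        self.state = State.STANDBY

    def go_active(self):
        self.state = State.ACTIVE

    def go_standby(self):
        self.state = State.STANDBY

    def mark_failed(self):
        self.state = State.FAILED


class SystemController:
    """Hypothetical controller that drives application state through the module API."""

    def __init__(self, modules):
        self.modules = list(modules)

    def activate(self):
        # Assumed activation policy: first copy active, remaining copies standby.
        self.modules[0].go_active()
        for module in self.modules[1:]:
            module.go_standby()

    def active_names(self):
        return [m.name for m in self.modules if m.state is State.ACTIVE]

    def recover(self, failed_name):
        # Entry point a fault manager could call once it has located a fault.
        for module in self.modules:
            if module.name == failed_name:
                module.mark_failed()
        if self.active_names():
            return None  # an active copy is still running; nothing to promote
        for module in self.modules:
            if module.state is State.STANDBY:
                module.go_active()
                return module.name
        return None  # no standby copy left to promote
```

In this sketch the controller never touches application internals; it only calls the state-control methods, which mirrors the abstract's statement that the System Controller works through the ADSM's API.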
116 Claims
-
1-99. (canceled)
-
100. A distributed processing system comprising:
-
an application to execute a process in a pure distributed mode, wherein the process is associated with an incoming event and is mapped to a resource set;
a router to route communications between the application and another application independent of location of the applications;
an update module to provide distributed functionality in the application; and
a load distributor to distribute the process to the application. - View Dependent Claims (101, 102, 103)
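For illustration, the location-independent router and the load distributor recited in claim 100 could be modeled roughly as below. This is a minimal sketch under stated assumptions: the names Router, LoadDistributor, register, send, and pick are hypothetical, the routing table is a plain in-memory dictionary, and the byte-sum hash is an arbitrary deterministic choice rather than anything the claim specifies.

```python
class Router:
    """Hypothetical router: senders address a logical application name only."""

    def __init__(self):
        self._locations = {}  # logical application name -> current node
        self.delivered = []   # record of (node, application, message)

    def register(self, app_name, node):
        # Called again whenever an application moves, e.g. after fault recovery.
        self._locations[app_name] = node

    def send(self, dest_app, message):
        node = self._locations[dest_app]
        self.delivered.append((node, dest_app, message))
        return node


class LoadDistributor:
    """Hypothetical distributor: picks an application copy for an incoming event."""

    def __init__(self, copies):
        self.copies = list(copies)

    def pick(self, event_key):
        # Assumed policy: deterministic byte-sum hash, so the same key
        # always lands on the same copy.
        index = sum(event_key.encode("utf-8")) % len(self.copies)
        return self.copies[index]
```

Because the sender passes only the logical name to send, re-registering an application on a different node changes where messages go without any change on the sending side.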
-
-
104. A fault tolerant system comprising:
-
an application to execute a process in a pure fault tolerant mode, wherein the process is associated with an incoming event and is mapped to a resource set;
a router to route communications between the application and another application independent of location of the applications; and
an update module to provide fault tolerant functionality in the application, wherein the application uses the resource set to execute the process. - View Dependent Claims (105, 106, 107, 108)
-
-
109. A distributed processing, fault tolerant system comprising:
-
an application to execute a process in a distributed fault tolerant mode, wherein the process is associated with an incoming event and is mapped to a resource set;
a router to route communications between the application and another application independent of location of the applications;
an update module to provide distributed fault tolerant functionality in the application; and
a load distributor to distribute the process to the application. - View Dependent Claims (110, 111, 112, 113)
-
-
114. A method for fault tolerant and distributed processing, comprising:
-
mapping a process associated with an incoming event to a resource set, the resource set including a critical and a non-critical resource set;
assigning the resource set to an application, wherein the application executes the process in a distributed fault tolerant mode;
associating private data with the non-critical resource set, the private data to be used by the application to execute the process in a fault tolerant mode; and
associating shared data with the critical resource set, the shared data to be used by the application to execute the process in a distributed mode. - View Dependent Claims (115, 116)
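The steps of claim 114 can be sketched, under loudly stated assumptions, as follows. The names build_resource_set, Application, map_event, and EVENT_TO_RESOURCE_SET are hypothetical, as is the "call_setup" event type. The sketch interprets "shared data" as a single store referenced by every application copy and "private data" as a per-copy dictionary; the claim itself does not prescribe this representation.

```python
# Hypothetical mapping from an incoming event type to a resource set name.
EVENT_TO_RESOURCE_SET = {"call_setup": "rs-1"}


def map_event(event_type):
    """Map the process associated with an incoming event to a resource set."""
    return EVENT_TO_RESOURCE_SET[event_type]


def build_resource_set(name, shared_store):
    """Build a resource set holding a critical and a non-critical part."""
    return {
        "name": name,
        # Critical part carries shared data (same object for every copy).
        "critical": {"shared_data": shared_store},
        # Non-critical part carries private data (fresh object per copy).
        "non_critical": {"private_data": {}},
    }


class Application:
    """Hypothetical application copy that executes processes on its resource sets."""

    def __init__(self, name):
        self.name = name
        self.resource_sets = {}

    def assign(self, resource_set):
        self.resource_sets[resource_set["name"]] = resource_set

    def execute(self, set_name, key, value):
        resource_set = self.resource_sets[set_name]
        resource_set["critical"]["shared_data"][key] = value
        resource_set["non_critical"]["private_data"][key] = value
```

In this sketch, a write made by one copy to the shared data is visible to every other copy assigned the same resource set, while private data stays local to the copy that wrote it.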
-
Specification