System and method for providing highly-reliable coordination of intelligent agents in a distributed computing system
Abstract
This invention applies fault tolerance techniques to intelligent agent technology to create a highly reliable distributed computing system. It merges software fault tolerance techniques with cooperative intelligent agents to provide highly reliable coordination of interactions between computer systems, even when data is corrupt, available information is incomplete, or synchronization of the computer systems is imperfect. Agents engaged in an interaction exchange information. Received information is acceptance tested to determine whether it indicates the occurrence of a fault. If the information falls outside a range of expected values, or otherwise does not take the expected form, a fault is indicated. A fault tolerance technique is then employed to overcome the fault. One such technique is the retry block software fault tolerance technique. Re-expression and re-transmission of the information may be requested.
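As an illustration of the acceptance test and retry block described in the abstract, the following Python sketch shows a receiving agent that checks a received value against an expected range and, on failure, moves on to a re-expressed version of the message. It is a minimal sketch under stated assumptions, not the patented implementation: the names `acceptance_test` and `retry_block`, the range bounds, and the temperature example are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def acceptance_test(value: float, low: float, high: float) -> bool:
    """Return True when the received value lies in the expected range."""
    return low <= value <= high


def retry_block(
    expressions: Sequence[Callable[[], float]],
    low: float,
    high: float,
) -> Optional[float]:
    """Try each expression of the message in turn until one passes the test.

    `expressions` stands in for the sending agent (an assumption of this
    sketch): the first entry is the original transmission, later entries
    are re-expressed versions of the same information.
    """
    for send in expressions:
        received = send()
        if acceptance_test(received, low, high):
            return received
    return None  # every attempt failed; the fault was not overcome


# Hypothetical example: the sender first reports a temperature in the wrong
# unit (Fahrenheit), then re-expresses it in Celsius when asked to retry.
attempts = [lambda: 212.0, lambda: 100.0]
result = retry_block(attempts, low=-50.0, high=150.0)
```

In this sketch the first transmission fails the range check, so the receiver falls through to the re-expressed value. Returning `None` when all attempts fail leaves the choice of a further recovery step to the caller.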
20 Claims
1. In a distributed computing system including at least one host computer system, a highly-reliable system for performing tasks, comprising:
a plurality of agents, each agent in the plurality of agents having a host system and being capable of pursuing a goal autonomous from the host system associated with the agent, the goal being a solution to a problem defined by one agent in the plurality of agents or the host system associated with the one agent, wherein a first agent in the plurality of agents is associated with at least one system, including a first host system, and is operative for communicating with a component in the distributed computing system to receive instructions, information, and requests from the component, the first agent being further operative to receive the goal from the component, the first agent being capable of interacting with a second agent in the plurality of agents using inter-agent conversation facilities in furtherance of the goal through the transmission of a message from the first agent to the second agent, the first agent being still further operative for planning a session with other agents in the plurality of agents for pursuing the goal; and
a fault tolerance object resident at the second agent and operative for identifying a fault in the interaction by testing information received from the first agent during the interaction to determine whether the information is inconsistent with information stored in the second agent that describes an expected behavior of the first agent or the systems associated with the first agent, the fault tolerance object further operative to initiate a fault tolerance procedure operative to identify a cause of the fault and, when necessary, to request the first agent to re-express and re-transmit the message from the first agent to the second agent.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. In a distributed computing system including a plurality of agents, each agent in the plurality of agents operative for executing a task in furtherance of a goal autonomous from a host system associated with each agent, the goal being a solution to a problem defined by one agent in the plurality of agents or the host system associated with the one agent, a method for enhancing the reliability of an interaction between two agents in the plurality of agents, comprising:
initiating a session between a first agent in the plurality of agents and a second agent in the plurality of agents, the first agent and the second agent each including a portion of a distributed fault tolerance structure containing data conditioning, testing, and control functions to carry out fault tolerance operations between the first agent and the second agent, the session having the goal;
providing to the first agent information related to the second agent, the information related to the second agent comprising information that describes an expected behavior of the second agent;
at the first agent, acceptance testing information related to the session to detect whether the information related to the session is inconsistent with the expected behavior of the second agent, the failure of the acceptance test being indicative of a fault in the session; and
in response to a failure of the acceptance test, initiating a fault tolerance procedure making use of the distributed fault tolerance structure to diagnose a cause of the fault, to create a fault recovery plan, and to execute the fault recovery plan.
17. A computer-readable medium in a distributed computing system having computer-executable instructions for enhancing the reliability of interactions between agents in the plurality of agents, comprising:
monitoring an interactive step of an interaction between a first agent in a plurality of agents in the distributed computing system and a second agent in the plurality of agents in the distributed computing system, each agent in the plurality of agents being operative for executing a task in furtherance of a goal, each agent being further operative to execute the task independent of a host system associated with the agent, the goal being a solution to a problem defined by one agent in the plurality of agents or the host system associated with the one agent, the interactive step involving a use of information sent from the first agent to the second agent;
testing the validity of the interactive step being performed in the interaction through the use of fault tolerance techniques distributed among at least the first agent and the second agent by comparing the information sent from the first agent to the second agent with other information that describes an expected behavior of the first agent or the host system associated with the first agent;
informing the first agent and the second agent that the interactive step being performed is invalid; and
assisting the first agent and the second agent to reperform the interactive step with a re-expressed version of the information sent from the first agent to the second agent.
Dependent claims: 18, 19, 20
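The method of claim 16 (acceptance test against a peer's expected behavior, then diagnosis, a recovery plan, and execution of that plan) can be sketched in Python as follows. This is an illustrative sketch only: the `ExpectedBehavior` fields, the cause labels `"incomplete"` and `"stale"`, and the recovery step names are assumptions introduced here and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ExpectedBehavior:
    """Hypothetical description one agent stores about its peer."""
    required_fields: List[str]
    max_age_seconds: float


def diagnose(message: Dict[str, Any], expected: ExpectedBehavior) -> str:
    """Return "ok" if the message matches expectations, else a cause label."""
    if any(name not in message for name in expected.required_fields):
        return "incomplete"
    if message.get("age_seconds", 0.0) > expected.max_age_seconds:
        return "stale"
    return "ok"


def make_recovery_plan(cause: str) -> List[str]:
    """Map a diagnosed cause to an ordered list of recovery steps."""
    plans = {
        "incomplete": ["request_re_expression", "request_re_transmission"],
        "stale": ["resynchronize", "request_re_transmission"],
    }
    return plans.get(cause, [])


def handle_session_message(
    message: Dict[str, Any],
    expected: ExpectedBehavior,
    actions: Dict[str, Callable[[], None]],
) -> List[str]:
    """Acceptance test the message; on failure diagnose, plan, and execute."""
    cause = diagnose(message, expected)
    if cause == "ok":  # acceptance test passed, no fault indicated
        return []
    executed = []
    for step in make_recovery_plan(cause):
        actions[step]()  # run the hypothetical recovery action
        executed.append(step)
    return executed


# Hypothetical usage: the peer omits the "quantity" field it is expected to send.
expected = ExpectedBehavior(required_fields=["price", "quantity"], max_age_seconds=5.0)
log: List[str] = []
step_names = ["request_re_expression", "request_re_transmission", "resynchronize"]
actions = {name: (lambda n=name: log.append(n)) for name in step_names}
executed = handle_session_message({"price": 10.0, "age_seconds": 1.0}, expected, actions)
```

Keeping diagnosis, planning, and execution as separate functions mirrors the three stages the claim names, so each stage could be replaced independently in a fuller design.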
Specification