Methods and arrangements for distributed diagnosis in distributed systems using belief propagation
Abstract
In the context of problems associated with self-healing in autonomic computer systems, and particularly the problem of fast and efficient real-time diagnosis in large-scale distributed systems, a “divide-and-conquer” approach to diagnostic tasks is disclosed. Preferably, parallel (i.e., multi-threaded) and distributed (i.e., multi-machine) architectures are used, whereby the diagnostic task is divided into subtasks and distributed to multiple diagnostic engines that collaborate with each other in order to reach a final diagnosis. Each diagnostic engine is preferably responsible for some subset of system components (its “region”) and performs the diagnosis using all available observations about these components. When the regions do not intersect, the diagnostic task is trivially parallelized.
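The non-intersecting case described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method and does not implement belief propagation: the function names, the sample component names, and the failure-fraction rule in `diagnose_region` are all assumptions made for illustration. The sketch only shows that, when regions are disjoint, each diagnostic engine can work on its own region independently and in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def regions_are_disjoint(regions):
    """Return True if no component belongs to more than one region."""
    seen = set()
    for components in regions.values():
        if seen & set(components):
            return False
        seen |= set(components)
    return True


def diagnose_region(components, observations, threshold=0.5):
    """Hypothetical local rule (an assumption, not from the patent):
    flag a component when more than `threshold` of its probe results failed."""
    suspects = []
    for component in components:
        results = observations.get(component, [])
        if not results:
            continue  # no observations, nothing to conclude
        failed_fraction = sum(1 for ok in results if not ok) / len(results)
        if failed_fraction > threshold:
            suspects.append(component)
    return suspects


def diagnose(regions, observations):
    """Run one diagnostic engine per region in parallel (disjoint regions only)."""
    if not regions_are_disjoint(regions):
        # Overlapping regions would require the engines to exchange information.
        raise ValueError("regions overlap; independent diagnosis is not valid")
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(diagnose_region, components, observations)
            for name, components in regions.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Illustrative data: True means a probe succeeded, False means it failed.
regions = {"engine_a": ["db1", "db2"], "engine_b": ["web1", "web2"]}
observations = {
    "db1": [True, True, False],
    "db2": [False, False, True],
    "web1": [True, True],
    "web2": [False, False],
}
result = diagnose(regions, observations)
print(result)
```

When regions share components, the independent runs above are no longer sufficient, which is why the sketch refuses that case rather than guessing how the engines would collaborate.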
20 Claims
1. A method for affording collaborative problem determination in a distributed system, said method comprising the steps of:
appending at least one measurement component to a distributed system;
employing the at least one measurement component to obtain system status information;
sharing system status information among system nodes; and
diagnosing a problem in the distributed system based on shared system status information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. An apparatus for affording collaborative problem determination in a distributed system, said apparatus comprising:
an arrangement for appending at least one measurement component to a distributed system;
an arrangement for employing the at least one measurement component to obtain system status information;
an arrangement for sharing system status information among system nodes; and
an arrangement for diagnosing a problem in the distributed system based on shared system status information.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
20. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for affording collaborative problem determination in a distributed system, said method comprising the steps of:
appending at least one measurement component to a distributed system;
employing the at least one measurement component to obtain system status information;
sharing system status information among system nodes; and
diagnosing a problem in the distributed system based on shared system status information.
Specification