Invariants-based learning method and system for failure diagnosis in large scale computing systems
First Claim
1. A method for diagnosing a detected failure in a computer system, the method comprising:
comparing, in a computer process, a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure;
if the archived failure signature matches the failure signature of the detected failure, applying, in a computer process, an archived solution to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure;
wherein the archived failure signature is based on a set of broken computer system invariants, the set of broken computer system invariants corresponding to the previously detected computer system failure;
further comprising constructing the database in a computer process prior to comparing the failure signature of the detected failure to the archived failure signature;
wherein the constructing the database includes extracting invariants from the computer system;
wherein the extracting the invariants includes:
modeling invariants of the computer system;
evaluating each of the invariants to determine whether it is broken;
counting the broken invariants to determine whether the number of the broken invariants meets a predetermined threshold number;
if the number of the broken invariants meets the predetermined threshold number, deeming this result the previously detected computer system failure; and
combining the broken invariants into the set of broken invariants forming the archived failure signature of the previously detected computer system failure.
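The extraction steps recited above (evaluate each invariant, count the broken ones, compare the count to a threshold, and combine the broken invariants into a signature) can be sketched as follows. This is a minimal illustration, not the patented implementation: the predicate representation of an invariant, the function name `extract_signature`, the metric names, and the numeric tolerances are all hypothetical, and "meets the threshold" is assumed here to mean "greater than or equal to".

```python
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical representation: an invariant is a named predicate over a
# snapshot of metric values; it returns True while the invariant holds.
Invariant = Callable[[Dict[str, float]], bool]


def extract_signature(
    invariants: Dict[str, Invariant],
    snapshot: Dict[str, float],
    threshold: int,
) -> Optional[FrozenSet[str]]:
    """Return the set of broken invariants if enough are broken, else None."""
    # Evaluate each invariant to determine whether it is broken.
    broken = {name for name, holds in invariants.items() if not holds(snapshot)}
    # Count the broken invariants against the predetermined threshold
    # (assumed here to mean ">=").
    if len(broken) >= threshold:
        # Combine the broken invariants into one failure signature.
        return frozenset(broken)
    return None


# Illustrative invariants and readings (made-up names and numbers).
invariants = {
    "db_follows_web": lambda m: abs(m["db_queries"] - 2.0 * m["web_requests"]) <= 10.0,
    "cpu_follows_web": lambda m: abs(m["cpu_pct"] - 0.1 * m["web_requests"]) <= 5.0,
    "net_follows_web": lambda m: abs(m["net_kbps"] - 4.0 * m["web_requests"]) <= 50.0,
}
snapshot = {"web_requests": 100.0, "db_queries": 350.0, "cpu_pct": 40.0, "net_kbps": 410.0}
signature = extract_signature(invariants, snapshot, threshold=2)
```

With these made-up readings, two of the three invariants are broken, so a threshold of 2 yields a signature while a threshold of 3 yields `None`.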
Abstract
A method and system for diagnosing a detected failure in a computer system compares a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure. If the archived failure signature matches the failure signature of the detected failure, an archived solution is applied to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure.
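The compare-and-apply flow in the abstract might look like the sketch below. The class name `FailureArchive`, its methods, and the representation of a solution as a callable are assumptions made for illustration. The abstract does not say how a "match" is decided, so exact set equality is used here as the simplest stand-in.

```python
from typing import Callable, Dict, FrozenSet, Optional

Signature = FrozenSet[str]     # names of the broken invariants
Solution = Callable[[], str]   # hypothetical remediation action returning a status


class FailureArchive:
    """Hypothetical in-memory database of archived signatures and solutions."""

    def __init__(self) -> None:
        self._solutions: Dict[Signature, Solution] = {}

    def archive(self, signature: Signature, solution: Solution) -> None:
        # Record the solution that resolved a previously detected failure.
        self._solutions[signature] = solution

    def diagnose(self, signature: Signature) -> Optional[str]:
        # Assumption: "matches" is modeled as exact equality of the two sets.
        solution = self._solutions.get(signature)
        if solution is None:
            return None  # no archived signature matches the detected failure
        return solution()  # apply the archived solution


archive = FailureArchive()
archive.archive(frozenset({"db_follows_web", "cpu_follows_web"}), lambda: "restarted database")
result = archive.diagnose(frozenset({"cpu_follows_web", "db_follows_web"}))
```

Using a frozen set as the dictionary key makes the lookup independent of the order in which broken invariants were discovered.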
41 Citations
16 Claims
9. A system for diagnosing a detected failure in a computer system, the system comprising:
a database containing an archived failure signature; and
a processor associated with the database, the processor executing instructions for:
comparing a failure signature of the detected failure to the archived failure signature contained in the database to determine if the archived failure signature matches the failure signature of the detected failure;
if the archived failure signature matches the failure signature of the detected failure, applying an archived solution to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure;
wherein the processor executes further instructions for extracting invariants from the computer system prior to the comparing of the failure signature of the detected failure to the archived failure signature contained in the database, in order to determine the failure signature of the detected failure;
wherein the extracting the invariants includes:
modeling invariants of the computer system;
evaluating each of the invariants to determine whether it is broken;
counting the broken invariants to determine whether the number of the broken invariants meets a predetermined threshold number;
if the number of the broken invariants meets the predetermined threshold number, deeming this result the detected failure in the computer system; and
combining the broken invariants into a set of broken computer system invariants, the set of broken invariants forming the failure signature of the detected failure in the computer system.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A method for diagnosing a detected failure in a computer system, the method comprising:
comparing, in a computer process, a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure;
if the archived failure signature matches the failure signature of the detected failure, applying, in a computer process, an archived solution to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure;
wherein the failure signature of the detected failure is determined by extracting invariants from the computer system prior to the comparing of the failure signature of the detected failure to the archived failure signature contained in a database;
wherein the extracting the invariants includes:
modeling invariants of the computer system;
evaluating each of the invariants to determine whether it is broken;
counting the broken invariants to determine whether the number of the broken invariants meets a predetermined threshold number;
if the number of the broken invariants meets the predetermined threshold number, deeming this result the detected failure in the computer system; and
combining the broken invariants into a set of broken computer system invariants, the set of broken invariants forming the failure signature of the detected failure in the computer system.
Specification