Proactive failure handling in data processing systems
Abstract
Embodiments are directed to predicting the health of a computer node using health report data and to proactively handling failures in computer network nodes. In an embodiment, a computer system monitors various health indicators for multiple nodes in a computer network. The computer system accesses stored health indicators that provide a health history for the computer network nodes. The computer system then generates a health status based on the monitored health indicators and the health history. The generated health status indicates the likelihood that the node will be healthy within a specified future time period. The computer system then leverages the generated health status to handle current or predicted failures. The computer system also presents the generated health status to a user or other entity.
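As an illustration of how a monitored indicator and a stored health history might be combined into a predicted status, the following is a minimal Python sketch. The abstract does not specify a prediction model. The exponentially decayed average used here, the names `HealthReport`, `predict_healthy_likelihood`, and `predict_status`, and the `decay` and `threshold` values are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthReport:
    """One health observation for a node (hypothetical structure)."""
    node_id: str
    error_rate: float  # fraction of failed operations in the interval, 0.0 to 1.0


def predict_healthy_likelihood(current: HealthReport,
                               history: List[HealthReport],
                               decay: float = 0.5) -> float:
    """Estimate the likelihood (0.0 to 1.0) that a node will be healthy.

    Blends the current error rate with an exponentially decayed average of
    stored error rates, so newer history entries weigh more than older ones.
    This scoring scheme is an assumption, not the method of the source.
    """
    weighted_sum = current.error_rate
    total_weight = 1.0
    weight = decay
    # history is assumed to be ordered oldest to newest; walk it newest first
    for report in reversed(history):
        weighted_sum += weight * report.error_rate
        total_weight += weight
        weight *= decay
    expected_error = weighted_sum / total_weight
    return 1.0 - expected_error


def predict_status(current_reports: Dict[str, HealthReport],
                   stored_history: Dict[str, List[HealthReport]],
                   threshold: float = 0.9) -> Dict[str, str]:
    """Label each monitored node as 'healthy' or 'unhealthy' for the future."""
    statuses = {}
    for node_id, report in current_reports.items():
        likelihood = predict_healthy_likelihood(
            report, stored_history.get(node_id, []))
        statuses[node_id] = "healthy" if likelihood >= threshold else "unhealthy"
    return statuses
```

The resulting dictionary is what would be presented to a user or other entity. A real system would likely use several indicators per node and a model fitted to observed failures.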
29 Claims
1. A computer system for predicting the health of a plurality of nodes by using health report data, the computer system comprising one or more processors executing computer executable instructions which cause the computer system to perform the following:
monitors one or more health indicators for a plurality of nodes;
accesses one or more stored health indicators that provide a health history for one or more of the monitored plurality of nodes;
based on both the monitored health indicators and the stored health history, predicts a health status, wherein the predicted health status indicates for at least one of the monitored plurality of nodes that the at least one monitored node will be healthy or unhealthy in the future; and
presents the predicted health status to a specified entity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer-implemented method for proactively handling failures in a distributed processing system comprising a plurality of nodes, the computer-implemented method being performed by one or more processors executing computer executable instructions for the computer-implemented method, and the computer-implemented method comprising:
monitoring one or more health indicators for a plurality of nodes of a distributed processing system;
accessing one or more stored health indicators that provide a health history for the one or more of the monitored nodes of the distributed processing system;
predicting a health status based on the monitored health indicators and the health history, wherein the predicted health status indicates that at least one of the one or more monitored nodes of the distributed processing system will be healthy or unhealthy in the future;
determining, for at least one of the monitored nodes of the distributed processing system, that a threshold number of failures have occurred that are beyond a specified error level;
based on the determination, blacklisting the at least one monitored node for which the determination was made;
transferring one or more portions of data stored from the at least one monitored node that is blacklisted to one or more of other nodes of the distributed processing system; and
preventing the node that was blacklisted from storing new data.
Dependent claims: 17, 18, 19, 20, 21, 22
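The determining, blacklisting, transferring, and preventing steps recited in claim 16 can be sketched in Python as follows. This is only an illustrative reading of the claim language, not the patented implementation. The `Node` class, the `handle_node` function, the integer severity scale, the round-robin placement of data, and the default `error_level` and `failure_threshold` values are hypothetical. "A threshold number of failures" is assumed here to mean "at least that many".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A storage node holding named data blocks (hypothetical structure)."""
    node_id: str
    blocks: List[str] = field(default_factory=list)
    blacklisted: bool = False

    def store(self, block: str) -> None:
        # A blacklisted node is prevented from storing new data.
        if self.blacklisted:
            raise RuntimeError(f"node {self.node_id} is blacklisted")
        self.blocks.append(block)


def handle_node(node: Node,
                failure_severities: List[int],
                peers: List[Node],
                error_level: int = 3,
                failure_threshold: int = 2) -> bool:
    """Blacklist `node` if enough of its failures exceed `error_level`.

    Returns True if the node was blacklisted, False otherwise.
    """
    severe = sum(1 for s in failure_severities if s > error_level)
    if severe < failure_threshold:
        return False

    node.blacklisted = True

    # Transfer stored data to the remaining non-blacklisted nodes,
    # spreading blocks round-robin (an assumed placement policy).
    targets = [p for p in peers if p is not node and not p.blacklisted]
    if targets:
        for i, block in enumerate(list(node.blocks)):
            targets[i % len(targets)].store(block)
        node.blocks.clear()
    # With no available target, the data is left in place in this sketch.
    return True
```

For example, a node with recorded severities `[5, 4, 1]` has two failures beyond level 3, so under the default threshold of 2 it is blacklisted and its blocks are moved to its peers.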
23. A system comprising:
a processing system comprised of a plurality of nodes;
one or more computer-readable storage hardware media, excluding transmission media, having stored thereon computer-executable instructions that, when executed by one or more processors, cause the system to be configured with an architecture that proactively handles failures in the plurality of nodes by using health indicators, and wherein the architecture is configured to perform the following;
monitor one or more health indicators for a plurality of nodes of a processing system;
access one or more stored health indicators that provide a health history for one or more of the monitored plurality of nodes;
based on both the monitored health indicators and the stored health history, predict a health status, wherein the predicted health status indicates for at least one of the monitored plurality of nodes that the at least one monitored node will be healthy or unhealthy in the future; and
present the predicted health status to a specified entity.
Dependent claims: 24, 25, 26, 27, 28, 29
Specification