Proactive Failure Recovery Model for Distributed Computing
First Claim
1. A computer-implemented method, comprising:
building a virtual tree-like computing structure of a plurality of computing nodes;
for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node;
determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold;
migrating a process from the computing node to a different computing node acting as a recovery node; and
resuming execution of the process on the different computing node.
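The structural steps of the claim (building the tree, choosing a recovery node, and moving a process to it) can be pictured with a minimal sketch. Nothing below comes from the patent text itself: the `Node` fields, the breadth-first layout with a fixed fan-out, and the "sibling first, then parent" recovery policy are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One computing node in the virtual tree (all fields are illustrative)."""
    name: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    processes: list[str] = field(default_factory=list)


def build_tree(names: list[str], fanout: int = 2) -> Node:
    """Arrange the named nodes breadth-first into a tree with the given fan-out."""
    nodes = [Node(n) for n in names]
    for i, node in enumerate(nodes[1:], start=1):
        parent = nodes[(i - 1) // fanout]
        node.parent = parent
        parent.children.append(node)
    return nodes[0]


def pick_recovery_node(node: Node) -> Optional[Node]:
    """Hypothetical policy: prefer a sibling, otherwise fall back to the parent."""
    if node.parent is None:
        return None  # the root has no sibling or parent to recover onto
    for sibling in node.parent.children:
        if sibling is not node:
            return sibling
    return node.parent


def migrate(process: str, source: Node, target: Node) -> None:
    """Move a process to the recovery node, where it would resume execution."""
    source.processes.remove(process)
    target.processes.append(process)
```

In a real system the migration step would transfer checkpointed state between machines; here it only moves a label between two in-memory lists to show the control flow.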
Abstract
This disclosure generally describes methods and systems, including computer-implemented methods, computer-program products, and computer systems, for providing a proactive failure recovery model for distributed computing. One computer-implemented method includes: building a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node; determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold; migrating a process from the computing node to a different computing node acting as a recovery node; and resuming execution of the process on the different computing node.
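The checkpoint decision described above compares a per-node MTBF against a minimum and a maximum threshold. The text here does not say how the failure prediction model computes MTBF or what each comparison outcome triggers, so the following is only a sketch under stated assumptions: the textbook estimate (operating time divided by observed failures) stands in for the prediction model, and the three-way policy in `decide` is a guess at one plausible reading, not the patented method.

```python
from enum import Enum


class Action(Enum):
    NO_CHECKPOINT = "no_checkpoint"
    CHECKPOINT = "checkpoint"
    CHECKPOINT_AND_MIGRATE = "checkpoint_and_migrate"


def mean_time_between_failures(uptime_hours: float, failure_count: int) -> float:
    """Textbook MTBF estimate: operating time divided by failures observed.

    Stand-in for the node failure prediction model, which is not specified here.
    """
    if failure_count == 0:
        return float("inf")  # no failures observed yet
    return uptime_hours / failure_count


def decide(mtbf: float, min_threshold: float, max_threshold: float) -> Action:
    """Assumed policy (not from the source text).

    - MTBF at or above the maximum: node looks healthy, skip the checkpoint.
    - MTBF between the thresholds: take a checkpoint.
    - MTBF at or below the minimum: checkpoint and migrate proactively.
    """
    if mtbf >= max_threshold:
        return Action.NO_CHECKPOINT
    if mtbf > min_threshold:
        return Action.CHECKPOINT
    return Action.CHECKPOINT_AND_MIGRATE
```

For example, a node with 1000 hours of uptime and 4 failures has an estimated MTBF of 250 hours; with thresholds of 100 and 500 hours, this assumed policy would checkpoint the node without migrating it.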
34 Citations
20 Claims
1. A computer-implemented method, comprising:
building a virtual tree-like computing structure of a plurality of computing nodes;
for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node;
determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold;
migrating a process from the computing node to a different computing node acting as a recovery node; and
resuming execution of the process on the different computing node.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory, computer-readable medium storing computer-readable instructions, the instructions executable by a computer and configured to:
build a virtual tree-like computing structure of a plurality of computing nodes;
for each computing node of the virtual tree-like computing structure, perform a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node;
determine whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold;
migrate a process from the computing node to a different computing node acting as a recovery node; and
resume execution of the process on the different computing node.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer system, comprising:
at least one hardware processor interoperably coupled with a memory storage and configured to:
build a virtual tree-like computing structure of a plurality of computing nodes;
for each computing node of the virtual tree-like computing structure, perform a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node;
determine whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold;
migrate a process from the computing node to a different computing node acting as a recovery node; and
resume execution of the process on the different computing node.
Dependent claims: 16, 17, 18, 19, 20
Specification