Fault tolerance framework for networks of nodes
First Claim
1. A system including instructions recorded on a non-transitory computer-readable storage medium and executable by at least one processor, the system comprising:
the at least one processor;
a first message handler configured to cause the at least one processor to receive first network-related data using a first communications protocol, the first network-related data being associated with a first network of nodes, the nodes of the first network of nodes communicating with one another within the first network using the first communications protocol;
a second message handler configured to cause the at least one processor to receive second network-related data using a second communications protocol, the second network-related data being associated with a second network of nodes, the nodes of the second network of nodes communicating with one another within the second network using the second communications protocol;
a message transport system configured to cause the at least one processor to receive the first network-related data and the second network-related data and further configured to route the first network-related data and the second network-related data in a common communications protocol; and
a fault manager configured to:
  construct and maintain a state model that stores events occurring in the first network of nodes and the second network of nodes, stores related state information, and stores network metadata for the first network of nodes and the second network of nodes,
  cause the at least one processor to receive the network-related data in the common communications protocol, and
  determine a fault associated with an operation of one or more of the first network of nodes and the second network of nodes, based on the network-related data in the common protocol and on the state model,
  cause the at least one processor to determine a recovery method for recovering an operation of the networks of nodes despite the fault, the recovery method being determined from among a plurality of recovery methods including one or more of triggering a workflow, performing a sensor value fusion, and deploying a service within a node of the networks of nodes; and
  cause the at least one processor to implement the determined recovery method; and
a code distribution manager configured to determine a target node not associated with the fault within the first network of nodes and to determine a service executable for deploying a service in response to the fault and based on a mapping of the service to the target node as part of implementing the determined recovery method when the recovery method includes deploying a service; and
a service injector configured to deploy the service executable to the target node for continued execution thereon to thereby recover the operation, wherein the service injector is selected from a plurality of service injectors as being compatible with the first communications protocol.
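The fault manager recited above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `StateModel` fields, the heartbeat timeout used as the fault criterion, the metadata keys `hosts_service` and `redundant_sensors`, and the rule that picks among the three recovery methods named in the claim.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Recovery(Enum):
    """The three recovery methods named in the claim."""
    TRIGGER_WORKFLOW = "trigger a workflow"
    SENSOR_FUSION = "perform a sensor value fusion"
    DEPLOY_SERVICE = "deploy a service"


@dataclass
class StateModel:
    """Hypothetical state model: events, related state, and network metadata."""
    events: List[Tuple[str, str, float]] = field(default_factory=list)
    last_seen: Dict[str, float] = field(default_factory=dict)
    metadata: Dict[str, dict] = field(default_factory=dict)

    def record(self, node: str, kind: str, timestamp: float) -> None:
        # Store the event and update the per-node state derived from it.
        self.events.append((node, kind, timestamp))
        self.last_seen[node] = timestamp


HEARTBEAT_TIMEOUT = 30.0  # assumed fault criterion, in seconds


def determine_fault(model: StateModel, now: float) -> Optional[str]:
    """Return the first node (by name) silent for longer than the timeout."""
    for node, seen in sorted(model.last_seen.items()):
        if now - seen > HEARTBEAT_TIMEOUT:
            return node
    return None


def determine_recovery(model: StateModel, faulty_node: str) -> Recovery:
    """Assumed selection rule driven by the stored network metadata."""
    meta = model.metadata.get(faulty_node, {})
    if meta.get("hosts_service"):
        return Recovery.DEPLOY_SERVICE
    if meta.get("redundant_sensors", 0) > 0:
        return Recovery.SENSOR_FUSION
    return Recovery.TRIGGER_WORKFLOW


model = StateModel(metadata={"a": {"hosts_service": True}, "b": {}})
model.record("a", "heartbeat", 0.0)
model.record("b", "heartbeat", 40.0)
fault = determine_fault(model, now=45.0)
recovery = determine_recovery(model, fault) if fault else None
```

In this sketch node `a` has been silent for 45 seconds and is flagged, and because its metadata marks it as hosting a service, the deploy-a-service recovery method is chosen.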
Abstract
In some implementations, a first message handler may be configured to receive first network-related data associated with a first network of nodes, the first network of nodes using a first communications protocol. A second message handler may be configured to receive second network-related data associated with a second network of nodes, the second network of nodes using a second communications protocol. A message transport system may be configured to receive the first network-related data and the second network-related data and further configured to route the first network-related data and the second network-related data in a common protocol, and a fault manager may be configured to receive the network-related data in the common protocol and configured to determine a fault associated with an operation of one or more of the first network of nodes and the second network of nodes, based on the network-related data in the common protocol.
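As a rough illustration of the arrangement the abstract describes, the sketch below has two message handlers that each accept a different wire format and emit a shared record type, which a transport then routes to consumers. The two wire formats, the `CommonMessage` fields, and the names `handle_first_protocol`, `handle_second_protocol`, and `MessageTransport` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class CommonMessage:
    """Hypothetical common-protocol record; the field names are assumptions."""
    network_id: str
    node_id: str
    kind: str
    value: float


def handle_first_protocol(raw: str) -> CommonMessage:
    # Assumed wire format of the first network: "node;kind;value"
    node_id, kind, value = raw.split(";")
    return CommonMessage("net-1", node_id, kind, float(value))


def handle_second_protocol(raw: Dict[str, Any]) -> CommonMessage:
    # Assumed wire format of the second network: a key/value payload
    return CommonMessage("net-2", str(raw["id"]), str(raw["type"]), float(raw["reading"]))


class MessageTransport:
    """Routes common-protocol messages to every registered consumer."""

    def __init__(self) -> None:
        self._consumers: List[Callable[[CommonMessage], None]] = []

    def subscribe(self, consumer: Callable[[CommonMessage], None]) -> None:
        self._consumers.append(consumer)

    def route(self, message: CommonMessage) -> None:
        for consumer in self._consumers:
            consumer(message)


received: List[CommonMessage] = []
transport = MessageTransport()
transport.subscribe(received.append)  # stand-in for a fault manager
transport.route(handle_first_protocol("n7;temperature;21.5"))
transport.route(handle_second_protocol({"id": "m3", "type": "heartbeat", "reading": 1}))
```

The consumer only ever sees `CommonMessage` values, so it needs no knowledge of either network's native protocol.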
149 Citations
21 Claims
1. A system including instructions recorded on a non-transitory computer-readable storage medium and executable by at least one processor, the system comprising:
(Claim 1 is recited in full under First Claim above.)
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 16)
10. A system including instructions recorded on a non-transitory computer-readable storage medium and executable by at least one processor, the system comprising:
the at least one processor;
a middleware layer configured to cause the at least one processor to communicate with a plurality of networks of nodes and configured to cause the at least one processor to communicate with at least one back-end application, the middleware layer including
  a platform abstraction layer configured to:
    cause the at least one processor to receive, from the plurality of networks of nodes, first network-related data using a first communications protocol from a first network of nodes and second network-related data using a second communications protocol from a second network of nodes, wherein the first communications protocol is used by the nodes of the first network for communicating with one another within the first network and the second communications protocol is used by the nodes of the second network for communicating with one another within the second network, and
    provide the first network-related data and the second network-related data in a common protocol; and
  a fault management layer configured to:
    cause the at least one processor to construct and maintain a state model that stores events occurring in the first network of nodes and the second network of nodes, stores related state information, and stores network metadata for the first network of nodes and the second network of nodes,
    cause the at least one processor to receive the first network-related data and the second network-related data in the common protocol, and
    cause the at least one processor to determine a fault associated with an operation of the plurality of networks, based on the network-related data in the common protocol and on the state model,
    cause the at least one processor to determine a recovery method for responding to the fault, and
    cause the at least one processor to notify the platform abstraction layer of the recovery method,
wherein the platform abstraction layer is further configured to:
  cause the at least one processor to determine a target node not associated with the fault within the first network of nodes when the recovery method includes deploying a service,
  cause the at least one processor to select a service injector from a plurality of service injectors, the selected service injector being compatible with the first communications protocol, and
  cause the at least one processor to deploy the service, using the selected service injector, to the target node for execution thereon to thereby recover the fault.
Dependent Claims (11, 12, 20)
13. A method comprising:
receiving network-related data associated with a plurality of networks of nodes at one of a plurality of message handlers, the plurality of message handlers each associated with a corresponding network of nodes and a corresponding communications protocol that is used by the corresponding network of nodes to conduct in-network communications within and among the corresponding nodes thereof;
translating the network-related data from the corresponding communications protocol into a common communications protocol;
providing the network-related data in the common communications protocol to a state model that stores state information related to the plurality of networks of nodes and network metadata for the plurality of networks of nodes;
diagnosing a fault associated with an operation of the plurality of networks of nodes, based on the state model, the operation being associated with a source node of a first network of nodes within the plurality of networks; and
recovering the fault by:
  determining a recovery method for recovering the operation despite the fault,
  determining a target node not associated with the fault within the first network of nodes when the recovery method includes deploying a service,
  selecting a service injector from a plurality of service injectors, the selected service injector being compatible with a communications protocol corresponding to the source node, and
  deploying, using the selected service injector, the service to the target node for execution on the target node to recover the fault.
Dependent Claims (14, 15, 17, 18, 19, 21)
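The final recovery steps of the method (choosing a target node, selecting a compatible service injector, and deploying) might be sketched as follows. The `Node` type, the protocol names `proto-a` and `proto-b`, the `INJECTORS` registry, and the first-healthy-node selection rule are hypothetical and chosen only to make the control flow concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical node record; all field names are assumptions."""
    node_id: str
    network: str
    protocol: str
    faulty: bool = False
    services: List[str] = field(default_factory=list)


def select_target(nodes: List[Node], source: Node) -> Node:
    """Pick a node in the source node's network that is not associated with the fault."""
    for node in nodes:
        if (node.network == source.network
                and node.node_id != source.node_id
                and not node.faulty):
            return node
    raise LookupError("no healthy target node in the source network")


def inject_proto_a(target: Node, service: str) -> str:
    # Stand-in for a protocol-specific deployment step.
    target.services.append(service)
    return f"proto-a injector deployed {service} to {target.node_id}"


def inject_proto_b(target: Node, service: str) -> str:
    target.services.append(service)
    return f"proto-b injector deployed {service} to {target.node_id}"


# Assumed registry: one injector per communications protocol.
INJECTORS: Dict[str, Callable[[Node, str], str]] = {
    "proto-a": inject_proto_a,
    "proto-b": inject_proto_b,
}


def recover_by_deployment(nodes: List[Node], source: Node, service: str) -> str:
    target = select_target(nodes, source)
    injector = INJECTORS[source.protocol]  # compatible with the source node's protocol
    return injector(target, service)


nodes = [
    Node("s1", "net-1", "proto-a", faulty=True),  # source node of the failed operation
    Node("s2", "net-1", "proto-a"),
    Node("t1", "net-2", "proto-b"),
]
result = recover_by_deployment(nodes, nodes[0], "aggregation")
```

Here the service lands on `s2`, the only healthy node in the source node's network, and the injector is looked up by the source node's protocol rather than hard-coded.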
Specification