Repair-policy refinement in distributed systems
First Claim
1. A method comprising:
monitoring a plurality of devices using a plurality of sensors, wherein the plurality of sensors generate a plurality of signals and events for each device based on the monitoring;
determining a state of each of the plurality of devices based on the generated plurality of signals and events by a repair service;
performing one or more repair actions on one or more of the plurality of devices according to the determined states by the repair service;
recording, for each device, the state of the device, a time the device entered the state, any repair actions performed on the device by the repair service, and the generated plurality of signals and events for the device;
determining an effectiveness of the one or more repair actions using the recorded states, the performed repair actions, and the time the devices entered each state; and
determining the effectiveness of the one or more sensors using the determined effectiveness of the one or more repair actions and the generated plurality of signals and events of the one or more sensors using a formal language to represent a sequence of the signals and events.
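The final step of the claim relies on a formal language to represent a sequence of signals and events, but the claim does not say which language. As a minimal sketch, assuming a regular language is sufficient, each event can be mapped to a one-character symbol so that an episode becomes a string and patterns over episodes become regular expressions. The symbol table, the episode format, and the function names below are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical alphabet: one symbol per (source, event) pair.
SYMBOLS = {
    ("ping", "timeout"): "p",
    ("disk", "io_error"): "d",
    ("repair", "reboot"): "R",
    ("state", "healthy"): "H",
    ("state", "failed"): "F",
}


def encode(events):
    """Turn a list of (source, event) pairs into a string over the alphabet."""
    return "".join(SYMBOLS[e] for e in events)


def sensor_effectiveness(episodes, sensor_symbol):
    """Fraction of repairs preceded by this sensor's event that ended healthy.

    An episode counts as 'triggered' if the sensor symbol is followed by a
    repair (R), and as 'successful' if that repair is immediately followed
    by the healthy state (H). Returns None if the sensor never triggered.
    """
    s = re.escape(sensor_symbol)
    triggered = re.compile(s + r"[^R]*R")
    successful = re.compile(s + r"[^R]*RH")
    n_triggered = sum(1 for ep in episodes if triggered.search(ep))
    n_successful = sum(1 for ep in episodes if successful.search(ep))
    if n_triggered == 0:
        return None
    return n_successful / n_triggered


episodes = [
    encode([("ping", "timeout"), ("repair", "reboot"), ("state", "healthy")]),
    encode([("ping", "timeout"), ("repair", "reboot"), ("state", "failed")]),
    encode([("disk", "io_error"), ("repair", "reboot"), ("state", "healthy")]),
    encode([("ping", "timeout"), ("disk", "io_error"),
            ("repair", "reboot"), ("state", "healthy")]),
]
ping_score = sensor_effectiveness(episodes, "p")
disk_score = sensor_effectiveness(episodes, "d")
```

In this toy data the ping sensor precedes three repairs, two of which end in a healthy state, while every repair preceded by the disk sensor succeeds. A sensor with a persistently low score under such a measure would be a candidate for adjustment.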
Abstract
In a distributed system, a plurality of devices (including computing units, storage units, and communication units) are monitored by an automated repair service that uses sensors and, according to repair policies, performs one or more repair actions on computing devices that are found to have failed. The repair actions include automated repair actions and non-automated repair actions. The health of the computing devices is recorded in the form of states, along with the repair actions that were performed on the computing devices, the times at which the repair actions were performed, and events generated by both the sensors and the devices themselves. After some period of time, the history of states of each device, the events, and the repair actions performed on the computing devices are analyzed to determine the effectiveness of the repair actions. A statistical analysis is performed based on the cost of each repair action and the determined effectiveness of each repair action, and one or more of the policies may be adjusted. The signals and events from the sensors are also used to determine whether the sensors themselves require adjustment.
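The analysis described in the abstract can be illustrated with a small sketch. It assumes that a repair action counts as effective when the device stays in the healthy state for at least a threshold duration afterwards, and that cost and effectiveness are combined as cost per effective repair. The record layout, the state labels, the 24-hour threshold, and the cost figures are all hypothetical; the abstract does not fix any of them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateRecord:
    device: str
    state: str                 # hypothetical labels: "healthy" or "failed"
    entered_at: float          # hours since the start of the observation window
    repair_action: Optional[str] = None  # action that led into this state, if any


def repair_effectiveness(history, window_end, min_healthy_hours=24.0):
    """Per action: share of repairs followed by a long enough healthy period."""
    by_device = defaultdict(list)
    for rec in history:
        by_device[rec.device].append(rec)

    attempts = defaultdict(int)
    successes = defaultdict(int)
    for records in by_device.values():
        records.sort(key=lambda r: r.entered_at)
        for i, rec in enumerate(records):
            if rec.repair_action is None:
                continue
            attempts[rec.repair_action] += 1
            # The state lasts until the next recorded state or the window end.
            left_at = records[i + 1].entered_at if i + 1 < len(records) else window_end
            if rec.state == "healthy" and left_at - rec.entered_at >= min_healthy_hours:
                successes[rec.repair_action] += 1
    return {action: successes[action] / attempts[action] for action in attempts}


def cost_per_effective_repair(effectiveness, cost):
    """Expected cost of one successful repair for each action."""
    return {
        action: (cost[action] / rate if rate > 0 else float("inf"))
        for action, rate in effectiveness.items()
    }


history = [
    StateRecord("d1", "failed", 0.0),
    StateRecord("d1", "healthy", 1.0, "reboot"),   # fails again after 4 hours
    StateRecord("d1", "failed", 5.0),
    StateRecord("d1", "healthy", 6.0, "reimage"),  # stays healthy
    StateRecord("d2", "failed", 0.0),
    StateRecord("d2", "healthy", 2.0, "reboot"),   # stays healthy
]
rates = repair_effectiveness(history, window_end=100.0)
costs = cost_per_effective_repair(rates, {"reboot": 1.0, "reimage": 10.0})
```

Under these assumptions a policy adjustment could, for example, prefer the action with the lowest cost per effective repair for a given failure state.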
20 Claims
1. A method comprising the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A method comprising:
receiving a sequence of states for each of a plurality of devices by a repair service, wherein the sequence of states for a device comprises an indicator of each state the device entered into and an associated time, an indicator of each repair action performed on the device, and the events and signals generated by one or more sensors or devices that led to the repair action;
determining the effectiveness of the repair actions based on the sequence of states and an amount of time each state that had a repair action performed remained in a healthy state by the repair service;
receiving one or more policies by the repair service, wherein the policies indicate repair actions to perform on the devices based on the states of the devices and the events and signals generated by the one or more sensors; and
determining the effectiveness of the one or more policies by the repair service based on the determined effectiveness of the repair actions and the events and signals generated by one or more sensors that led to the repair actions from the sequence of states using a formal language to represent a sequence of the signals and events.
Dependent claims: 11, 12, 13.
14. A system comprising:
a plurality of devices;
a plurality of sensors monitoring the plurality of devices and generating a plurality of signals and events for each device based on the monitoring; and
a repair service adapted to:
determine a state of each of the devices based on the plurality of signals and events;
perform one or more repair actions on one or more of the devices according to the determined states;
record, for each device, the state of the device, a time the device entered the state, any repair actions performed on the device by the repair service, and the generated plurality of signals and events for the device;
determine the effectiveness of one or more repair actions using the recorded states, the performed repair actions, and the time the devices entered each state; and
determine the effectiveness of the one or more sensors using the determined effectiveness of the one or more repair actions and the generated plurality of signals and events of the plurality of sensors using a formal language to represent a sequence of the plurality of signals and events.
Dependent claims: 15, 16, 17, 18, 19, 20.
Specification