Behavioral modeling of a data center utilizing human knowledge to enhance a machine learning algorithm
First Claim
1. A method of a server, comprising:
grouping metrics of a data center collected through one or more sensors by a plurality of nodes in the data center;
generating a behavioral model of the data center when a machine learning algorithm is applied using a processor and a memory of the server, wherein the behavioral model is structured based on an analysis of a team of human modelers that partition the data center into the plurality of nodes as a plurality of connected nodes, each node in the plurality of connected nodes representing an active electronic device attached to a computer network to which the server integrates by way of a machine learning environment, and the active electronic device being capable of sending, receiving, and forwarding information over a communication channel of the computer network, wherein the each node is further decomposed by the team of human modelers into a connected set comprising at least one of a child node and a simple component, wherein the child node is a node that is a subset of another node, and wherein the simple component is a node that has not been further decomposed;
detecting an anomaly in a system behavior using the behavioral model of the data center by recursively applying, through the processor and the memory, the behavioral model to the each node and the simple component by:
generating a compressed metric vector for the each node by reducing a dimension of an input metric vector using at least one of a principal component analysis and a neural network, wherein the input metric vector comprises at least one of a metric for the each node and the compressed metric vector from the child node, and the input metric vector represents a multidimensional space in which a software component comprising a representation of the each node is defined with distinct coordinates; and
determining whether anomalous behavior is occurring in the each node by comparing the compressed metric vector with a compressed model vector, wherein the compressed model vector of the each node is the compressed metric vector generated using at least one of the metric associated with the each node operating non-anomalously, the metric being a property of a route in the computer network capable of being any value used by a routing protocol to determine whether one particular route is preferable to another route;
determining a root cause of a failure caused by the detected anomaly, the root cause of the failure being an initiating cause of a causal chain leading to the detected anomaly;
proactively updating the behavioral model of the data center using the machine learning algorithm and an automatic recommendation of an action by an operator to resolve a problem caused by the failure; and
automatically updating a system model of the data center based on combining behavioral models for the plurality of connected nodes, and detection of a dynamic change from at least one of a creation, a destruction, and a modification of at least one of an interconnection and a flow in the data center based on a reapplication of a human knowledge to further enhance the machine learning algorithm, the interconnection referring to at least one of a modification, an adjustment and an alteration in a connection of the each node to attain a target result, and the flow referring to a pattern of processing an input to the system model to achieve the target result based on the behavioral model of the machine learning environment.
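Claim 1 recites compressing each node's input metric vector (by principal component analysis or a neural network) and comparing the result with a compressed model vector derived from the node's non-anomalous operation. The following is a minimal sketch of that comparison for a single node, assuming PCA computed by power iteration so that no third-party library is needed. The class `NodeModel`, the use of the mean compressed baseline vector as the compressed model vector, and the `margin` threshold rule are illustrative assumptions, not the patented implementation.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length metric vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix around mean mu (needs at least two rows)."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def top_components(cov, k, iters=500):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # deterministic starting vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


class NodeModel:
    """Hypothetical per-node model: PCA fitted on non-anomalous samples."""

    def __init__(self, baseline, k=1, margin=3.0):
        self.mu = mean_vector(baseline)
        self.comps = top_components(covariance(baseline, self.mu), k)
        compressed = [self.compress(x) for x in baseline]
        # Compressed model vector: mean of the compressed baseline vectors.
        self.model_vector = mean_vector(compressed)
        # Assumed threshold: a multiple of the largest healthy distance.
        self.threshold = margin * max(self.distance(c) for c in compressed)

    def compress(self, x):
        """Project the centered input metric vector onto the top-k components."""
        c = [xi - mi for xi, mi in zip(x, self.mu)]
        return [sum(ci * vi for ci, vi in zip(c, v)) for v in self.comps]

    def distance(self, z):
        """Euclidean distance from the compressed model vector."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, self.model_vector)))

    def is_anomalous(self, x):
        return self.distance(self.compress(x)) > self.threshold
```

Because only the retained components are compared, a deviation orthogonal to them does not appear in the compressed vector; a fuller treatment might also check the reconstruction error.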
Abstract
A method generates a behavioral model of a data center when a machine learning algorithm is applied. The behavioral model is structured based on an analysis by a team of human modelers that partition the data center into a plurality of connected nodes. The behavioral model of the data center detects an anomaly in a system behavior by recursively applying the behavioral model to each node and simple component. A compressed metric vector for each node is generated by reducing a dimension of an input metric vector. A root cause of a failure caused by the anomaly is determined, and an action is automatically recommended to an operator to resolve a problem caused by the failure. Proactive actions are taken to keep the data center in a normal state based on the behavioral model using the machine learning algorithm.
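The claims define the root cause as the initiating cause of a causal chain leading to the detected anomaly, but they do not say how the chain is traced. One possible approach, assuming the modelers supply a causal graph (a hypothetical `causes` mapping from each node to the nodes that can cause it), is to walk upstream from the node where the anomaly was detected through other anomalous nodes and report those with no anomalous upstream cause:

```python
from collections import deque


def find_root_causes(causes, anomalous, detected):
    """Trace a causal chain upstream from the detected anomaly.

    causes:    hypothetical dict mapping each node to its possible upstream causes.
    anomalous: set of nodes currently flagged as anomalous.
    detected:  node where the anomaly was detected.
    Returns anomalous nodes on the chain with no anomalous upstream cause,
    i.e. candidate initiating causes.
    """
    roots = set()
    seen = {detected}
    queue = deque([detected])
    while queue:
        node = queue.popleft()
        # Only follow causes that are themselves behaving anomalously.
        upstream = [u for u in causes.get(node, ()) if u in anomalous]
        if not upstream:
            roots.add(node)
        for u in upstream:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return roots
```

If every anomalous node on the chain has an anomalous cause (a cycle), this sketch returns an empty set, so a real system would need an additional tie-breaking rule.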
7 Claims
5. A method comprising:
grouping metrics of a data center collected through one or more sensors by subsystems of the data center;
generating a behavioral model of the data center when a machine learning algorithm is applied using a processor and a memory, wherein the behavioral model is trained based on a human knowledge deconstruction of the data center into a set of connected simplified components, wherein the behavioral model is generated based on an analysis of a team of human modelers that decomposes a system of the data center into the subsystems as a connected system of smaller constituent subsystems, a smaller constituent subsystem representing an active electronic device attached to a computer network to which the processor and the memory integrate by way of a machine learning environment, and the active electronic device being capable of sending, receiving, and forwarding information over a communication channel of the computer network, wherein the smaller constituent subsystems are further decomposed by the team of human modelers into the set of connected simplified components, wherein the team of human modelers identifies at least one characteristic comprising a label, a type, a category, a connection, and a metric of each of the smaller constituent subsystems, the metric being a property of a route in the computer network capable of being any value used by a routing protocol to determine whether one particular route is preferable to another route, wherein the team of human modelers groups the each of the smaller constituent subsystems having similar characteristics to enable the machine learning algorithm to learn a system behavior, and the system behavior being a set of parameters monitored based on the behavioral model, and wherein the machine learning algorithm continually improves the behavioral model based on a human knowledge applied in real time as an input by the team of human modelers;
detecting an anomaly in the system behavior based on the behavioral model of the data center;
determining a root cause of a failure caused by the detected anomaly, the root cause of the failure being an initiating cause of a causal chain leading to the detected anomaly;
compressing the metric of the each of the smaller constituent subsystems in a recursive fashion to ultimately build a system model of the data center at a point in time;
proactively updating the behavioral model of the data center using the machine learning algorithm and an automatic recommendation of an action by an operator to resolve a problem caused by the failure; and
automatically updating the system model of the data center based on detection of a dynamic change from at least one of a creation, a destruction, and a modification of at least one of an interconnection and a flow in the data center based on a reapplication of the human knowledge to further enhance the machine learning algorithm, the interconnection referring to at least one of a modification, an adjustment and an alteration in a connection of the smaller constituent subsystem to attain a target result, and the flow referring to a pattern of processing an input to the system model to achieve the target result based on the behavioral model of the machine learning environment.
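Claim 5 recites compressing each subsystem's metrics recursively to build a system model of the data center at a point in time. A minimal sketch of that bottom-up pass follows, in which each node's input vector is its own metrics followed by its children's compressed vectors, mirroring the input metric vector of claim 1. The `Subsystem` class and the two-number `compress` function are hypothetical placeholders; claim 1 names principal component analysis or a neural network for the compression step.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical node in the modelers' decomposition of the data center."""
    name: str
    metrics: list  # raw metric readings for this subsystem at one point in time
    children: list = field(default_factory=list)


def compress(vector):
    """Placeholder compressor reducing any vector to [mean, max].

    Stands in for PCA or a neural network only to keep the sketch short.
    """
    if not vector:
        raise ValueError("empty input vector")
    return [sum(vector) / len(vector), max(vector)]


def build_system_model(node, model=None):
    """Compress bottom-up and return {subsystem name: compressed vector}."""
    if model is None:
        model = {}
    inputs = list(node.metrics)
    for child in node.children:
        # A child's compressed vector becomes part of its parent's input.
        inputs.extend(build_system_model(child, model)[child.name])
    model[node.name] = compress(inputs)
    return model
```

The dictionary returned for the top-level node is one snapshot of the system model; calling the function again on fresh readings yields the model at a later point in time.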
7. A system of a machine learning environment comprising:
a computer server of the machine learning environment, the computer server including one or more computers having instructions stored thereon that when executed cause the one or more computers to:
group metrics of a data center collected through one or more sensors by subsystems of the data center;
generate a behavioral model of the data center when a machine learning algorithm is applied using a processor and a memory, wherein the behavioral model is generated based on analysis of a team of human modelers that decomposes a system of the data center into a connected system of smaller constituent subsystems, a smaller constituent subsystem representing an active electronic device attached to a computer network to which the processor and the memory integrate by way of the machine learning environment, and the active electronic device being capable of sending, receiving, and forwarding information over a communication channel of the computer network, wherein the smaller constituent subsystems are further decomposed by the team of human modelers into a set of connected simplified components, wherein the behavioral model is trained based on a human knowledge deconstruction of the data center into the set of connected simplified components, wherein the team of human modelers identifies at least one characteristic comprising a label, a type, a category, a connection, and a metric of each of the smaller constituent subsystems, the metric being a property of a route in the computer network capable of being any value used by a routing protocol to determine whether one particular route is preferable to another route, wherein the team of human modelers groups the each of the smaller constituent subsystems having similar characteristics to enable the machine learning algorithm to learn a system behavior, and the system behavior being a set of parameters monitored based on the behavioral model, and wherein the machine learning algorithm continually improves the behavioral model based on a human knowledge applied in real time as an input by the team of human modelers;
detect an anomaly in the system behavior based on the behavioral model of the data center;
determine a root cause of a failure caused by the detected anomaly, the root cause of the failure being an initiating cause of a causal chain leading to the detected anomaly;
proactively update the behavioral model of the data center using the machine learning algorithm and an automatic recommendation of an action by an operator to resolve a problem caused by the failure;
compress the metric of the each of the smaller constituent subsystems in a recursive fashion to ultimately build a system model of the data center at a point in time; and
automatically update the system model of the data center based on detection of a dynamic change from at least one of a creation, a destruction, and a modification of at least one of an interconnection and a flow in the data center based on a reapplication of the human knowledge to further enhance the machine learning algorithm, the interconnection referring to at least one of a modification, an adjustment and an alteration in a connection of the smaller constituent subsystem to attain a target result, and the flow referring to a pattern of processing an input to the system model to achieve the target result based on the behavioral model of the machine learning environment.
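Claims 5 and 7 update the system model when an interconnection is created, destroyed, or modified. Assuming interconnections are captured as snapshots that map a (source, destination) pair to a dict of link attributes, a hypothetical representation not specified in the claims, detecting such a change reduces to a set difference plus an attribute comparison:

```python
def diff_interconnections(before, after):
    """Compare two snapshots of the data center's interconnections.

    Each snapshot is a hypothetical dict mapping an (upstream, downstream)
    pair to link attributes, e.g. {"gbps": 10}.
    Returns the created, destroyed, and modified connections.
    """
    created = sorted(set(after) - set(before))
    destroyed = sorted(set(before) - set(after))
    # A connection present in both snapshots counts as modified if any attribute changed.
    modified = sorted(k for k in set(before) & set(after) if before[k] != after[k])
    return {"created": created, "destroyed": destroyed, "modified": modified}


def needs_rebuild(changes):
    """True if any interconnection was created, destroyed, or modified."""
    return any(changes.values())
```

A non-empty result would trigger an update of the affected part of the system model. Changes in flows would need a comparable snapshot of processing patterns, which this sketch does not model.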
Specification