Cluster performance monitoring

US 9,043,332 B2
Filed: 10/25/2012
Issued: 05/26/2015
Est. Priority Date: 09/07/2012
Status: Active Grant

Abstract

Embodiments are directed towards the visualization of machine data received from computing clusters. Embodiments may enable improved analysis of computing cluster performance, error detection, troubleshooting, error prediction, or the like. Individual cluster nodes may generate machine data that includes information and data regarding the operation and status of the cluster node. The machine data is received from each cluster node for indexing by one or more indexing applications. The indexed machine data including the complete data set may be stored in one or more index stores. A visualization application enables a user to select one or more analysis lenses that may be used to generate visualizations of the machine data. The visualization application employs the analysis lens to produce visualizations of the computing cluster machine data.

24 Claims

1. A computer-implemented method comprising:
- receiving machine data from a computing cluster, the computing cluster including a plurality of computational cluster nodes coordinating in operation;
  
  generating time stamped events from the received machine data, each time stamped event having a time stamp derived from time stamp data parsed from the received machine data;
  
  analyzing, for each computational cluster node of the plurality of computational cluster nodes, a metric characterizing an aspect of computational performance of the computational cluster node, wherein the metric is analyzed based on values included in a set of time stamped events;
  
  computing an event pattern using analyzed metrics;
  
  monitoring whether the event pattern is indicative of a previously determined or known problem for operation of the computing cluster using a heuristic analysis; and
  
  generating a notification when the event pattern is indicative of a previously determined or known problem for operation of the computing cluster.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein monitoring whether the event pattern is indicative of a previously determined or known problem comprises comparing the event pattern with a predefined alert pattern.
  - 3. The method of claim 1, wherein the predefined alert pattern comprises a pattern identified by a user.
  - 4. The method of claim 1, further comprising:
    - indexing the received machine data;
      
      storing the indexed data; and
      
      retrieving select indexed data to determine the metric.
  - 5. The method of claim 1, wherein the event pattern comprises a pattern in a heat map, wherein the heat map represents, for each computational cluster node of the plurality of computational cluster nodes, the metric of the node.
  - 6. The method of claim 5, wherein determining that the event pattern is indicative of a previously determined or known problem comprises determining whether the heat map is expanding or contracting.
  - 7. The method of claim 1, further comprising, generating a visualization of cluster data when the event pattern is indicative of a previously determined or known problem for operation of the computing cluster, wherein the visualization includes a representation of each computational cluster node of the plurality of computational cluster nodes and the metric for each node.
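
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The log line layout, the `cpu` field, the 90 percent threshold, the "at least half the nodes" heuristic, and every function name below are assumptions made only for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical machine data; the "<ISO time> <node> cpu=<pct>" layout is an assumption.
RAW = [
    "2012-09-07T10:00:00 node-a cpu=35",
    "2012-09-07T10:00:00 node-b cpu=97",
    "2012-09-07T10:01:00 node-a cpu=40",
    "2012-09-07T10:01:00 node-b cpu=99",
    "2012-09-07T10:01:00 node-c cpu=96",
]

def to_event(line):
    """Build a time stamped event; the time stamp is parsed from the data itself."""
    stamp, node, field = line.split()
    key, value = field.split("=")
    return {"time": datetime.fromisoformat(stamp), "node": node, key: float(value)}

def node_metrics(events, key="cpu"):
    """Analyze one metric per node: here, the mean of the values in its events."""
    values = defaultdict(list)
    for event in events:
        values[event["node"]].append(event[key])
    return {node: mean(vals) for node, vals in values.items()}

def event_pattern(metrics, threshold=90.0):
    """Compute an event pattern: the set of nodes whose metric exceeds the threshold."""
    return frozenset(node for node, value in metrics.items() if value > threshold)

def matches_known_problem(pattern, total_nodes, min_fraction=0.5):
    """Assumed heuristic: flag when at least min_fraction of nodes are in the pattern."""
    return total_nodes > 0 and len(pattern) / total_nodes >= min_fraction

def monitor(lines):
    """Run the whole pipeline and return a notification string, or None."""
    events = [to_event(line) for line in lines]
    metrics = node_metrics(events)
    pattern = event_pattern(metrics)
    if matches_known_problem(pattern, len(metrics)):
        return f"ALERT: overloaded nodes {sorted(pattern)}"
    return None

print(monitor(RAW))
```

With the sample data, two of the three nodes average above the threshold, so `monitor` returns a notification. A real deployment would presumably swap the threshold rule for whatever heuristic analysis fits the known problem.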

8. A network device comprising:
- a device, implemented at least partially in hardware, that receives machine data from a computing cluster, the computing cluster including a plurality of computational cluster nodes coordinating in operation;
  
  a device, implemented at least partially in hardware, that generates time stamped events from the received machine data, each time stamped event having a time stamp derived from time stamp data parsed from the received machine data;
  
  a device, implemented at least partially in hardware, that analyzes, for each computational cluster node of the plurality of computational cluster nodes, a metric characterizing an aspect of computational performance of the computational cluster node, wherein the metric is analyzed based on values included in a set of time stamped events;
  
  a device, implemented at least partially in hardware, that computes an event pattern using analyzed metrics;
  
  a device, implemented at least partially in hardware, that monitors whether the event pattern is indicative of a previously determined or known problem for operation of the computing cluster using a heuristic analysis; and
  
  a device, implemented at least partially in hardware, that generates a notification when the event pattern is indicative of the previously determined or known problem for operation of the computing cluster.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The network device of claim 8, wherein the device that monitors whether the event pattern is indicative of a previously determined or known problem further comprises a device, implemented at least partially in hardware, that compares the event pattern with a predefined alert pattern.
  - 10. The network device of claim 8, wherein the predefined alert pattern comprises a pattern identified by a user.
  - 11. The network device of claim 8, further comprising:
    - a device, implemented at least partially in hardware, that indexes the received machine data;
      
      a device, implemented at least partially in hardware, that stores the indexed data; and
      
      a device, implemented at least partially in hardware, that retrieves select indexed data to determine the metric.
  - 12. The network device of claim 8, wherein the event pattern comprises a pattern in a heat map, wherein the heat map represents, for each computational cluster node of the plurality of computational cluster nodes, the metric of the node.
  - 13. The network device of claim 12, wherein the device that determines that the event pattern is indicative of a previously determined or known problem further comprises a device, implemented at least partially in hardware, that determines whether the heat map is expanding or contracting.
  - 14. The network device of claim 8, further comprising:
    - a device, implemented at least partially in hardware, that upon determining that the event pattern is indicative of the previously determined or known problem for operation of the computing cluster, generates a visualization of cluster data, wherein the visualization includes a representation of each computational cluster node of the plurality of computational cluster nodes and the metric for each node.
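
Claims 12 and 13 refer to a heat map of per-node metrics and to determining whether that heat map is expanding or contracting. The claims do not define those terms, so the sketch below adopts one possible reading as an assumption: the heat map is a mapping from node name to a normalized metric, and it "expands" when the number of hot cells grows between two snapshots. The names `hot_cells` and `trend` and the 0.8 threshold are hypothetical.

```python
def hot_cells(heat_map, threshold=0.8):
    """Return the nodes whose cell is 'hot' (metric at or above the threshold)."""
    return {node for node, value in heat_map.items() if value >= threshold}

def trend(previous, current, threshold=0.8):
    """Classify the hot region between two snapshots."""
    before = len(hot_cells(previous, threshold))
    after = len(hot_cells(current, threshold))
    if after > before:
        return "expanding"
    if after < before:
        return "contracting"
    return "stable"

# Two hypothetical snapshots of the same cluster, taken at different times.
earlier = {"node-a": 0.20, "node-b": 0.90, "node-c": 0.30}
later = {"node-a": 0.85, "node-b": 0.95, "node-c": 0.30}

print(trend(earlier, later))
```

Counting hot cells ignores where they sit in the map. A layout-aware variant could instead track the area of connected hot regions.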

15. A non-transitive storage medium that includes a plurality of instructions, wherein execution of at least a portion of the instructions by a processor device enables a plurality of actions, the actions comprising:
- receiving machine data from a computing cluster, the computing cluster including a plurality of computational cluster nodes coordinating in operation;
  
  generating time stamped events from the received machine data, each time stamped event having a time stamp derived from time stamp data parsed from the received machine data;
  
  analyzing, for each computational cluster node of the plurality of computational cluster nodes, a metric characterizing an aspect of computational performance of the computational cluster node, wherein the metric is analyzed based on values included in a set of time stamped events;
  
  computing an event pattern using analyzed metrics;
  
  monitoring whether the event pattern is indicative of a previously determined or known problem for operation of the computing cluster using a heuristic analysis; and
  
  generating a notification when the event pattern is indicative of a previously determined or known problem for operation of the computing cluster.
- Dependent claims (16, 17, 18, 19, 20, 21)
  - 16. The medium of claim 15, wherein monitoring whether the event pattern is indicative of a previously determined or known problem comprises comparing the event pattern with a predefined alert pattern.
  - 17. The medium of claim 15, wherein the predefined alert pattern comprises a pattern identified by a user.
  - 18. The medium of claim 15, wherein the actions further comprise:
    - indexing the received machine data;
      
      storing the indexed data; and
      
      retrieving select indexed data to determine the metric.
  - 19. The medium of claim 15, wherein the event pattern comprises a pattern in a heat map, wherein the heat map represents, for each computational cluster node of the plurality of computational cluster nodes, the metric of the node.
  - 20. The medium of claim 19, wherein determining that the event pattern is indicative of a previously determined or known problem comprises determining whether the heat map is expanding or contracting.
  - 21. The medium of claim 15, wherein the actions further comprise, generating a visualization of cluster data when the event pattern is indicative of a previously determined or known problem for operation of the computing cluster, wherein the visualization includes a representation of each computational cluster node of the plurality of computational cluster nodes and the metric for each node.
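
Claim 18 (like claims 4 and 11) adds indexing the received machine data, storing the indexed data, and retrieving select indexed data to determine the metric. The following is a toy in-memory sketch of that sequence, assuming events are dictionaries with `time`, `node`, and `cpu` keys. `EventIndex` and `metric_for` are hypothetical names and do not describe any actual index store.

```python
from collections import defaultdict
from datetime import datetime

class EventIndex:
    """Toy in-memory index keyed by node name; a stand-in for a real index store."""

    def __init__(self):
        self._by_node = defaultdict(list)

    def index(self, event):
        """Index and store one event under its node."""
        self._by_node[event["node"]].append(event)

    def retrieve(self, node, start, end):
        """Retrieve the node's events with start <= time < end."""
        return [e for e in self._by_node.get(node, []) if start <= e["time"] < end]

def metric_for(index, node, start, end, key="cpu"):
    """Determine a metric (here, the maximum value) from the retrieved events only."""
    values = [e[key] for e in index.retrieve(node, start, end)]
    return max(values) if values else None

store = EventIndex()
store.index({"time": datetime(2012, 9, 7, 10, 0), "node": "node-a", "cpu": 35.0})
store.index({"time": datetime(2012, 9, 7, 10, 5), "node": "node-a", "cpu": 72.0})
store.index({"time": datetime(2012, 9, 7, 11, 0), "node": "node-a", "cpu": 99.0})

window = (datetime(2012, 9, 7, 10, 0), datetime(2012, 9, 7, 11, 0))
print(metric_for(store, "node-a", *window))
```

Only the two events inside the one-hour window are retrieved, so the 11:00 reading does not affect the result.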

22. A system comprising:
- a plurality of nodes; and
  
  a network device, including;
  
  a memory device for storing instructions; and
  
  a processor device that executes at least a portion of the stored instructions to enable a plurality of actions, the actions including;
  
  receiving machine data from a computing cluster, the computing cluster including a plurality of computational cluster nodes coordinating in operation;
  
  generating time stamped events from the received machine data, each time stamped event having a time stamp derived from time stamp data parsed from the received machine data;
  
  analyzing, for each computational cluster node of the plurality of computational cluster nodes, a metric characterizing an aspect of computational performance of the computational cluster node, wherein the metric is analyzed based on values included in a set of time stamped events;
  
  computing an event pattern using analyzed metrics;
  
  monitoring whether the event pattern is indicative of a previously determined or known problem for operation of the computing cluster using a heuristic analysis; and
  
  generating a notification when the event pattern is indicative of the previously determined or known problem for operation of the computing cluster.
- Dependent claims (23, 24)
  - 23. The system of claim 22, wherein determining that the event pattern is indicative of a previously determined or known problem comprises comparing the event pattern with a predefined alert pattern.
  - 24. The system of claim 22, wherein the event pattern comprises a pattern in a heat map, wherein the heat map represents, for each computational cluster node of the plurality of computational cluster nodes, the metric of the node.
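
Claim 23 (like claims 2, 9, and 16) describes comparing the event pattern with a predefined alert pattern. One way to sketch such a comparison, assuming both patterns are sets of affected node names, is a Jaccard similarity test. The similarity measure, the 0.6 cutoff, the pattern names, and the function names are all assumptions and are not taken from the patent.

```python
def jaccard(a, b):
    """Set similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical alert patterns, for example ones identified by a user.
ALERT_PATTERNS = {
    "rack-1 overload": {"node-a", "node-b"},
    "storage tier stall": {"node-x", "node-y", "node-z"},
}

def match_alert(observed, patterns, min_similarity=0.6):
    """Return the name of the best matching alert pattern, or None if none is close."""
    best_name, best_score = None, 0.0
    for name, nodes in patterns.items():
        score = jaccard(observed, nodes)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None

print(match_alert({"node-a", "node-b", "node-c"}, ALERT_PATTERNS))
```

An observed pattern that overlaps an alert pattern only slightly, such as `{"node-x"}` against the three-node storage pattern, falls below the cutoff and produces no match.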

Current Assignee: Splunk Inc. (Cisco Systems, Inc.)
Original Assignee: Splunk Inc. (Cisco Systems, Inc.)
Inventors: Noel, Cary Glen; Raitz, Alex; Tsai, Pierre; Pakkinsamy, Kirubakaran
Primary Examiner(s): BROWN, SHEREE N
Application Number: US13/660,910
Publication Number: US 20140074850A1
Time in Patent Office: 943 Days
Field of Search: 707/741
US Class Current: 707/741
CPC Class Codes:
G06F 11/0709   in a distributed system con...
G06F 11/0712   in a virtual computing plat...
G06F 11/0721   within a central processing...
G06F 11/0751   Error or fault detection no...
G06F 11/0769   Readable error formats, e.g...
G06F 11/0787   Storage of error reports, e...
G06F 11/079   Root cause analysis, i.e. e...
G06F 9/542   Event management; Broadcast...
G06N 5/022   Knowledge engineering; Know...
G06N 5/048   Fuzzy inferencing
