Real-time categorization of log events

  • US 9,678,822 B2
  • Filed: 03/17/2015
  • Issued: 06/13/2017
  • Est. Priority Date: 01/02/2015
  • Status: Active Grant
First Claim

1. A method for categorizing a real-time log event, the method comprising:

    computing a Term Frequency-Inverse Document Frequency (TF-IDF) matrix of a log corpus based on a number of pre-existing log events in the log corpus and a number of words in the log corpus;

    computing a Term Frequency-Inverse Document Frequency (TF-IDF) vector for the real-time log event based on a pre-calculated TF-IDF matrix of the log corpus and a number of new words in the real-time log event, wherein the log corpus comprises one or more pre-existing log events, and wherein the real-time log event is indicative of an error message;

    generating a cluster model based on the TF-IDF matrix, wherein the cluster model is indicative of a number of clusters corresponding to the log corpus, and wherein a cluster is indicative of a log category;

    determining a centroid matrix of the log corpus based on the number of clusters in the cluster model and the number of words in the log corpus;

    calculating a cluster radius and a silhouette width of each cluster, wherein the cluster radius of a cluster is calculated based on a distance between a cluster centroid of the cluster and a farthest point in the cluster; and wherein the silhouette width of the cluster is indicative of compactness of the cluster;

    determining a silhouette threshold for each cluster based on the corresponding cluster radius and the corresponding silhouette width;

    calculating a distance between the TF-IDF vector and the cluster centroid of each cluster in the log corpus;

    identifying, from amongst the clusters, a cluster having a closest cluster centroid based on the distance between the TF-IDF vector and the cluster centroid of each of the clusters, wherein the closest cluster centroid is a cluster centroid closest to the TF-IDF vector; and

    categorizing the real-time log event into one or more log categories based on a comparison of the distance between the TF-IDF vector and the closest cluster centroid with a pre-determined silhouette threshold corresponding to the cluster with the closest cluster centroid.
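
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions the claim does not state: k-means with farthest-first initialization as the clustering method, Euclidean distance on L2-normalized vectors, one extra vector component that counts words unseen in the corpus, and a placeholder threshold rule `radius * (1 + max(silhouette, 0))`. All function names (`fit`, `categorize`, and the helpers) are hypothetical.

```python
import math
from collections import Counter

def tokenize(event):
    return event.lower().split()

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fit_idf(corpus):
    """Vocabulary, smoothed IDF weights, and the IDF assumed for unseen words."""
    n = len(corpus)
    df = Counter(w for ev in corpus for w in set(tokenize(ev)))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return sorted(df), idf, math.log(1 + n) + 1.0

def tfidf_vector(event, vocab, idf, new_idf):
    """L2-normalized TF-IDF vector; the last component weights new words (assumption)."""
    counts = Counter(tokenize(event))
    new_words = sum(c for w, c in counts.items() if w not in idf)
    vec = [counts[w] * idf[w] for w in vocab] + [new_words * new_idf]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def assign(points, centroids):
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids

def silhouette_thresholds(points, labels, centroids):
    """Per-cluster radius and mean silhouette width, combined by a placeholder rule."""
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(len(centroids))]
    thresholds = []
    for j, members in enumerate(groups):
        radius = max((dist(p, centroids[j]) for p in members), default=0.0)
        widths = []
        for p in members:
            if len(members) == 1:
                widths.append(0.0)  # silhouette of a singleton cluster is 0 by convention
                continue
            a = sum(dist(p, q) for q in members) / (len(members) - 1)
            b = min((sum(dist(p, q) for q in g) / len(g)
                     for i, g in enumerate(groups) if i != j and g), default=0.0)
            widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        width = sum(widths) / len(widths) if widths else 0.0
        # Hypothetical rule, not taken from the claim: compact, well-separated
        # clusters accept events slightly beyond their radius.
        thresholds.append(radius * (1.0 + max(width, 0.0)))
    return thresholds

def fit(corpus, k):
    vocab, idf, new_idf = fit_idf(corpus)
    matrix = [tfidf_vector(ev, vocab, idf, new_idf) for ev in corpus]  # TF-IDF matrix
    labels, centroids = kmeans(matrix, k)                              # centroid matrix
    return {"vocab": vocab, "idf": idf, "new_idf": new_idf, "labels": labels,
            "centroids": centroids,
            "thresholds": silhouette_thresholds(matrix, labels, centroids)}

def categorize(event, model):
    """Index of the closest cluster, or None if the event lies beyond its threshold."""
    vec = tfidf_vector(event, model["vocab"], model["idf"], model["new_idf"])
    d, j = min((dist(vec, c), j) for j, c in enumerate(model["centroids"]))
    return j if d <= model["thresholds"][j] else None

corpus = ["disk full on node1", "disk full on node2", "disk almost full on node3",
          "connection timeout to db", "connection timeout to cache",
          "connection refused to db"]
model = fit(corpus, k=2)
print(categorize("disk full on node1", model))           # index of the disk cluster
print(categorize("kernel panic segfault trace", model))  # None: outside every threshold
```

In this sketch a `None` result marks an event that fits no existing category. How such an event is then handled (for example, by opening a new category) is outside what the sketch covers.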
