
Class discovery for automated discovery, attribution, analysis, and risk assessment of security threats

  • US 8,418,249 B1
  • Filed: 11/10/2011
  • Issued: 04/09/2013
  • Est. Priority Date: 11/10/2011
  • Status: Active Grant
First Claim

1. A method for profiling network traffic of a network, comprising:

  • obtaining a training dataset having n entries each comprising a plurality of feature values and a ground truth class label, wherein the plurality of feature values correspond to a plurality of features of a historical flow in the network traffic, wherein the historical flow is tagged with the ground truth class label based on data characteristics associated with a corresponding application executing in the network;

    constructing a ground truth kernel in an n×n matrix format by self multiplication of a ground truth class label vector, wherein the ground truth class label vector comprises n ground truth class labels each from one of the n entries in the training dataset;

    generating n initial boosting weights each corresponding to one of the n entries in the training dataset, wherein each of the n initial boosting weights represents estimated importance of a corresponding one of the n entries;

    generating, by a processor of a computer system, a first decision tree from the training dataset based on a decision tree learning algorithm using the n initial boosting weights, wherein the first decision tree maps each entry of the training dataset to a corresponding one in n first predicted class labels based on the plurality of feature values in the each entry, wherein a first predicted class label vector comprises the n first predicted class labels mapped by the first decision tree to the n entries in the training dataset;

    adjusting the n initial boosting weights to generate n adjusted boosting weights by comparing corresponding matrix elements between the ground truth kernel and a first kernel constructed by self multiplication of the first predicted class label vector, wherein a first matrix element mismatch increases the importance of the corresponding one of the n entries where the first matrix element mismatch occurs;

    generating, by the processor, a second decision tree from the training dataset based on the decision tree learning algorithm using the n adjusted boosting weights, wherein the second decision tree maps the each entry of the training dataset to a second predicted class label based on the plurality of feature values in the each entry, wherein a second predicted class label vector comprises n second predicted class labels mapped by the second decision tree to the n entries in the training dataset;

    generating, by the processor, a behavioral model based at least on the first predicted class label vector and the second predicted class label vector; and

    determining a class label for a new flow in the network traffic based on whether the new flow matches the behavioral model.
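The claim leaves several things open: the label encoding, the decision tree learner, the exact weight-update rule, and how the behavioral model combines the trees. The Python sketch below fills each of those gaps with an assumption of its own, so it should be read as one possible instance of the claimed steps and not as the patented implementation. It assumes binary labels encoded as +1/-1, which turns the "self multiplication" of a label vector into the outer product y·yᵀ. A one-split decision stump stands in for a full decision tree learner. The weight update is a hypothetical exponential rule driven by how many kernel elements in each entry's row disagree with the ground truth kernel. The behavioral model is a vote weighted by training accuracy. All function names (`label_kernel`, `fit_stump`, `adjust_weights`, `train`, `classify`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, polarity); a hypothetical stand-in for a decision tree
Stump = Tuple[int, float, int]


def label_kernel(labels: Sequence[int]) -> List[List[int]]:
    """n x n kernel: outer product of a +1/-1 label vector with itself."""
    return [[a * b for b in labels] for a in labels]


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    feature, threshold, polarity = stump
    return polarity if row[feature] <= threshold else -polarity


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Stump:
    """Weighted one-split tree: minimize the total weight of misclassified entries."""
    best_err, best = math.inf, (0, 0.0, 1)
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if predict_stump(stump, row) != yi)
                if err < best_err:  # strict: the first best split wins ties
                    best_err, best = err, stump
    return best


def adjust_weights(w: Sequence[float], k_true: List[List[int]],
                   k_pred: List[List[int]], rate: float = 1.0) -> List[float]:
    """Assumed rule: raise entry i's weight by the share of kernel row i that
    disagrees with the ground truth kernel, then renormalize to sum to 1."""
    n = len(w)
    raised = []
    for i in range(n):
        mismatches = sum(k_true[i][j] != k_pred[i][j] for j in range(n))
        raised.append(w[i] * math.exp(rate * mismatches / n))
    total = sum(raised)
    return [x / total for x in raised]


def train(X: Sequence[Sequence[float]], y: Sequence[int],
          rounds: int = 2, rate: float = 1.0) -> List[Tuple[float, Stump]]:
    """Boosting loop: each round fits a stump on the current weights."""
    n = len(y)
    k_true = label_kernel(y)            # ground truth kernel
    w = [1.0 / n] * n                   # initial boosting weights (uniform)
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        preds = [predict_stump(stump, row) for row in X]
        # Vote weight: training accuracy margin in [-1, 1] (an assumption).
        alpha = sum(1 if p == t else -1 for p, t in zip(preds, y)) / n
        model.append((alpha, stump))
        w = adjust_weights(w, k_true, label_kernel(preds), rate)
    return model


def classify(model: List[Tuple[float, Stump]], row: Sequence[float]) -> int:
    """Label a new flow by the weighted vote of all stumps in the model."""
    score = sum(alpha * predict_stump(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

For example, with flows `X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.0]]` labeled `y = [-1, -1, 1, 1]`, `train(X, y)` yields two identical stumps that split on the first feature at 2.0, and `classify(model, [3.5, 0.5])` returns `1`. Note that y·yᵀ is unchanged when every label is flipped, so comparing kernels alone cannot distinguish a stump from its mirror image. That is why this sketch weights each stump's vote by plain training accuracy rather than by kernel agreement.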

  • 2 Assignments