Machine learning based botnet detection using real-time connectivity graph based traffic features

  • US 8,762,298 B1
  • Filed: 01/05/2011
  • Issued: 06/24/2014
  • Est. Priority Date: 01/05/2011
  • Status: Active Grant
First Claim

1. A method for identifying a botnet in a network, comprising:

    obtaining historical network data in the network, the historical network data comprising a first plurality of data units;

    analyzing, by a central processing unit (CPU) of a computer and using a pre-determined heuristic, the historical network data to determine a plurality of values of a connectivity graph based feature for the first plurality of data units, wherein a first value of the connectivity graph based feature for a first data unit of the first plurality of data units is determined based on and representing connectivity characteristics of at least a portion of the historical network data associated with the first data unit;

    obtaining a ground truth data set associated with the historical network data, the ground truth data set comprising a plurality of labels with each label assigned to a corresponding data unit of the first plurality of data units, said each label comprising one of a first label categorizing said corresponding data unit as associated with the botnet and a second label categorizing said corresponding data unit as not associated with the botnet;

    analyzing, by the CPU and using a machine learning algorithm, the historical network data and the ground truth data set to generate a model comprising statistical predictions of the plurality of labels as a function of the plurality of values of the connectivity graph based feature with respect to the first plurality of data units;

    obtaining real-time network data in the network, the real-time network data comprising a second plurality of data units;

    analyzing, by the CPU and using the pre-determined heuristic, the real-time network data to determine a second value of the connectivity graph based feature for a second data unit of the second plurality of data units, wherein the second value is determined based on and representing connectivity characteristics of at least a portion of the real-time network data associated with the second data unit;

    assigning a third label to the second data unit by applying the model to the second value of the connectivity graph based feature; and

    categorizing the second data unit as associated with the botnet based on the third label, wherein the first plurality of data units comprise a plurality of IP (Internet Protocol) addresses, wherein analyzing the historical network data using the pre-determined heuristic comprises:

    constructing a graph comprising nodes representing the plurality of IP addresses and edges each representing communication within the historical network data between two of the plurality of IP addresses;

    analyzing the graph to determine at least a portion of the plurality of values of the connectivity graph based feature corresponding to the plurality of IP addresses;

    identifying botnet nodes among the nodes according to the ground truth data set; and

    determining an anti-trust rank that is calculated using a page rank algorithm and at least one edge weight assigned to at least one of the edges, wherein the at least one edge weight is determined based on whether the at least one of the edges is related to any of the botnet nodes, and wherein the connectivity graph based feature represents connectivity characteristics associated with the nodes and comprises the anti-trust rank.
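The graph steps at the end of the claim can be illustrated with a short example. These steps build a graph of IP addresses from observed communication and then compute an anti-trust rank with a page rank algorithm whose edge weights depend on known botnet nodes. This is a minimal sketch under stated assumptions, not the patented implementation, because the claim does not give the weighting rule, damping factor, or teleport distribution. The following are all hypothetical: undirected edges, `bot_weight=10.0` for any edge touching a ground-truth botnet node (1.0 otherwise), `damping=0.85`, and random jumps that land only on known botnet nodes (in the spirit of Anti-TrustRank, which propagates distrust from known-bad seeds). The names `build_graph` and `anti_trust_rank` are also illustrative.

```python
from collections import defaultdict


def build_graph(flows):
    """Undirected IP connectivity graph: an edge joins two IPs that exchanged
    at least one flow (self-flows are ignored)."""
    adj = defaultdict(set)
    for src, dst in flows:
        if src != dst:
            adj[src].add(dst)
            adj[dst].add(src)
    return dict(adj)


def anti_trust_rank(adj, botnet_nodes, bot_weight=10.0, damping=0.85, iterations=50):
    """Edge-weighted PageRank with teleport mass on known botnet nodes.

    Hypothetical rules: an edge touching a known botnet node has weight
    bot_weight, all other edges weight 1.0; random jumps go only to botnet
    seeds (uniformly to all nodes if no seed is present in the graph).
    """
    nodes = sorted(adj)
    if not nodes:
        return {}

    def weight(u, v):
        return bot_weight if (u in botnet_nodes or v in botnet_nodes) else 1.0

    # Total outgoing weight per node; every node in adj has at least one neighbor.
    out_total = {u: sum(weight(u, v) for v in adj[u]) for u in nodes}

    seeds = [n for n in nodes if n in botnet_nodes]
    if seeds:
        teleport = {n: (1.0 / len(seeds) if n in botnet_nodes else 0.0) for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            for v in adj[u]:
                # u passes rank to each neighbor in proportion to the edge weight
                new_rank[v] += damping * rank[u] * weight(u, v) / out_total[u]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy flows (illustrative only): a chain of five hosts, the first a known bot.
    flows = [("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3"),
             ("10.0.0.3", "10.0.0.4"), ("10.0.0.4", "10.0.0.5")]
    scores = anti_trust_rank(build_graph(flows), botnet_nodes={"10.0.0.1"})
    for ip, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(ip, round(score, 3))
```

On the toy chain, the score falls off with distance from the known bot. In this sketch, that decreasing score is the per-IP value of the connectivity graph based feature.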

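The training and classification steps of the claim can be sketched as follows. These steps learn a model that statistically predicts the botnet / not-botnet label from the connectivity graph based feature, then apply that model to a value computed from real-time data. The claim does not name the machine learning algorithm. This sketch swaps in a one-feature Gaussian class-conditional (naive Bayes) model purely for illustration. The names `train`, `p_botnet`, `assign_label`, `var_floor`, and the 0.5 decision threshold are hypothetical.

```python
import math
from statistics import fmean, pvariance

BOTNET, NOT_BOTNET = "botnet", "not_botnet"


def train(values, labels, var_floor=1e-6):
    """Fit P(label) and a Gaussian P(feature | label) for each of the two labels
    (a stand-in for the unspecified machine learning algorithm)."""
    model = {}
    for label in (BOTNET, NOT_BOTNET):
        xs = [x for x, y in zip(values, labels) if y == label]
        if not xs:
            raise ValueError(f"no training examples labelled {label!r}")
        prior = len(xs) / len(values)
        model[label] = (prior, fmean(xs), max(pvariance(xs), var_floor))
    return model


def _log_joint(params, x):
    """log P(label) + log N(x; mean, var)."""
    prior, mean, var = params
    return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def p_botnet(model, x):
    """Posterior probability that a data unit with feature value x belongs to the botnet."""
    log_odds = _log_joint(model[BOTNET], x) - _log_joint(model[NOT_BOTNET], x)
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def assign_label(model, x, threshold=0.5):
    """Assign a label to a (real-time) data unit from its feature value."""
    return BOTNET if p_botnet(model, x) >= threshold else NOT_BOTNET


if __name__ == "__main__":
    # Toy historical feature values and ground-truth labels (illustrative only).
    values = [0.30, 0.25, 0.40, 0.35, 0.02, 0.05, 0.01, 0.03]
    labels = [BOTNET] * 4 + [NOT_BOTNET] * 4
    model = train(values, labels)
    print(assign_label(model, 0.32))  # botnet
    print(assign_label(model, 0.02))  # not_botnet
```

Working in log space keeps the posterior finite even when a feature value lies far outside one class's range. Any classifier that outputs a label (or label probability) from the feature value could take the model's place in this sketch.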
  • 2 Assignments