Enriching netflow data with passive DNS data for botnet detection
Abstract
In one example, a system includes a processor, memory, and a botnet detection application stored in memory and executed by the processor and configured to: obtain (i) Netflow data indicating one or more IP addresses accessed by a computer and (ii) passive Domain Name System (DNS) data indicating respective one or more domains associated with each of the one or more IP addresses; generate features associated with the computer based on the Netflow data and passive DNS data; generate probability data based on the Netflow data and passive DNS data, wherein the probability data indicates a probability that the computer accessed the one or more domains; assign weights to the features based on the probability data to provide weighted features; and determine whether the computer is likely to be part of a botnet based on the weighted features.
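The enrichment step described in the abstract amounts to joining each Netflow record's destination IP against passive DNS history, so that every IP a computer accessed carries the set of domains observed resolving to it. The Python sketch below is a minimal illustration under stated assumptions: the record schemas (`FlowRecord`, `PassiveDnsRecord`) and the helpers `build_ip_to_domains` and `enrich_flows` are hypothetical. Real Netflow exports and passive DNS feeds carry many more fields, and the source does not specify how the join is performed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One Netflow record, reduced to a hypothetical minimal schema."""
    src_ip: str  # the monitored computer
    dst_ip: str  # the IP address it accessed


@dataclass(frozen=True)
class PassiveDnsRecord:
    """One passive DNS observation (hypothetical schema):
    `domain` was seen resolving to `ip` `count` times."""
    domain: str
    ip: str
    count: int


def build_ip_to_domains(pdns):
    """Index passive DNS by IP: ip -> {domain: total observation count}."""
    index = defaultdict(lambda: defaultdict(int))
    for rec in pdns:
        index[rec.ip][rec.domain] += rec.count  # aggregate repeated observations
    return {ip: dict(domains) for ip, domains in index.items()}


def enrich_flows(flows, pdns):
    """Attach candidate domains to every IP each computer accessed.

    Returns {computer_ip: {dst_ip: {domain: count}}}. An IP with no
    passive DNS history maps to an empty dict instead of being dropped.
    """
    ip_index = build_ip_to_domains(pdns)
    enriched = defaultdict(dict)
    for flow in flows:
        enriched[flow.src_ip][flow.dst_ip] = ip_index.get(flow.dst_ip, {})
    return dict(enriched)
```

Keeping IPs with no passive DNS history as empty entries lets later steps treat them as unattributed traffic rather than silently losing the flow.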
23 Claims
1. A system comprising:
a processor;
memory; and
a botnet detection application that is stored in the memory and executed by the processor and that is configured to:
  obtain Netflow data indicating one or more IP addresses accessed by a computer;
  obtain passive Domain Name System (DNS) data indicating respective one or more domains associated with each of the one or more IP addresses;
  generate features associated with the computer based on the Netflow data and passive DNS data;
  generate probability data based on the Netflow data and passive DNS data, wherein the probability data indicates a probability that the computer accessed the one or more domains, and wherein, in one or more instances, the probability is determined using a computed probability distribution over the one or more IP addresses and/or the one or more domains;
  assign weights to the features based on the probability data to provide weighted features; and
  determine whether the computer is likely to be part of a botnet based on the weighted features.
Dependent claims: 2-15.
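Claim 1 leaves open how the probability distribution is computed. One simple assumption is to estimate P(domain | IP) from the relative frequency with which each domain was seen resolving to that IP in passive DNS, and then let each domain's feature value contribute in proportion to that probability. The helper names `domain_distribution` and `weighted_feature` are hypothetical, and the frequency estimate is a stand-in chosen for illustration, not the method stated in the source.

```python
def domain_distribution(domain_counts):
    """Estimate P(domain | IP) from passive DNS observation counts.

    Assumed relative-frequency estimate; returns {} when there is no history.
    """
    total = sum(domain_counts.values())
    if total == 0:
        return {}
    return {domain: count / total for domain, count in domain_counts.items()}


def weighted_feature(ip_to_domain_counts, domain_feature):
    """Probability-weighted feature value for one computer.

    ip_to_domain_counts: {dst_ip: {domain: count}} for the IPs the computer accessed.
    domain_feature: callable mapping a domain to a float (e.g. 1.0 if the domain is young).
    Each domain's value is scaled by the probability that the computer actually
    contacted that domain when it contacted the IP, then summed over all IPs.
    """
    score = 0.0
    for domain_counts in ip_to_domain_counts.values():
        for domain, p in domain_distribution(domain_counts).items():
            score += p * domain_feature(domain)
    return score
```

For an IP that resolved to `cdn.example.com` 90 times and to a newly registered `x1q.example.net` 10 times, a young-domain indicator contributes 0.1 instead of 1.0. Under this assumption, a shared hosting or CDN address that occasionally serves a suspicious domain weighs far less than an address dedicated to it.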
16. A method comprising:
obtaining, from one or more routers, Netflow data indicating one or more IP addresses accessed by a computer;
obtaining, from one or more Domain Name System (DNS) servers, passive DNS data indicating respective one or more domains associated with each of the one or more IP addresses;
generating features associated with the computer based on the Netflow data and passive DNS data, wherein the features include one or more of:
  an indication that the computer has accessed rare IP addresses or domains;
  an indication that the computer has accessed domains having less than or equal to a predetermined age;
  an indication that the computer has received a number of NX domain responses to DNS queries exceeding a predetermined threshold;
  an indication that the computer has accessed a number of IP addresses or domains for a first time exceeding a predetermined threshold; or
  an indication that the computer has accessed an IP address or domain associated with an idiosyncratic score exceeding a predetermined threshold;
generating probability data based on the Netflow data and passive DNS data, wherein the probability data indicates a probability that the computer accessed the one or more domains;
assigning weights to the features based on the probability data to provide weighted features;
training a supervised machine learning algorithm using historical data and known labels associated with a plurality of different computers; and
determining whether the computer is likely to be part of a botnet by applying the supervised machine learning algorithm to the weighted features.
Dependent claims: 17-19.
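The five candidate features listed in claim 16 can be computed as 0/1 indicators for one computer. In this sketch the `Thresholds` values are hypothetical placeholders for the claims' "predetermined" thresholds, and the inputs are assumptions: `accessed` (IPs and domains the computer contacted), `host_counts` (how many distinct computers contacted each destination, used here as the rarity measure), `domain_age_days`, `nx_count`, `seen_before` (destinations in the computer's history), and `idio_scores`. How the idiosyncratic score itself is computed is not specified here, so it is taken as a precomputed input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical values standing in for the 'predetermined' thresholds."""
    rare_max_hosts: int = 2        # destination contacted by <= this many computers is rare
    max_domain_age_days: int = 30  # domain at most this many days old counts as young
    nx_responses: int = 50         # NX domain responses must exceed this count
    first_time: int = 20           # first-time destinations must exceed this count
    idiosyncratic: float = 0.8     # idiosyncratic score must exceed this value


def binary_features(accessed: set[str],
                    host_counts: dict[str, int],
                    domain_age_days: dict[str, int],
                    nx_count: int,
                    seen_before: set[str],
                    idio_scores: dict[str, float],
                    t: Thresholds = Thresholds()) -> dict[str, int]:
    """Return the five claim-16 indicators as 0/1 values for one computer."""
    first_time = [x for x in accessed if x not in seen_before]
    return {
        # Destinations missing from host_counts are treated as rare (count 0).
        "rare_destination": int(any(host_counts.get(x, 0) <= t.rare_max_hosts
                                    for x in accessed)),
        # Only destinations with a known age (i.e. domains) are checked.
        "young_domain": int(any(domain_age_days[x] <= t.max_domain_age_days
                                for x in accessed if x in domain_age_days)),
        "many_nxdomain": int(nx_count > t.nx_responses),
        "many_first_time": int(len(first_time) > t.first_time),
        "idiosyncratic": int(any(idio_scores.get(x, 0.0) > t.idiosyncratic
                                 for x in accessed)),
    }
```

Each indicator could then be scaled by the probability data before it reaches the classifier; a default `Thresholds()` instance is safe as a default argument here because the dataclass is frozen.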
20. A system comprising:
a processor;
memory;
a botnet detection application that is stored in the memory and executed by the processor and that is configured to:
  obtain, from one or more routers, Netflow data indicating one or more IP addresses accessed by a computer;
  obtain, from one or more Domain Name System (DNS) servers, passive DNS data indicating respective one or more domains associated with each of the one or more IP addresses;
  generate features associated with the computer based on the Netflow data and passive DNS data, wherein the features include one or more of:
    an indication that the computer has accessed rare IP addresses or domains;
    an indication that the computer has accessed domains having less than or equal to a predetermined age;
    an indication that the computer has received a number of NX domain responses to DNS queries exceeding a predetermined threshold;
    an indication that the computer has accessed a number of IP addresses or domains for a first time exceeding a predetermined threshold; or
    an indication that the computer has accessed an IP address or domain associated with an idiosyncratic score exceeding a predetermined threshold;
  generate probability data based on the Netflow data and passive DNS data, wherein the probability data indicates a probability that the computer accessed the one or more domains;
  assign weights to the features based on the probability data to provide weighted features;
  train a supervised machine learning algorithm using historical data and known labels associated with a plurality of different computers; and
  determine whether the computer is likely to be part of a botnet by applying the supervised machine learning algorithm to the weighted features.
Dependent claims: 21-23.
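Claims 16 and 20 train a supervised learner on historical, labeled computers and then apply it to a computer's weighted features. The claims do not name the learner. This sketch assumes plain logistic regression fitted by batch gradient descent in dependency-free Python; the names `train_logistic` and `predict_botnet`, the learning rate, the epoch count, and the 0.5 decision threshold are all hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on log loss.

    X: list of weighted feature vectors (one per historical computer).
    y: known labels, 1 = botnet member, 0 = clean.
    Returns (weights, bias).
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this computer: sigma(w.x + b) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_botnet(w, b, x, threshold=0.5):
    """Return (probability, is_likely_botnet) for one weighted feature vector."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p, p >= threshold
```

Logistic regression is used here only because it fits in a few lines and yields a probability that can be thresholded; any supervised classifier could take its place in this sketch.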
Specification