Robust representation of network traffic for detecting malware variations
First Claim
1. A method comprising:
- at a networking device, dividing network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time;
generating a set of feature vectors, each feature vector of the set of features vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records;
computing a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record;
transforming each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records;
generating a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records;
training a classifier based on the cumulative feature vector to produce a trained classifier;
classifying, by the trained classifier, the at least one group as being malicious; and
identifying a malware network communication between the computing device and the server utilizing the at least one classified group,wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques are presented that identify malware network communications between a computing device and a server based on a cumulative feature vector generated from a group of network traffic records associated with communications between computing devices and servers. Feature vectors are generated, each vector including features extracted from the network traffic records in the group. A self-similarity matrix is computed for each feature which is a representation of the feature that is invariant to an increase or a decrease of feature values across all feature vectors in the group. Each self-similarity matrix is transformed into corresponding histograms to be invariant to a number of network traffic records in the group. The cumulative feature vector is a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records and is generated based on the corresponding histograms.
11 Citations
20 Claims
-
1. A method comprising:
-
at a networking device, dividing network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time; generating a set of feature vectors, each feature vector of the set of features vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records; computing a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record; transforming each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records; generating a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records; training a classifier based on the cumulative feature vector to produce a trained classifier; classifying, by the trained classifier, the at least one group as being malicious; and identifying a malware network communication between the computing device and the server utilizing the at least one classified group, wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. An apparatus comprising:
-
one or more processors; one or more memory devices in communication with the one or more processors; and at least one network interface unit coupled to the one or more processors, wherein the one or more processors are configured to; divide network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time; generate a set of feature vectors, each feature vector of the set of feature vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records; compute a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record; transform each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records; generate a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records; train a classifier based on the cumulative feature vector to produce a trained classifier; classify, by the trained classifier, the at least one group as being malicious; and identify a malware network communication between the computing device and the server utilizing the at least one classified group, wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. One or more computer readable non-transitory storage media encoded with software comprising computer executable instructions that when executed by one or more processors cause the one or more processor to:
-
divide network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time; generate a set of feature vectors, each feature vector of the set of feature vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records; compute a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record; transform each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records; generate a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records; train a classifier based on the cumulative feature vector to produce a trained classifier; classify, by the trained classifier, the at least one group as being malicious; and identify a malware network communication between the computing device and the server utilizing the at least one classified group, wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification