Anomaly based malware detection
First Claim
1. A system, comprising:
at least one processor; and
at least one memory including program code which when executed by the at least one processor provides operations comprising:
reducing a dimensionality of a plurality of features representative of files within a file set having a plurality of clusters, the files representing benign files such that the files in the reduced dimension representation of the file set conform to a mixture of Gaussian distributions, the reducing being performed by generating a random projection of the plurality of features, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
reducing a dimensionality of a plurality of features of a file to be classified;
determining, based at least on a reduced dimension representation of the file set and the reduced dimension representation of the file, a Mahalanobis distance between the file and the file set, the distance characterizing a deviation between the file and the file set;
determining, based at least on the distance between the file and the file set being greater than a threshold value indicating that the file is anomalous, a classification for the file, the classification being used to determine whether to access and/or execute the file; and
preventing the file from being accessed or executed when the classification indicates that the file is malware.
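The operations recited in the claim can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function names, the synthetic uniform feature data, the 8-dimensional target space, and the threshold value are all assumptions, and the benign file set is modeled with a single Gaussian rather than the mixture the claim describes.

```python
import numpy as np

def random_projection_matrix(n_features, n_components, rng):
    # Gaussian random projection matrix (hypothetical helper, not from the patent).
    return rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)

def fit_benign_model(benign_features, projection):
    # Reduce the benign file set, then estimate its mean and inverse covariance.
    reduced = benign_features @ projection
    mean = reduced.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(reduced, rowvar=False))
    return mean, cov_inv

def mahalanobis_distance(file_features, projection, mean, cov_inv):
    # Reduce the file to be classified with the same projection, then measure
    # its deviation from the benign file set.
    delta = file_features @ projection - mean
    return float(np.sqrt(delta @ cov_inv @ delta))

def is_anomalous(distance, threshold):
    # A distance greater than the threshold marks the file as anomalous.
    return distance > threshold

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic benign features, uniform (non-Gaussian) before reduction.
benign = rng.uniform(0.0, 1.0, size=(500, 200))
projection = random_projection_matrix(200, 8, rng)
mean, cov_inv = fit_benign_model(benign, projection)

threshold = 8.0  # hypothetical value; the claim does not specify one

typical_file = rng.uniform(0.0, 1.0, size=200)
unusual_file = np.full(200, 5.0)

typical_distance = mahalanobis_distance(typical_file, projection, mean, cov_inv)
unusual_distance = mahalanobis_distance(unusual_file, projection, mean, cov_inv)

for name, d in [("typical", typical_distance), ("unusual", unusual_distance)]:
    verdict = "block" if is_anomalous(d, threshold) else "allow"
    print(f"{name}: distance={d:.2f} -> {verdict}")
```

Each projected coordinate is a weighted sum of many input features, which is one reason a random projection of non-Gaussian data can look approximately Gaussian in the reduced space.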
Abstract
In one respect, there is provided a system for training a neural network adapted for classifying one or more scripts. The system may include at least one processor and at least one memory. The memory may include program code that provides operations when executed by the at least one processor. The operations may include: reducing a dimensionality of a plurality of features representative of a file set; determining, based at least on a reduced dimensional representation of the file set, a distance between a file and the file set; and determining, based at least on the distance between the file and the file set, a classification for the file. Related methods and articles of manufacture, including computer program products, are also provided.
46 Citations
17 Claims
1. A system, comprising:
at least one processor; and
at least one memory including program code which when executed by the at least one processor provides operations comprising:
reducing a dimensionality of a plurality of features representative of files within a file set having a plurality of clusters, the files representing benign files such that the files in the reduced dimension representation of the file set conform to a mixture of Gaussian distributions, the reducing being performed by generating a random projection of the plurality of features, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
reducing a dimensionality of a plurality of features of a file to be classified;
determining, based at least on a reduced dimension representation of the file set and the reduced dimension representation of the file, a Mahalanobis distance between the file and the file set, the distance characterizing a deviation between the file and the file set;
determining, based at least on the distance between the file and the file set being greater than a threshold value indicating that the file is anomalous, a classification for the file, the classification being used to determine whether to access and/or execute the file; and
preventing the file from being accessed or executed when the classification indicates that the file is malware.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A method, comprising:
reducing a dimensionality of a plurality of features representative of files within a file set having a plurality of clusters, the files representing benign files such that the files in the reduced dimension representation of the file set conform to a mixture of Gaussian distributions, the reducing being performed by generating a random projection of the plurality of features, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
reducing a dimensionality of a plurality of features of a file to be classified;
determining, based at least on a reduced dimension representation of the file set and the reduced dimension representation of the file, a Mahalanobis distance between a file and the file set;
determining, based at least on the distance between the file and the file set being greater than a threshold value indicating that the file is anomalous, a classification for the file, the classification being used to determine whether to access and/or execute the file; and
preventing the file from being accessed or executed when the classification indicates that the file is malware.
Dependent Claims (11, 12, 13, 14, 15, 16)
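The method recites a file set with a plurality of clusters that conforms to a mixture of Gaussian distributions after reduction. One possible reading is sketched below with NumPy. The claim does not say how a single distance to a multi-cluster file set is computed, so taking the smallest per-cluster Mahalanobis distance is an assumption, as are the known cluster labels, the synthetic data, the helper names, and the threshold.

```python
import numpy as np

def fit_cluster_models(reduced_files, cluster_labels):
    # One Gaussian (mean, inverse covariance) per cluster of reduced benign files.
    models = []
    for label in np.unique(cluster_labels):
        members = reduced_files[cluster_labels == label]
        mean = members.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(members, rowvar=False))
        models.append((mean, cov_inv))
    return models

def distance_to_file_set(reduced_file, models):
    # ASSUMPTION: the distance to the file set is the smallest Mahalanobis
    # distance to any cluster; the claim does not prescribe this rule.
    distances = []
    for mean, cov_inv in models:
        delta = reduced_file - mean
        distances.append(np.sqrt(delta @ cov_inv @ delta))
    return float(min(distances))

rng = np.random.default_rng(1)
n_features, n_components = 100, 8
projection = rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)

# ASSUMPTION: two synthetic benign clusters with known labels.
cluster_a = rng.uniform(0.0, 1.0, size=(300, n_features))
cluster_b = rng.uniform(3.0, 4.0, size=(300, n_features))
benign = np.vstack([cluster_a, cluster_b])
labels = np.repeat([0, 1], 300)

models = fit_cluster_models(benign @ projection, labels)

threshold = 8.0  # hypothetical value
near_cluster_b = rng.uniform(3.0, 4.0, size=n_features) @ projection
far_from_both = np.full(n_features, 10.0) @ projection

near_distance = distance_to_file_set(near_cluster_b, models)
far_distance = distance_to_file_set(far_from_both, models)
print("near cluster b anomalous:", near_distance > threshold)
print("far from both anomalous:", far_distance > threshold)
```

Fitting one covariance per cluster keeps a file that resembles any one benign cluster close to the file set, even when the clusters themselves are far apart.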
17. A non-transitory computer-readable storage medium including program code which when executed by at least one processor causes operations comprising:
reducing a dimensionality of a plurality of features representative of files within a file set having a plurality of clusters, the files representing benign files such that the files in the reduced dimension representation of the file set conform to a mixture of Gaussian distributions, the reducing being performed by generating a random projection of the plurality of features, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
reducing a dimensionality of a plurality of features of a file to be classified;
determining, based at least on a reduced dimension representation of the file set and the reduced dimension representation of the file, a Mahalanobis distance between a file and the file set;
determining, based at least on the distance between the file and the file set, a classification for the file, the classification being used to determine whether to access and/or execute the file; and
preventing the file from being accessed or executed when the classification indicates that the file is malware;
wherein the file is determined to be a malware file, when the distance between the file and the file set exceeds a threshold value, and wherein the file is determined to be a benign file, when the distance between the file and the file set does not exceed the threshold value.
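The threshold rule in the final wherein clause can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the claims: x is the feature vector of the file, R is the random projection matrix, μ and Σ are the mean and covariance of the reduced benign file set (or of one of its clusters), and τ is the threshold value.

```latex
\[
y = R^{\top} x, \qquad
D_M(y) = \sqrt{(y - \mu)^{\top} \Sigma^{-1} (y - \mu)}, \qquad
\text{classification}(x) =
\begin{cases}
\text{malware}, & D_M(y) > \tau \\
\text{benign}, & D_M(y) \le \tau
\end{cases}
\]
```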
Specification