Static anomaly-based detection of malware files
Abstract
A protection application detects and remediates malicious files on a client. The protection application trains models using known samples of static clean files, and the models characterize features of the clean files. A model may be selected based on metadata obtained from a target file. By processing features of the clean files and features of the target file, the model may generate an anomaly score indicating a level of dissimilarity between the target file and the sample. The protection application compares the anomaly score to one or more threshold scores to classify the target file. Additionally, the target file may be provided to a security server to check against a whitelist or blacklist for classification. Responsive to a classification as malicious, the protection application remediates the target file on the client.
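The abstract does not say what form a model takes or how dissimilarity is measured. As a minimal sketch only, assume each file has already been reduced to a numeric feature vector, that a model stores a per-feature mean and standard deviation of the clean sample, and that the anomaly score is the mean absolute z-score. The names `train_model` and `anomaly_score` are hypothetical and do not come from the patent.

```python
from statistics import mean, pstdev

def train_model(clean_feature_vectors):
    """Summarize each feature of the clean sample by its mean and spread."""
    columns = list(zip(*clean_feature_vectors))
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def anomaly_score(model, features):
    """Mean absolute z-score: larger values mean less similar to the clean sample."""
    return mean(abs(x - mu) / sigma for x, (mu, sigma) in zip(features, model))

# Toy clean sample with two made-up features per file (assumption, not real data).
clean = [[10.0, 2.0], [12.0, 2.0], [11.0, 2.0]]
model = train_model(clean)

typical = anomaly_score(model, [11.0, 2.0])   # matches the clean averages exactly
odd = anomaly_score(model, [30.0, 9.0])       # far from anything in the clean sample
print(typical, odd)
```

Because the model is built from clean files only, a high score says the target file is unlike the clean sample, not that it matches any known malware.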
17 Claims
1. A method for detecting anomalous files, the method comprising:
obtaining a file on a client for classification;
obtaining metadata associated with the file;
determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
generating, by a processor, an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
classifying the file as anomalous based on the anomaly score; and
remediating the file by the client responsive to the classification of the file.
Dependent claims: 2, 3, 4, 5, 6, 7, 16, 17
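The steps of claim 1 can be sketched as a small pipeline. This is an illustration under stated assumptions, not the claimed implementation: the subclass is read from a hypothetical `file_type` metadata field, each model is a plain scoring function, and the three thresholds are interpreted as a clean band, an anomalous band, and a center cutoff used when no outside verdict (for example from the security server mentioned in the abstract) is available. The claim itself requires comparison against only at least one of the thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Thresholds:
    lower: float
    center: float
    upper: float

# Assumption: a "model" is any function mapping file bytes to an anomaly score.
Model = Callable[[bytes], float]

def subclass_from_metadata(metadata: Dict[str, str]) -> str:
    # Assumption: the subclass is keyed on a file-type field in the metadata.
    return metadata.get("file_type", "unknown")

def classify(score: float, th: Thresholds, server_verdict: Optional[str] = None) -> str:
    if score >= th.upper:
        return "anomalous"
    if score < th.lower:
        return "clean"
    # Ambiguous band: prefer an outside verdict, else fall back to the center cutoff.
    if server_verdict is not None:
        return server_verdict
    return "anomalous" if score >= th.center else "clean"

def process_file(data: bytes, metadata: Dict[str, str], models: Dict[str, Model],
                 th: Thresholds, remediate: Callable[[bytes], None]) -> str:
    subclass = subclass_from_metadata(metadata)
    model = models[subclass]          # one model per subclass
    score = model(data)
    label = classify(score, th)
    if label == "anomalous":
        remediate(data)               # e.g. quarantine; the action here is a placeholder
    return label

# Toy scoring function (fraction of zero bytes); purely illustrative, not a real feature.
models = {"pe": lambda d: d.count(b"\x00") / max(len(d), 1)}
th = Thresholds(lower=0.2, center=0.5, upper=0.7)
quarantined = []

first = process_file(b"\x00\x00\x00A", {"file_type": "pe"}, models, th, quarantined.append)
second = process_file(b"AAAA", {"file_type": "pe"}, models, th, quarantined.append)
print(first, second, len(quarantined))
```

A file whose subclass has no trained model raises `KeyError` in this sketch; how such files are handled is not stated in the claim.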
8. A non-transitory computer-readable storage medium storing instructions for detecting anomalous files, the instructions when executed by a processor causing the processor to perform steps including:
obtaining a file on a client for classification;
obtaining metadata associated with the file;
determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
generating an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
classifying the file as anomalous based on the anomaly score; and
remediating the file by the client responsive to the classification of the file.
Dependent claims: 9, 10, 11, 12
13. A computing system comprising:
a processor; and
a non-transitory computer-readable storage medium storing instructions for generating information for detecting anomalous files, the instructions when executed by the processor causing the processor to perform steps including:
obtaining a file on a client for classification;
obtaining metadata associated with the file;
determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
generating an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
classifying the file as anomalous based on the anomaly score; and
remediating the file by the client responsive to the classification of the file.
Dependent claims: 14, 15
Specification