SYSTEM AND METHOD FOR AUTOMATED MACHINE-LEARNING, ZERO-DAY MALWARE DETECTION
Abstract
Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and a computer-readable medium with instructions for executing the above method.
31 Claims
1. A method for improved zero-day malware detection comprising:
receiving a set of training files which are each known to be either malign or benign;
partitioning the set of training files into a plurality of categories; and
training category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
selecting one of the plurality of categories of training files;
identifying features present in the training files in the selected category of training files;
evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
building a category-specific classifier based on the evaluated features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
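The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made for illustration only: each training file is represented as a string of whitespace-separated tokens that have already been extracted, the categories ("pdf", "exe") are hypothetical file types, the feature score is a simple difference in the fraction of malign and benign files containing a token, and the "classifier" is a signed vote over the top-scoring tokens. The function names `train_category_classifiers` and `classify` are likewise hypothetical.

```python
from collections import Counter, defaultdict

def train_category_classifiers(samples, top_k=3):
    """samples: iterable of (category, token_string, is_malign)."""
    # Partition the training files into categories.
    by_category = defaultdict(list)
    for category, text, is_malign in samples:
        by_category[category].append((set(text.split()), is_malign))

    classifiers = {}
    for category, files in by_category.items():  # select one category at a time
        malign, benign = Counter(), Counter()
        n_malign = n_benign = 0
        # Identify the features present in this category's files.
        for features, is_malign in files:
            if is_malign:
                malign.update(features)
                n_malign += 1
            else:
                benign.update(features)
                n_benign += 1
        # Evaluate features (assumed score): fraction of malign files that
        # contain the feature minus fraction of benign files that contain it.
        scores = {
            f: malign[f] / max(n_malign, 1) - benign[f] / max(n_benign, 1)
            for f in set(malign) | set(benign)
        }
        # Keep the top_k most discriminating features (ties broken by name).
        best = sorted(scores, key=lambda f: (-abs(scores[f]), f))[:top_k]
        # Build the category-specific classifier as a feature-to-weight map.
        classifiers[category] = {f: scores[f] for f in best}
    return classifiers

def classify(classifiers, category, text):
    """Return True when the signed vote of matching features is positive."""
    weights = classifiers[category]
    features = set(text.split())
    return sum(w for f, w in weights.items() if f in features) > 0

# Hypothetical toy training set.
samples = [
    ("pdf", "header openaction javascript", True),
    ("pdf", "header javascript launch", True),
    ("pdf", "header text font", False),
    ("pdf", "header font image", False),
    ("exe", "mz import virtualalloc writeprocessmemory", True),
    ("exe", "mz import createfile", False),
]
classifiers = train_category_classifiers(samples)
```

Under these assumptions, a token such as "header" that appears equally in malign and benign PDF files scores zero and is dropped, while each category keeps only its own most discriminating tokens.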
15. A method for improved zero-day malware detection comprising:
receiving a set of training files which are each known to be either malign or benign;
analyzing a training file from the set of training files to determine features of the training file;
tagging the determined features of the training file with qualified meta-features (QMF) tags, wherein the tagging includes:
extracting one of the determined features from the training file;
identifying a location of the extracted feature in the training file;
determining an appropriate QMF tag of the extracted feature based on the identified location;
applying the determined QMF tag to the extracted feature; and
repeating the extracting, identifying, determining and applying for the remaining determined features of the training file;
repeating the analyzing and tagging for remaining training files in the set of training files; and
building a model identifying features indicative of a malign file using the QMF-tagged features, wherein the model is capable of being used to detect malign files.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24
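The tagging loop of claim 15 can be sketched in Python as follows. This is only an illustration under stated assumptions: a file's layout is given as a sorted list of (start offset, tag name) regions, each feature arrives with the byte offset where it was found, and a QMF tag is applied by prefixing the feature with the region name. The region names (`HEADER`, `IMPORTS`, `CODE`), the `tag:feature` format, and the function name `qmf_tag` are hypothetical and do not come from the claim.

```python
from bisect import bisect_right

def qmf_tag(features, regions):
    """Tag each (offset, feature) pair with the region that contains its offset.

    regions: list of (start_offset, tag_name), sorted by start_offset.
    """
    starts = [start for start, _ in regions]
    tagged = []
    for offset, feature in features:          # extract one feature at a time
        # Identify the location: index of the last region starting at or
        # before this offset.
        idx = bisect_right(starts, offset) - 1
        # Determine the tag from the location (assumed fallback: UNKNOWN).
        tag = regions[idx][1] if idx >= 0 else "UNKNOWN"
        # Apply the tag to the feature.
        tagged.append(f"{tag}:{feature}")
    return tagged

# Hypothetical layout and features for one training file.
regions = [(0, "HEADER"), (64, "IMPORTS"), (256, "CODE")]
features = [(10, "mz"), (70, "virtualalloc"), (300, "virtualalloc")]
tagged = qmf_tag(features, regions)
```

In this sketch the same raw feature found in two different regions produces two distinct tagged features, so a model built on the tagged features can weigh them separately.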
26. A method for improved zero-day malware detection comprising:
receiving a set of training files which are each known to be either malign or benign;
analyzing the set of training files to determine features of the training files;
receiving a feature set description that includes a semantic label for each attribute class present in the training files and a set of corresponding attributes that make up the attribute class;
generating a plurality of attribute class-specific feature vectors (FVs) for the training files using the determined features and the feature set description, wherein the FVs are vectors of features present in malign files of the attribute class;
concatenating the plurality of attribute class-specific FVs into an extended feature vector (EFV) for the training files; and
generating a target file classifier based on the EFV using a plurality of classifier algorithms.
Dependent claims: 27, 28, 29, 30, 31
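The feature-vector steps of claim 26 can be sketched in Python as follows. The sketch covers only the generation of attribute class-specific FVs and their concatenation into an EFV, not the final classifier-generation step. It assumes that the feature set description is a mapping from a semantic label to an ordered list of attributes, and that each FV is a binary presence vector. The labels, attributes, and function names are hypothetical.

```python
# Hypothetical feature set description: semantic label -> attributes in that class.
feature_set_description = {
    "imports": ["virtualalloc", "createfile"],
    "strings": ["http", "cmd.exe", "password"],
}

def class_feature_vector(file_features, attributes):
    """Binary FV for one attribute class: 1 if the file has the attribute."""
    return [1 if attribute in file_features else 0 for attribute in attributes]

def extended_feature_vector(file_features, description):
    """Concatenate the class-specific FVs, in description order, into one EFV."""
    efv = []
    for _label, attributes in description.items():
        efv.extend(class_feature_vector(file_features, attributes))
    return efv

# Features determined for one hypothetical training file.
file_features = {"virtualalloc", "http", "password"}
efv = extended_feature_vector(file_features, feature_set_description)
```

Because the EFV preserves the order of the description, each position corresponds to a known attribute of a known class, which keeps the combined vector interpretable when it is handed to whichever classifier algorithms are used.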
Specification