System and method for automated machine-learning, zero-day malware detection
First Claim
1. A computer-implemented method for improved zero-day malware detection comprising:
receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
analyzing, using the one or more computer processors, a training file from the set of training files to determine features of the training file, wherein the analyzing determines n-gram features;
tagging, using the one or more computer processors, the determined features of the training file with qualified meta-features (QMF) tags, wherein the tagging includes;
extracting one of the determined n-gram features from the training file;
identifying a location of the extracted n-gram feature in the training file;
determining an appropriate QMF tag of the extracted n-gram feature based on the identified location;
applying the determined QMF tag to the extracted n-gram feature; and
repeating the extracting, identifying, determining and applying for the remaining determined n-gram features of the training file;
repeating the analyzing and tagging for remaining training files in the set of training files; and
building, using the one or more computer processors, a model identifying n-gram features indicative of a malign file using the QMF-tagged n-gram features, wherein the model is capable of being used to detect malign files.
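The tagging loop recited in claim 1 (extract an n-gram, identify its location, determine a tag from that location, apply the tag, repeat) can be illustrated with a minimal sketch. The claim does not define how locations map to QMF tags, so the region table, the tag names, and the helper names `ngrams`, `qmf_tag`, and `tag_file` are all assumptions made for illustration. The model-building step is omitted.

```python
from collections import Counter
from typing import Iterator, List, Tuple

# Hypothetical layout: (tag, start offset, end offset) for each file region.
# The claim does not specify how locations map to QMF tags; this is an assumption.
Region = Tuple[str, int, int]


def ngrams(data: bytes, n: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, n-gram) for every byte n-gram in data."""
    for i in range(len(data) - n + 1):
        yield i, data[i:i + n]


def qmf_tag(offset: int, regions: List[Region]) -> str:
    """Choose a tag from the region that contains the n-gram's starting offset."""
    for tag, start, end in regions:
        if start <= offset < end:
            return tag
    return "UNKNOWN"


def tag_file(data: bytes, regions: List[Region], n: int = 4) -> Counter:
    """Count QMF-tagged n-grams, keyed as (tag, n-gram)."""
    counts: Counter = Counter()
    for offset, gram in ngrams(data, n):
        counts[(qmf_tag(offset, regions), gram)] += 1
    return counts


# Toy input: a 4-byte "header" followed by a 7-byte "body".
sample = b"MZ\x90\x00PAYLOAD"
layout = [("HEADER", 0, 4), ("BODY", 4, len(sample))]
tagged = tag_file(sample, layout, n=2)
```

Keying each count on the (tag, n-gram) pair means the same byte sequence is treated as a different feature depending on where it occurs, which is the effect the location-based tag is meant to have.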
7 Assignments
0 Petitions
Abstract
Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and a computer-readable medium with instructions for executing the above method.
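The per-category training described in the abstract (partition the files, identify features, evaluate which features best separate malign from benign files, build a classifier) might be sketched as follows. The abstract does not name an evaluation measure, so this sketch swaps in a simple difference in document frequency. The category labels, feature names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each training example: (category, feature set, is_malign). All names are hypothetical.
Example = Tuple[str, Set[str], bool]


def train_category_classifiers(examples: List[Example], top_k: int = 2) -> Dict[str, Set[str]]:
    """For each category, keep up to top_k features most skewed toward malign files."""
    # Partition the training files by category.
    by_category: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_category[ex[0]].append(ex)

    classifiers: Dict[str, Set[str]] = {}
    for category, group in by_category.items():
        malign = [f for _, f, bad in group if bad]
        benign = [f for _, f, bad in group if not bad]
        features = set().union(*(f for _, f, _ in group))

        def score(feat: str) -> float:
            # Assumed measure: fraction of malign files containing feat
            # minus fraction of benign files containing it.
            m = sum(feat in f for f in malign) / max(len(malign), 1)
            b = sum(feat in f for f in benign) / max(len(benign), 1)
            return m - b

        # Rank by score (ties broken alphabetically) and keep positive scorers only.
        ranked = sorted(features, key=lambda f: (-score(f), f))
        classifiers[category] = {f for f in ranked[:top_k] if score(f) > 0}
    return classifiers


def is_malign(category: str, features: Set[str], classifiers: Dict[str, Set[str]]) -> bool:
    """Flag a file if it contains any selected feature for its category."""
    return bool(features & classifiers.get(category, set()))


examples = [
    ("pdf", {"js", "openaction"}, True),
    ("pdf", {"js", "font"}, True),
    ("pdf", {"font"}, False),
    ("exe", {"packer", "import"}, True),
    ("exe", {"import"}, False),
]
classifiers = train_category_classifiers(examples)
```

Because each category is ranked separately, a feature that is uninformative across the whole corpus can still be selected where it discriminates well within one file type.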
130 Citations
16 Claims
1. A computer-implemented method for improved zero-day malware detection, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method for improved zero-day malware detection comprising:
receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
analyzing, using the one or more computer processors, the set of training files to determine features of the training files, wherein the analyzing determines n-gram features;
receiving, using the one or more computer processors, a feature set description that includes a semantic label for each attribute class present in the training files and a set of corresponding attributes that make up the attribute class;
generating, using the one or more computer processors, a plurality of attribute class-specific feature vectors (FVs) for the training files using the determined n-gram features and the feature set description, wherein the FVs are vectors of n-gram features present in malign files of the attribute class;
concatenating, using the one or more computer processors, the plurality of attribute class-specific FVs into an extended feature vector (EFV) for the training files; and
generating, using the one or more computer processors, a target file classifier based on the EFV using a plurality of classifier algorithms.
Dependent claims: 12, 13, 14, 15, 16
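The construction of an extended feature vector in claim 11 can be illustrated with a short sketch. It assumes the feature set description is a mapping from a semantic label to an ordered list of n-gram attributes, and that each class-specific FV is a binary presence vector. The claim fixes neither representation, and the labels and hex n-grams below are invented for the example. Classifier generation is omitted.

```python
from typing import Dict, List, Set

# Hypothetical feature set description: semantic label -> ordered n-gram attributes.
FeatureSetDescription = Dict[str, List[str]]


def class_feature_vector(present: Set[str], attributes: List[str]) -> List[int]:
    """Binary vector marking which of the class's attributes occur in the file."""
    return [1 if a in present else 0 for a in attributes]


def extended_feature_vector(
    present_by_class: Dict[str, Set[str]],
    description: FeatureSetDescription,
) -> List[int]:
    """Concatenate the class-specific vectors in a fixed (sorted-label) order."""
    efv: List[int] = []
    for label in sorted(description):
        present = present_by_class.get(label, set())
        efv.extend(class_feature_vector(present, description[label]))
    return efv


# Toy data: two attribute classes with hex-encoded n-grams (invented for illustration).
description = {"header": ["4d5a", "9000"], "imports": ["6b65", "7573", "6e74"]}
file_ngrams = {"header": {"4d5a"}, "imports": {"7573", "6e74"}}
efv = extended_feature_vector(file_ngrams, description)
```

Iterating over the labels in sorted order keeps each attribute at the same position in every file's EFV, which any downstream classifier would need.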
Specification