System and method for automated machine-learning, zero-day malware detection
Abstract
Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and computer-readable medium with instructions for executing the above method.
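The partitioning step described in the abstract can be illustrated with a minimal sketch. The source does not say how a file's type is determined, so the magic-byte lookup below is an assumption, and the names `MAGIC_PREFIXES`, `file_category`, and `partition_by_type` are hypothetical.

```python
from collections import defaultdict

# Assumption: file type is inferred from well-known leading "magic" bytes.
# The source does not specify how the type of a training file is determined.
MAGIC_PREFIXES = {
    b"MZ": "pe",        # DOS/PE executables begin with "MZ"
    b"%PDF": "pdf",     # PDF documents begin with "%PDF"
    b"\x7fELF": "elf",  # ELF binaries begin with 0x7F "ELF"
}


def file_category(data: bytes) -> str:
    """Return a category name for a file's raw bytes, or "other" if unknown."""
    for prefix, category in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return category
    return "other"


def partition_by_type(samples):
    """Group labeled training files by file type.

    samples: iterable of (data: bytes, is_malign: bool) pairs.
    Returns a dict mapping category name to a list of (data, is_malign) pairs.
    """
    categories = defaultdict(list)
    for data, is_malign in samples:
        categories[file_category(data)].append((data, is_malign))
    return dict(categories)
```

Each resulting list keeps its malign/benign labels, so it can serve directly as the training set for one category-specific classifier.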
13 Claims
1. A computer-implemented method for improved zero-day malware detection comprising:
receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
building a category-specific classifier based on the evaluated features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
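The identifying and evaluating steps of claim 1 can be sketched as follows. The claim only requires determining the features "most effective at distinguishing between malign and benign files"; using information gain as the measure, the default `n = 4`, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct n-byte sequences that occur in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy, in bits, of a group with pos malign and neg benign files."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def rank_ngrams(samples, n=4, top_k=10):
    """Rank n-grams by information gain (an assumed evaluation measure).

    samples: list of (data: bytes, is_malign: bool) pairs from one category.
    Returns the top_k n-grams, highest gain first, ties broken by byte order.
    """
    total_mal = sum(1 for _, is_malign in samples if is_malign)
    total_ben = len(samples) - total_mal
    base = entropy(total_mal, total_ben)

    # Count how many malign / benign files contain each n-gram at least once.
    mal_with, ben_with = Counter(), Counter()
    for data, is_malign in samples:
        (mal_with if is_malign else ben_with).update(byte_ngrams(data, n))

    scores = {}
    for gram in set(mal_with) | set(ben_with):
        m1, b1 = mal_with[gram], ben_with[gram]   # files containing the n-gram
        m0, b0 = total_mal - m1, total_ben - b1   # files lacking the n-gram
        conditional = ((m1 + b1) * entropy(m1, b1)
                       + (m0 + b0) * entropy(m0, b0)) / len(samples)
        scores[gram] = base - conditional
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]
```

An n-gram that appears only in malign files, or only in benign files, scores highest under this measure, while one spread evenly across both classes scores zero.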
12. A non-transitory computer readable medium including instructions thereon for performing a method for improved zero-day malware detection by:
receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
building a category-specific classifier based on the evaluated features.
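The final step, building a category-specific classifier based on the evaluated features, might look like the sketch below. The claims do not name a classifier type; a Bernoulli naive Bayes model with add-one smoothing is assumed here purely for illustration, and `train_classifier`, `malign_score`, and `train_per_category` are hypothetical names.

```python
import math


def train_classifier(samples, features):
    """Fit a Bernoulli naive Bayes model (an assumed classifier type).

    samples: list of (data: bytes, is_malign: bool) pairs from one category.
    features: list of byte n-grams already selected for that category.
    """
    counts = {True: [0] * len(features), False: [0] * len(features)}
    totals = {True: 0, False: 0}
    for data, is_malign in samples:
        totals[is_malign] += 1
        for i, gram in enumerate(features):
            if gram in data:  # substring test on raw bytes
                counts[is_malign][i] += 1

    model = {"features": list(features), "log_prior": {}, "prob": {}}
    for label in (True, False):
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        model["log_prior"][label] = math.log((totals[label] + 1) / (len(samples) + 2))
        model["prob"][label] = [(c + 1) / (totals[label] + 2) for c in counts[label]]
    return model


def malign_score(model, data):
    """Return the log-odds that data is malign; a positive value favors malign."""
    score = model["log_prior"][True] - model["log_prior"][False]
    for i, gram in enumerate(model["features"]):
        p_mal, p_ben = model["prob"][True][i], model["prob"][False][i]
        if gram in data:
            score += math.log(p_mal / p_ben)
        else:
            score += math.log((1 - p_mal) / (1 - p_ben))
    return score


def train_per_category(categories, features_by_category):
    """Build one classifier per file-type category."""
    return {name: train_classifier(samples, features_by_category[name])
            for name, samples in categories.items()}
```

Training one model per category means each classifier only has to weigh n-grams that are meaningful for its own file type.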
13. A system for improved zero-day malware detection comprising:
a computer including the one or more computer processors for executing instructions; and
a memory, wherein the memory includes instructions for improved zero-day malware detection by:
receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
building a category-specific classifier based on the evaluated features.
Specification