Malware detection using pattern classification
Abstract
A malware classifier uses features of suspect software to classify the software as malicious or not. The classifier uses a pattern classification algorithm to statistically analyze computer software. The classifier takes a feature representation of the software and maps it to the classification label with the use of a trained model. The feature representation of the input computer software includes the relevant features and the values of each feature. These features include the categories of: applicable software characteristics of a particular type of malware; dynamic link library (DLL) and function name strings typically occurring in the body of the malware; and other alphanumeric strings commonly found in malware. By providing these features and their values to the classifier, the classifier is better able to identify a particular type of malware.
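As a rough illustration of the feature representation described above, the following Python sketch maps a software sample to feature names and values across the three categories (software characteristics, DLL and function name strings, and other alphanumeric strings). The specific features, the `FEATURES` table, and the use of occurrence counts as feature values are assumptions made for illustration; they are not taken from the patent.

```python
# Hypothetical feature definitions, grouped by the three categories
# named in the abstract. Every entry is illustrative only.
FEATURES = {
    "characteristic": {"size_kb": lambda data: len(data) // 1024},
    "dll_function": [b"kernel32.dll", b"CreateRemoteThread"],
    "string": [b"SMTP", b"HELO"],
}


def feature_representation(data: bytes) -> dict:
    """Return a mapping of feature name to feature value for one sample."""
    rep = {}
    # Category 1: characteristics computed from the sample as a whole.
    for name, fn in FEATURES["characteristic"].items():
        rep[f"characteristic:{name}"] = fn(data)
    # Category 2: DLL and function names, matched case-insensitively.
    lowered = data.lower()
    for token in FEATURES["dll_function"]:
        rep[f"dll_function:{token.decode()}"] = lowered.count(token.lower())
    # Category 3: other alphanumeric strings, matched exactly.
    for token in FEATURES["string"]:
        rep[f"string:{token.decode()}"] = data.count(token)
    return rep
```

The resulting dictionary is the kind of input a trained model could map to a classification label; a real extractor would more likely parse the executable format than scan raw bytes.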
22 Claims
1. A method of training a malware classifier, said method comprising:
determining a classification label that represents a type of malware, said type of malware not including benign software;
determining a classification label that represents a second type of malware;
creating a feature definition file that includes first features relevant to the classification of said type of malware and that includes second features relevant to the classification of said second type of malware, wherein said first and second features are combined into one feature set in said feature definition file, wherein said features include characteristics of said type of malware, DLL names and function names executed by said type of malware, and alphanumeric strings used by said type of malware;
selecting software training data including software of the same type as said type of malware and software that is benign;
executing a training application on a computer associated with said malware classifier and inputting said feature definition file and said software training data into said training application; and
outputting a training model associated with said malware classifier on said computer, whereby said training model is arranged to assist in the identification of said type of malware and said second type of malware.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 15
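The training steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the JSON layout of the feature definition file, the labels "worm", "spyware" and "benign", the substring-count feature values, and the nearest-centroid model are all hypothetical stand-ins, since the claim does not name a file format or a training algorithm.

```python
import json
from collections import defaultdict


def build_feature_definition(first_features, second_features):
    """Combine the features for two malware types into one feature set.

    The JSON layout is an assumption; order is kept, duplicates are dropped.
    """
    combined = list(dict.fromkeys(first_features + second_features))
    return json.dumps({"features": combined})


def extract(sample: bytes, features):
    """Feature value = number of occurrences of the string (an assumption)."""
    return [sample.count(f.encode()) for f in features]


def train(feature_definition_json, training_data):
    """Toy 'training application': compute one mean vector per label."""
    features = json.loads(feature_definition_json)["features"]
    sums = {}
    counts = defaultdict(int)
    for sample, label in training_data:
        vec = extract(sample, features)
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }
    # The returned dictionary plays the role of the output training model.
    return {"features": features, "centroids": centroids}


# Hypothetical training data: two malware types plus benign software.
TRAINING_DATA = [
    (b"MAIL FROM MAIL FROM wsock32.dll", "worm"),
    (b"SetWindowsHookEx wsock32.dll", "spyware"),
    (b"hello world", "benign"),
]
```

Calling `train(build_feature_definition([...], [...]), TRAINING_DATA)` yields a single model that covers both malware types, mirroring the claim's combined feature set.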
9. A method of classifying a suspect software program, said method comprising:
selecting a group of features relevant to the identification of a particular type of malware, wherein said particular type of malware does not include benign software and wherein said group of features include characteristics of said type of malware, DLL names and function names executed by said type of malware, and alphanumeric strings used by said type of malware;
selecting a second group of features relevant to the identification of a second particular type of malware;
combining said first and second groups of features into one selected feature set;
selecting a trained model, said trained model being trained to identify said particular type of malware and said second particular type of malware;
extracting a subset of said first and second features and their corresponding values from said suspect software program utilizing said selected feature set;
executing a classification algorithm on a computer and inputting said subset of features, said corresponding values, and said trained model, wherein said classification algorithm combines logic of classification functions for detecting said type of malware and said second type of malware; and
outputting a classification label using said computer for said suspect software program that identifies said type of malware or said second type of malware.
Dependent claims: 10, 11, 12, 13, 14
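One way to picture a classification algorithm that "combines logic of classification functions" for two malware types is a pair of linear scoring functions evaluated over the same extracted feature values, with the higher-scoring type reported. The Python sketch below assumes this one-versus-rest arrangement; the weights, the feature names, and the fallback label "benign" for samples that score non-positive on both functions are hypothetical and are not drawn from the claim.

```python
# Hypothetical combined feature set and trained model (illustrative values).
FEATURE_SET = ["MAIL FROM", "SetWindowsHookEx"]
MODEL = {
    "worm": {"weights": {"MAIL FROM": 1.0, "SetWindowsHookEx": -0.5}, "bias": -0.5},
    "spyware": {"weights": {"MAIL FROM": -0.5, "SetWindowsHookEx": 1.0}, "bias": -0.5},
}


def extract_values(sample: bytes, feature_set):
    """Extract each feature's value as an occurrence count (an assumption)."""
    return {f: sample.count(f.encode()) for f in feature_set}


def score(values, function):
    """Evaluate one linear classification function on the feature values."""
    weighted = sum(w * values.get(f, 0) for f, w in function["weights"].items())
    return function["bias"] + weighted


def classify(sample: bytes, feature_set, model) -> str:
    """Combine the per-type functions: report the best positive score."""
    values = extract_values(sample, feature_set)
    scores = {label: score(values, fn) for label, fn in model.items()}
    best = max(scores, key=scores.get)
    # Falling back to "benign" is an added assumption, not part of the claim.
    return best if scores[best] > 0 else "benign"
```

For example, `classify(b"MAIL FROM:<a@example.com>", FEATURE_SET, MODEL)` scores 0.5 on the worm function and -1.0 on the spyware function, so the label returned is "worm".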
16. A malware classifier apparatus implemented on a computer for classifying suspect software, said malware classifier comprising:
a feature definition file including first features relevant to the identification of a type of malware and second features relevant to the identification of a second particular type of malware, wherein said first and second features are combined into one feature set in said feature definition file, said type of malware not including benign software and, wherein said features include characteristics of said type of malware, DLL names and function names executed by said type of malware, and alphanumeric strings used by said type of malware;
a trained model, said model being trained to identify said type of malware and said second particular type of malware;
a feature extraction module arranged to accept as input computer software and said feature definition file and to extract a subset of said first and second features and their values from said computer software using a computer;
a pattern classification algorithm that accepts said subset of features and their values and uses said trained model to output a classification label using said computer for said input computer software that identifies said type of malware or said second type of malware, wherein said pattern classification algorithm combines logic of classification functions for detecting said type of malware and said second type of malware.
Dependent claims: 17, 18, 19, 20, 21, 22
Specification