Using sequencing and timing information of behavior events in machine learning to detect malware
Abstract
A decision tree for classifying computer files is constructed. A set of training files known to be legitimate or malicious are executed and their runtime behaviors are monitored. When a behavior event is detected for one of the training files at a point in time, a feature vector is generated for that training file. Behavior sequencing and timing information for the training file at that point in time is identified and encoded in the feature vector. Feature vectors for each of the training files at various points in time are fed into a decision tree induction algorithm to construct a decision tree that takes the sequencing and timing information into account.
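The induction step described in the abstract can be illustrated with a minimal sketch. It assumes each feature vector has already been reduced to two numeric features (the time gap since process launch, and a 0/1 flag for whether a particular event ordering was observed) and uses Gini impurity as the split criterion. The abstract names neither the features nor the induction algorithm, so the function names, the feature layout, and the toy training rows below are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) or None."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively induce a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, threshold = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

def classify(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: (seconds since process launch, ordering flag).
rows = [(0.4, 1), (0.9, 1), (1.2, 1), (15.0, 0), (40.0, 0), (22.0, 1)]
labels = ["malicious"] * 3 + ["legitimate"] * 3
tree = build_tree(rows, labels)
```

On this toy data the time-gap feature alone separates the two classes, so the induced tree has a single split on feature 0. Real feature vectors would also carry the predetermined file attributes and a richer encoding of the event sequence.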
19 Claims
1. A computer-implemented method for constructing a classifier for classifying computer files that takes into account behavior sequencing and timing information of the computer files, comprising:
monitoring runtime behavior of a training file of a known classification;
detecting a plurality of behavior events exhibited by the training file, the plurality of behavior events detected at ones of a plurality of points in time;
responsive to detecting the plurality of behavior events at ones of the plurality of points in time, identifying (1) an event sequence exhibited by the training file reflecting the runtime behavior at the ones of the plurality of points in time and (2) timing information indicating, for each of the plurality of behavior events, a time gap between a process launch of the training file and a point in time when the associated behavior event is detected;
generating, for each of the plurality of behavior events, a feature vector encoded with information related to the training file at the point in time the associated behavior event is detected, the related information comprising values of a predetermined set of file attributes, an exhibited event sequence, and timing information;
constructing a classifier based on the feature vectors and the known classification of the training file; and
storing the classifier.
Dependent claims: 2, 3, 4, 5, 6, 18, 19
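The identifying and generating steps of claim 1 can be sketched as follows: one feature vector per detected event, holding the file attributes, the event sequence exhibited so far, and the time gap between process launch and that event. This is only an illustration under stated assumptions. The `BehaviorEvent` type, the `build_feature_vectors` function, the dictionary layout, and the event and attribute names are hypothetical, and the claim does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class BehaviorEvent:
    name: str         # hypothetical event label, e.g. "create_file"
    timestamp: float  # detection time in seconds on the monitor's clock

def build_feature_vectors(
    launch_time: float,
    events: List[BehaviorEvent],
    file_attributes: Dict[str, object],
    label: str,
) -> List[Tuple[Dict[str, object], str]]:
    """Return one (feature vector, known classification) pair per event."""
    vectors = []
    sequence: List[str] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        sequence.append(event.name)
        features = dict(file_attributes)                # predetermined file attributes
        features["event_sequence"] = tuple(sequence)    # events exhibited so far
        features["time_gap"] = event.timestamp - launch_time  # gap since launch
        vectors.append((features, label))
    return vectors

# Hypothetical training file: launched at t=100.0 s, two events observed.
vectors = build_feature_vectors(
    launch_time=100.0,
    events=[
        BehaviorEvent("network_connect", 103.0),
        BehaviorEvent("create_file", 100.5),
    ],
    file_attributes={"file_size": 20480, "is_signed": False},
    label="malicious",
)
```

Each pair produced this way could then be handed to whatever learning algorithm constructs the classifier. Because the sequence grows with every event, later vectors describe the file's behavior in more detail than earlier ones.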
7. A computer system for constructing a classifier for classifying computer files that takes into account behavior sequencing and timing information of the computer files, comprising:
a non-transitory computer-readable storage medium storing executable computer program code comprising;
  a feature determination module for monitoring runtime behavior of a training file of a known classification, detecting a plurality of behavior events exhibited by the training file, the plurality of behavior events detected at ones of a plurality of points in time, identifying (1) an event sequence exhibited by the training file reflecting the runtime behavior at the ones of the plurality of points in time and (2) timing information indicating, for each of the plurality of behavior events, a time gap between a process launch of the training file and a point in time when the associated behavior event is detected in response to detecting the plurality of behavior events at ones of the plurality of points in time, and generating, for each of the plurality of behavior events, a feature vector encoded with information related to the training file at the point in time the associated behavior event is detected, the related information comprising values of a predetermined set of file attributes, an exhibited event sequence, and timing information,
  a machine learning engine for constructing a classifier based on the feature vectors and the known classification of the training file, and
  a data store for storing the classifier; and
a processor for executing the computer program code.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium encoded with executable computer program code for constructing a classifier for classifying computer files that takes into account behavior sequencing and timing information of the computer files, the computer program code comprising program code for:
monitoring runtime behavior of a training file of a known classification;
detecting a plurality of behavior events exhibited by the training file, the plurality of behavior events detected at ones of a plurality of points in time;
responsive to detecting the plurality of behavior events at ones of the plurality of points in time, identifying (1) an event sequence exhibited by the training file reflecting the runtime behavior at the one of the plurality of points in time and (2) timing information indicating, for each of the plurality of behavior events, a time gap between a process launch of the training file and a point in time when the associated behavior event is detected;
generating, for each of the plurality of behavior events, a feature vector encoded with information related to the training file at the point in time the associated behavior event is detected, the related information comprising values of a predetermined set of file attributes, an exhibited event sequence, and timing information;
constructing a classifier based on the feature vectors and the known classification of the training file; and
storing the classifier.
Dependent claims: 14, 15, 16, 17
Specification