Detection of code-based malware
First Claim
1. A system comprising:
one or more hardware processors; and
one or more computer-readable storage media storing computer-executable instructions that are executable by the one or more hardware processors to cause the system to perform operations including:
determining code contexts from known malicious script and known benign script;
building abstract syntax trees (ASTs) using code found in the code contexts;
extracting structural features from the known malicious script and known benign script based on structures and contents of the ASTs, the structural features being different from text of the known malicious script and the known benign script;
comparing structural features from unclassified script with the structural features from the known malicious script and the known benign script; and
classifying the unclassified script as malicious or benign based on the comparison of the structural features from the unclassified script with the structural features from the known malicious script and the known benign script.
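The AST-building and feature-extraction steps of claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not in the claim: Python's standard `ast` module stands in for a script parser, the `CONTEXTS` mapping and the `structural_features` helper are hypothetical names, and a "feature" is taken to be a pair of enclosing context and AST node type.

```python
import ast

# Hypothetical mapping from AST node types to a coarse enclosing "context".
CONTEXTS = {
    ast.FunctionDef: "function",
    ast.For: "loop",
    ast.While: "loop",
    ast.If: "conditional",
    ast.Try: "try",
}

def structural_features(source):
    """Return the set of (context, node_type) pairs found in the AST of source."""
    tree = ast.parse(source)
    features = set()

    def visit(node, context):
        # A node that defines a context replaces the inherited one.
        context = CONTEXTS.get(type(node), context)
        features.add((context, type(node).__name__))
        for child in ast.iter_child_nodes(node):
            visit(child, context)

    visit(tree, "toplevel")
    return features

# Illustrative input only; real inputs would be known or unclassified scripts.
sample = "for i in range(3):\n    x = 'a' * i\n"
feats = structural_features(sample)
```

Features built this way depend on the shape of the tree rather than on the raw text, so two scripts that differ only in whitespace or formatting yield the same feature set.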
Abstract
This document describes techniques for detection of code-based malware. According to some embodiments, the techniques utilize a collection of known malicious code and known benign code and determine which features of each type of code can be used to determine whether unclassified code is malicious or benign. The features can then be used to train a classifier (e.g., a Bayesian classifier) to characterize unclassified code as malicious or benign. In at least some embodiments, the techniques can be used as part of and/or in cooperation with a web browser to inspect web content (e.g., a web page) to determine if the content includes code-based malware.
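As an illustration of the Bayesian classifier mentioned in the abstract, the following is a minimal naive Bayes sketch over sets of boolean features. It is not taken from the patent: the function names, the Laplace smoothing, and the feature strings in the toy training data are all assumptions made for the example.

```python
import math

def train_naive_bayes(samples):
    """samples: list of (feature_set, label). Returns a model dict."""
    labels = {label for _, label in samples}
    vocab = set().union(*(feats for feats, _ in samples))
    model = {"prior": {}, "likelihood": {}}
    for label in labels:
        docs = [feats for feats, lab in samples if lab == label]
        model["prior"][label] = len(docs) / len(samples)
        # Laplace-smoothed probability that a feature is present given the label.
        model["likelihood"][label] = {
            feat: (sum(feat in d for d in docs) + 1) / (len(docs) + 2)
            for feat in vocab
        }
    return model

def classify(model, features):
    """Return the label with the highest log-posterior for the feature set."""
    scores = {}
    for label, prior in model["prior"].items():
        score = math.log(prior)
        for feat, p in model["likelihood"][label].items():
            score += math.log(p if feat in features else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical feature names and labels, for illustration only.
training = [
    ({"loop:unescape", "string:long_hex"}, "malicious"),
    ({"loop:unescape", "function:eval"}, "malicious"),
    ({"function:render", "conditional:length"}, "benign"),
    ({"function:render", "loop:length"}, "benign"),
]
model = train_naive_bayes(training)
verdict_a = classify(model, {"loop:unescape"})    # "malicious" on this toy data
verdict_b = classify(model, {"function:render"})  # "benign" on this toy data
```

Working in log space avoids numeric underflow when the feature vocabulary is large, and the smoothing keeps a feature never seen with one label from forcing that label's probability to zero.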
18 Claims
7. A computer-implemented method comprising:
extracting a first set of structural features from known code by:
  unfolding the known code to determine code contexts associated with the known code;
  building one or more abstract syntax trees (ASTs) using the code contexts; and
  determining the first set of features based on the structure of the one or more ASTs;
extracting a second set of structural features from the first set of structural features based on a determination of which features of the first set of structural features are predictive of a particular code classification, the second set of structural features being a subset of the first set of structural features and excluding one or more features of the first set of structural features that are determined not to be predictive of a particular code classification;
training a classifier using the second set of structural features; and
classifying with the classifier unclassified code based at least in part on the second set of structural features.
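The selection of a predictive subset of features in claim 7 can be sketched as follows. The claim does not say how predictiveness is determined; this example assumes a chi-squared test on a 2x2 contingency table per feature, and the function names, the toy samples, and the 3.84 threshold (the 95% critical value for one degree of freedom) are choices made for the illustration.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 table.

    a: malicious samples with the feature, b: benign samples with it,
    c: malicious samples without it,      d: benign samples without it.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero margin means the feature cannot separate the classes.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_predictive(samples, threshold=3.84):
    """Keep features whose presence is associated with one class.

    samples: list of (feature_set, label), label in {"malicious", "benign"}.
    """
    all_features = set().union(*(feats for feats, _ in samples))
    selected = set()
    for feat in all_features:
        a = sum(feat in f for f, lab in samples if lab == "malicious")
        b = sum(feat in f for f, lab in samples if lab == "benign")
        c = sum(feat not in f for f, lab in samples if lab == "malicious")
        d = sum(feat not in f for f, lab in samples if lab == "benign")
        if chi_squared(a, b, c, d) >= threshold:
            selected.add(feat)
    return selected

# Hypothetical first set of features; "toplevel:var" appears in every sample.
samples = [
    ({"loop:unescape", "toplevel:var"}, "malicious"),
    ({"loop:unescape", "toplevel:var"}, "malicious"),
    ({"function:render", "toplevel:var"}, "benign"),
    ({"function:render", "toplevel:var"}, "benign"),
]
second_set = select_predictive(samples)
```

On this toy data the feature shared by every sample is excluded, while the two features that appear in only one class are kept, so the second set is a strict subset of the first.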
13. A computer-implemented method comprising:
building an abstract syntax tree (AST) using code contexts retrieved from one of a known malicious script or a known benign script;
determining features of the known malicious script or the known benign script based on the structure and textual contents of the AST;
matching features of an unclassified script to the features of the known malicious script or the known benign script; and
classifying the unclassified script as malicious or benign based on the matching.
Specification