DETECTION OF CODE-BASED MALWARE

US 20120216280A1
Filed: 02/18/2011
Published: 08/23/2012
Est. Priority Date: 02/18/2011
Status: Active Grant

Abstract

This document describes techniques for detection of code-based malware. According to some embodiments, the techniques utilize a collection of known malicious code and known benign code and determine which features of each type of code can be used to determine whether unclassified code is malicious or benign. The features can then be used to train a classifier (e.g., a Bayesian classifier) to characterize unclassified code as malicious or benign. In at least some embodiments, the techniques can be used as part of and/or in cooperation with a web browser to inspect web content (e.g., a web page) to determine if the content includes code-based malware.
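As a rough illustration of the training and scoring step described in the abstract, the sketch below trains a small Bernoulli naive Bayes model on sets of features and returns a probability that an unclassified sample is malicious. The function names (`train`, `malicious_probability`), the placeholder feature strings, and the Laplace smoothing are assumptions made for illustration; the abstract only says that a Bayesian classifier may be used.

```python
import math
from collections import Counter


def train(samples):
    """Train on (feature_set, label) pairs; label is "malicious" or "benign"."""
    label_counts = Counter(label for _, label in samples)
    feature_counts = {label: Counter() for label in label_counts}
    for features, label in samples:
        feature_counts[label].update(features)
    vocabulary = set().union(*(features for features, _ in samples))
    return label_counts, feature_counts, vocabulary


def malicious_probability(model, features):
    """Return P(malicious | features) under a Bernoulli naive Bayes model."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        for f in vocabulary:
            # Laplace-smoothed estimate of P(feature present | label)
            p = (feature_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in features else 1 - p)
        log_scores[label] = score
    # Normalize in log space to avoid underflow
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights.get("malicious", 0.0) / sum(weights.values())


# Hypothetical training data: feature names are placeholders
model = train([
    ({"a", "b"}, "malicious"),
    ({"a"}, "malicious"),
    ({"c"}, "benign"),
    ({"c", "d"}, "benign"),
])
score_bad = malicious_probability(model, {"a"})
score_good = malicious_probability(model, {"c"})
```

With this toy data, a sample containing only the feature seen in malicious code scores well above 0.5, and a sample containing only the feature seen in benign code scores well below it.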

20 Claims

1. A computer-implemented method comprising:
- extracting structural features from known malicious script and known benign script;
  
  comparing structural features from unclassified script with the structural features from the known malicious script and the known benign script; and
  
  classifying the unclassified script as malicious or benign based on the comparison of the structural features from the unclassified script with the structural features from the known malicious script and the known benign script.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method as recited in claim 1, wherein the structural features from one or more of the known malicious script or the known benign script include one or more of a loop, a function, a conditional, a string, a variable declaration, or a try/catch block.
  - 3. The method as recited in claim 1, wherein the extracting comprises:
    - determining code contexts from the known malicious script and known benign script;
      
      building abstract syntax trees (ASTs) using code found in the code contexts; and
      
      determining the structural features of the known malicious script and the known benign script based on structures and contents of the ASTs.
  - 4. The method as recited in claim 1, wherein classifying the unclassified script as malicious or benign comprises using a state machine to match the structural features from the unclassified script with the structural features from one of the known malicious script or the known benign script.
  - 5. The method as recited in claim 1, wherein classifying the unclassified script as malicious or benign comprises calculating a probability or another numeric score indicating that the unclassified script is malicious or benign.
  - 6. The method as recited in claim 5, wherein the probability or another numeric score that the unclassified script is malicious or benign is calculated using a Bayesian classifier.
  - 7. The method as recited in claim 1, wherein the structural features of the unclassified script are determined by:
    - de-obfuscating the unclassified script;
      
      building one or more abstract syntax trees (ASTs) using the de-obfuscated unclassified script; and
      
      determining the structural features of the unclassified script based on one or more of the contents or the structure of the one or more ASTs.
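The AST-based extraction recited in claims 2, 3, and 7 can be sketched as follows. This is a minimal illustration that uses Python's standard `ast` module on Python source as a stand-in for a script parser; the claims concern scripts such as those found in web content, and nothing here is taken from the patent's own implementation. The mapping `NODE_KINDS` and the function `structural_features` are hypothetical names, and Python's `try`/`except` stands in for a try/catch block.

```python
import ast
from collections import Counter

# Assumed mapping from AST node types to the feature kinds listed in claim 2
NODE_KINDS = {
    ast.For: "loop",
    ast.While: "loop",
    ast.FunctionDef: "function",
    ast.If: "conditional",
    ast.Assign: "variable declaration",
    ast.Try: "try/catch block",
}


def structural_features(source):
    """Parse source into an AST and count the structural features it contains."""
    tree = ast.parse(source)
    counts = Counter()
    for node in ast.walk(tree):
        kind = NODE_KINDS.get(type(node))
        if kind:
            counts[kind] += 1
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            counts["string"] += 1
    return counts


sample = (
    "def f(x):\n"
    "    for i in range(3):\n"
    "        if i:\n"
    "            s = 'a'\n"
    "    try:\n"
    "        pass\n"
    "    except Exception:\n"
    "        pass\n"
)
features = structural_features(sample)
```

Counting node kinds is only one way to turn an AST into features; a fuller version could also record the text found under each node, which corresponds to the "contents" the claims mention.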

8. A computer-implemented method comprising:
- extracting a first set of features from known code;
  
  extracting a second set of features based on a determination of which features of the first set of features are predictive of a particular code classification;
  
  training a classifier using the second set of features; and
  
  classifying with the classifier unclassified code based at least in part on the second set of features.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The method as recited in claim 8, wherein the known code comprises known malicious code and known benign code, the particular code classification comprises classifying code as malicious or benign, and wherein classifying the unclassified code comprises classifying the unclassified code as malicious or benign.
  - 10. The method as recited in claim 8, wherein extracting the first set of features comprises:
    - unfolding the known code to determine code contexts associated with the known code;
      
      building one or more abstract syntax trees (ASTs) using the code contexts; and
      
      determining the first set of features based on the structure of the one or more ASTs.
  - 11. The method as recited in claim 8, wherein the determination of which features of the first set of features are predictive of the particular code classification is based on an analysis of the features of the first set of features using a χ² (chi-squared) algorithm.
  - 12. The method as recited in claim 8, wherein the classifier is configured to be implemented in a web browsing environment.
  - 13. The method as recited in claim 8, wherein classifying the unclassified code comprises using a state machine to match one or more features from the unclassified code with one or more features from the second set of features.
  - 14. The method as recited in claim 8, further comprising updating the classifier with a third set of features from different known code.
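The feature selection step of claims 8 and 11 can be illustrated with the standard χ² statistic for a 2×2 contingency table, where a feature is either present or absent in each malicious or benign sample. The sketch below is an assumption-laden example, not the patent's procedure: the names `chi_squared` and `select_features` are hypothetical, and the default threshold of 10.83 (the critical value for one degree of freedom at a 0.001 significance level) is chosen here for illustration only.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 table.

    a: malicious samples with the feature    b: benign samples with it
    c: malicious samples without it          d: benign samples without it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # feature is present everywhere or nowhere: not predictive
    return n * (a * d - b * c) ** 2 / denom


def select_features(samples, threshold=10.83):
    """Keep features whose presence correlates strongly with the label."""
    malicious = [f for f, label in samples if label == "malicious"]
    benign = [f for f, label in samples if label == "benign"]
    selected = set()
    for feature in set().union(*malicious, *benign):
        a = sum(feature in s for s in malicious)
        b = sum(feature in s for s in benign)
        c = len(malicious) - a
        d = len(benign) - b
        if chi_squared(a, b, c, d) >= threshold:
            selected.add(feature)
    return selected


# Hypothetical data: "x" appears only in malicious samples, "common" in all
samples = [({"x", "common"}, "malicious")] * 20 + [({"common"}, "benign")] * 20
second_set = select_features(samples)
```

In this toy data set the feature that appears in every sample carries no information and is dropped, while the feature that appears only in malicious samples is kept as part of the second set.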

15. A computer-implemented method comprising:
- building an abstract syntax tree (AST) using code contexts retrieved from one of a known malicious script or a known benign script;
  
  determining features of the known malicious script or the known benign script based on the structure and contents of the AST;
  
  matching features of an unclassified script to the features of the known malicious script or the known benign script; and
  
  classifying the unclassified script as malicious or benign based on the matching.
- Dependent claims (16, 17, 18, 19, 20)
  - 16. The method as recited in claim 15, wherein the code contexts comprise fragments of code from one of the known malicious script or the known benign script.
  - 17. The method as recited in claim 15, wherein building the AST comprises:
    - de-obfuscating the one of the known malicious script or the known benign script; and
      
      running the de-obfuscated known malicious script or known benign script to determine the code contexts.
  - 18. The method as recited in claim 15, wherein the features of the known malicious script or the known benign script comprise one or more of structural features or content features.
  - 19. The method as recited in claim 15, wherein the features of the known malicious script or the known benign script are used to train a classifier, and wherein classifying the unclassified script as malicious or benign is implemented by the classifier.
  - 20. The method as recited in claim 19, wherein the classifier is configured to classify the unclassified script as malicious or benign by calculating a probability or a numeric score that the unclassified script is malicious or benign.
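Claims 4 and 13 mention a state machine for the matching step, which claim 15 also recites in general terms. One common way to match many feature strings in a single pass is an Aho-Corasick automaton, sketched below. The choice of Aho-Corasick, the treatment of features as plain strings, and the names `build_automaton` and `match_features` are all assumptions for illustration; the claims do not specify which kind of state machine is used.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for the given feature strings."""
    goto, fail, output = [{}], [0], [set()]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)
    # Breadth-first pass to fill in failure links
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def match_features(automaton, text):
    """Return every known feature string that occurs in text."""
    goto, fail, output = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= output[state]
    return found


# Hypothetical feature strings chosen for illustration
automaton = build_automaton(["unescape", "%u9090", "escape"])
hits = match_features(automaton, "x = unescape('%u9090');")
misses = match_features(automaton, "var a = 1;")
```

The automaton reads each character of the input once regardless of how many features are loaded, which is the usual reason to prefer a state machine over checking each feature separately.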

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zorn, Benjamin Goth; Livshits, Benjamin; Curtsinger, Charles M.; Seifert, Christian
Granted Patent: US 8,713,679 B2
US Class Current: 726/23
CPC Class Codes: G06N 7/01 Probabilistic graphical mod...
