SYSTEM AND METHOD FOR AUTOMATED MACHINE-LEARNING, ZERO-DAY MALWARE DETECTION

US 20140090061A1
Filed: 09/26/2013
Published: 03/27/2014
Est. Priority Date: 09/26/2012
Status: Active Grant

Abstract

Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and computer-readable medium with instructions for executing the above method.

31 Claims

1. A method for improved zero-day malware detection comprising:
- receiving a set of training files which are each known to be either malign or benign;
  
  partitioning the set of training files into a plurality of categories; and
  
  training category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
  
  selecting one of the plurality of categories of training files;
  
  identifying features present in the training files in the selected category of training files;
  
  evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
  
  building a category-specific classifier based on the evaluated features.
  - 2. The method of claim 1 wherein the training category-specific classifiers further comprises repeating the selecting, identifying, evaluating and building for each of the plurality of categories of training files.
  - 3. The method of claim 2 further comprising building a composite classifier by combining the category-specific classifier of each category of training files.
  - 4. The method of claim 1 wherein the identifying identifies n-grams that are found in the training files.
  - 5. The method of claim 1 wherein the identifying identifies n-grams in system calls or execution traces generated by execution of the training files.
  - 6. The method of claim 4 wherein the identifying further comprises extracting the identified n-grams.
  - 7. The method of claim 1 wherein the categories are based on a type of file in each category.
  - 8. The method of claim 4 wherein the categories include one or more categories chosen from an executable file category, an MS Word™ file category, an MS Excel™ file category and a PDF file category.
  - 9. The method of claim 1 wherein the partitioning of the training files comprises determining the file type of each training file and dividing the training files into groups of same-type files.
  - 10. The method of claim 8 wherein the partitioning further comprises creating a category for each new type of file encountered in the set of training files.
  - 11. The method of claim 3 further comprising:
    - receiving one or more target, unknown files for classification;
      
      initializing the composite classifier; and
      
      classifying the one or more target, unknown files as malign or benign using the composite classifier.
  - 12. The method of claim 11 wherein the initializing comprises:
    - constructing a map that connects file categories with category-specific classifiers;
      
      categorizing each of the one or more target, unknown files; and
      
      determining using the map which category-specific classifier to apply to each of the one or more unknown, target files in the classifying.
  - 13. A non-transitory computer readable medium including instructions thereon for performing the method for improved zero-day malware detection of claim 1.
  - 14. A system for improved zero-day malware detection comprising:
    - a processor for executing instructions; and
      
      a memory that includes instructions thereon that when executed perform the method of claim 1.
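
The flow recited in claims 1 through 12 can be illustrated with a minimal Python sketch. It is not the patented implementation: the magic-byte file typing in `file_category`, the use of information gain to rank n-grams, the signed-vote classifier, and every function name are assumptions chosen only to make the steps concrete (partition by file type, identify n-grams, evaluate them, build a category-specific classifier, then route unknown files through a category-to-classifier map).

```python
import math
from collections import defaultdict

def byte_ngrams(data: bytes, n: int = 2) -> set:
    """Distinct byte n-grams found in a file's contents."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def file_category(data: bytes) -> str:
    """Hypothetical file typing by magic bytes (an assumption of this sketch)."""
    if data.startswith(b"MZ"):
        return "executable"
    if data.startswith(b"%PDF"):
        return "pdf"
    return "other"

def entropy(labels) -> float:
    """Binary entropy of a list of True/False labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def info_gain(gram, samples) -> float:
    """Entropy reduction from splitting samples on presence of gram."""
    present = [lab for feats, lab in samples if gram in feats]
    absent = [lab for feats, lab in samples if gram not in feats]
    n = len(samples)
    return (entropy([lab for _, lab in samples])
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))

def train_category(files, top_k=10):
    """Build one category-specific classifier: a {n-gram: vote} table."""
    samples = [(byte_ngrams(data), malign) for data, malign in files]
    vocab = set().union(*(feats for feats, _ in samples))
    # Rank by information gain; break ties on the bytes for determinism.
    ranked = sorted(vocab, key=lambda g: (-info_gain(g, samples), g))[:top_k]
    votes = {}
    for g in ranked:
        mal = sum(1 for feats, lab in samples if g in feats and lab)
        ben = sum(1 for feats, lab in samples if g in feats and not lab)
        votes[g] = 1 if mal > ben else -1
    return votes

def train_composite(training, top_k=10):
    """Partition by category, then map each category to its classifier."""
    groups = defaultdict(list)
    for data, malign in training:
        groups[file_category(data)].append((data, malign))
    return {cat: train_category(files, top_k) for cat, files in groups.items()}

def classify(data, composite):
    """True if malign, False if benign, None if the category is untrained."""
    votes = composite.get(file_category(data))
    if votes is None:
        return None
    grams = byte_ngrams(data)
    return sum(v for g, v in votes.items() if g in grams) > 0

# Toy training set: (file bytes, known-malign flag). Purely illustrative.
training = [
    (b"MZ\x90evil1", True), (b"MZ\x90evil2", True),
    (b"MZ\x00good1", False), (b"MZ\x00good2", False),
    (b"%PDF-js-evil", True), (b"%PDF-text-ok", False),
]
composite = train_composite(training)
verdict = classify(b"MZ\x90evil9", composite)
```

Here `composite` plays the role of the map in claim 12: an unknown file is categorized first, and only the classifier trained on that category is applied to it.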

15. A method for improved zero-day malware detection comprising:
- receiving a set of training files which are each known to be either malign or benign;
  
  analyzing a training file from the set of training files to determine features of the training file;
  
  tagging the determined features of the training file with qualified meta-features (QMF) tags, wherein the tagging includes:
  
  extracting one of the determined features from the training file;
  
  identifying a location of the extracted feature in the training file;
  
  determining an appropriate QMF tag of the extracted feature based on the identified location;
  
  applying the determined QMF tag to the extracted feature; and
  
  repeating the extracting, identifying, determining and applying for the remaining determined features of the training file;
  
  repeating the analyzing and tagging for remaining training files in the set of training files; and
  
  building a model identifying features indicative of a malign file using the QMF-tagged features, wherein the model is capable of being used to detect malign files.
  - 16. The method of claim 15 wherein the analyzing further comprises determining offsets for the determined features, wherein the offsets indicate the location of the determined features in the training file.
  - 17. The method of claim 16 further comprising generating a mapping table that maps ranges of feature offsets to sections of the training file.
  - 18. The method of claim 17 wherein the sections of the file include header section, executable code section, and a data section.
  - 18. The method of claim 17 wherein the sections of the file include a header section, an executable code section, and a data section.
  - 20. The method of claim 15 further comprising:
    - receiving one or more target, unknown files for classification; and
      
      classifying the one or more target, unknown files as malign or benign using the model.
  - 21. The method of claim 20 wherein the classifying includes extracting features of the one or more target, unknown files and tagging the extracted features with QMF tags.
  - 22. The method of claim 21 wherein the classifying classifies the one or more target, unknown files as malign based on QMF-tagged features of the one or more target, unknown files matching QMF-tagged features from the training files.
  - 23. A non-transitory computer readable medium including instructions thereon for performing the method for improved zero-day malware detection of claim 15.
  - 24. A system for improved zero-day malware detection comprising:
    - a processor for executing instructions; and
      
      a memory that includes instructions thereon that when executed perform the method of claim 15.
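
The QMF tagging of claims 15 through 22 can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the features are byte 2-grams, the mapping table is a hand-written list of `(start, end, section)` offset ranges, the model is simply the set of tagged features seen in malign but not benign training files, and all names are hypothetical.

```python
def section_for(offset, table):
    """Look up the file section whose offset range contains `offset`."""
    for start, end, name in table:
        if start <= offset < end:
            return name
    return "unknown"

def qmf_tagged_ngrams(data, table, n=2):
    """Tag each n-gram with the section found at its starting offset."""
    return {(section_for(i, table), data[i:i + n])
            for i in range(len(data) - n + 1)}

def build_model(training, n=2):
    """Keep tagged features seen in malign files and never in benign ones."""
    malign, benign = set(), set()
    for data, table, is_malign in training:
        (malign if is_malign else benign).update(qmf_tagged_ngrams(data, table, n))
    return malign - benign

def classify(data, table, model, n=2, threshold=1):
    """Malign if at least `threshold` tagged features match the model."""
    return len(qmf_tagged_ngrams(data, table, n) & model) >= threshold

# Hypothetical 12-byte layout: 4-byte header, 4-byte code, 4-byte data.
table = [(0, 4, "header"), (4, 8, "code"), (8, 12, "data")]
training = [
    (b"HDR0" + b"\xde\xad\x00\x00" + b"aaaa", table, True),               # bytes in code
    (b"HDR0" + b"\x01\x02\x03\x04" + b"\xde\xad\x00\x00", table, False),  # same bytes in data
]
model = build_model(training)
in_code = classify(b"HDR0" + b"\xde\xad\x11\x11" + b"zzzz", table, model)
in_data = classify(b"HDR0" + b"\x09\x09\x09\x09" + b"\xde\xad\x11\x11", table, model)
```

The toy data shows why the tag matters: the bytes `\xde\xad` appear in both training files, so an untagged n-gram would carry no signal, while the tagged feature `("code", b"\xde\xad")` separates the two.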

26. A method for improved zero-day malware detection comprising:
- receiving a set of training files which are each known to be either malign or benign;
  
  analyzing the set of training files to determine features of the training files;
  
  receiving a feature set description that includes a semantic label for each attribute class present in the training files and a set of corresponding attributes that make up the attribute class;
  
  generating a plurality of attribute class-specific feature vectors (FVs) for the training files using the determined features and the feature set description, wherein the FVs are vectors of features present in malign files of the attribute class;
  
  concatenating the plurality of attribute class-specific FVs into an extended feature vector (EFV) for the training files; and
  
  generating a target file classifier based on the EFV using a plurality of classifier algorithms.
  - 27. The method of claim 26 wherein the analyzing the set of training files includes extracting determined features from the training files.
  - 28. The method of claim 26 further comprising:
    - receiving a target, unknown file;
      
      analyzing the target, unknown file to determine features of the target, unknown file;
      
      generating a plurality of attribute class-specific FVs of the target, unknown file using the determined features of the target, unknown file;
      
      concatenating the plurality of attribute class-specific FVs of the target, unknown file into an EFV for the target, unknown file; and
      
      classifying the target, unknown file as malign or benign by applying the target file classifier to the EFV of the target, unknown file.
  - 29. The method of claim 26 further comprising parsing the feature set description and defining a data structure that holds the attribute classes and sets of corresponding attributes as key-value pairs.
  - 30. A non-transitory computer readable medium including instructions thereon for performing the method for improved zero-day malware detection of claim 26.
  - 31. A system for improved zero-day malware detection comprising:
    - a processor for executing instructions; and
      
      a memory that includes instructions thereon that when executed perform the method of claim 26.
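
The extended feature vector of claims 26 through 29 can be sketched in a few lines. The JSON layout of the feature set description, the attribute names, and the binary presence encoding are all assumptions made for illustration; the claims do not specify them. Training the target file classifier with multiple classifier algorithms is left out of this sketch.

```python
import json

# Hypothetical feature set description: semantic label -> attributes in that class.
FEATURE_SET_DESCRIPTION = """
{
  "header":  ["MZ", "PE"],
  "imports": ["CreateRemoteThread", "WriteProcessMemory"],
  "strings": ["cmd.exe"]
}
"""

def parse_description(text):
    """Hold attribute classes and their attributes as key-value pairs."""
    return json.loads(text)  # dicts keep insertion order in Python 3.7+

def class_fv(features, attributes):
    """Attribute class-specific FV: 1 where the attribute is present, else 0."""
    return [1 if attr in features else 0 for attr in attributes]

def extended_fv(features, description):
    """Concatenate the per-class FVs, in description order, into one EFV."""
    efv = []
    for attributes in description.values():
        efv.extend(class_fv(features, attributes))
    return efv

description = parse_description(FEATURE_SET_DESCRIPTION)
# Features assumed to have been determined from one file by earlier analysis.
file_features = {"MZ", "cmd.exe", "WriteProcessMemory"}
efv = extended_fv(file_features, description)
```

Because the description fixes the order of classes and attributes, the EFV of a training file and the EFV of an unknown file line up position by position, which is what lets a single classifier be applied to both.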

Current Assignee
BluVector, Inc. (Comcast Corporation)
Original Assignee
Northrop Grumman Systems Corporation (Northrop Grumman Corporation)
Inventors
AVASARALA, Bhargav R., BOSE, Brock D., DAY, John C., STEINER, Donald

Granted Patent

US 9,292,688 B2
Field of Search
US Class Current

726/24
CPC Class Codes

G06F 21/56   Computer malware detection ...

G06F 21/564   by virus signature recognition

G06F 21/566   Dynamic detection, i.e. det...

G06F 2221/034   Test or assess a computer o...
