System and method for automated machine-learning, zero-day malware detection

US 9,292,688 B2
Filed: 09/26/2013
Issued: 03/22/2016
Est. Priority Date: 09/26/2012
Status: Active Grant

7 Assignments

0 Petitions

Abstract

Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and a computer-readable medium with instructions for executing the above method.
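
The partition, evaluate, and build steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the byte n-gram features, the category labels ("pdf", "exe"), the frequency-gap ranking, and every function name are assumptions made for illustration, and a ranked feature list stands in for a trained classifier.

```python
from collections import Counter, defaultdict

def ngrams(data: bytes, n: int = 2) -> set:
    """Return the set of distinct byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def partition(samples):
    """Group (category, data, is_malign) samples by category."""
    by_cat = defaultdict(list)
    for category, data, is_malign in samples:
        by_cat[category].append((data, is_malign))
    return by_cat

def select_features(files, n=2, top_k=3):
    """Rank n-grams by how much more often they occur in malign than benign files."""
    mal, ben = Counter(), Counter()
    n_mal = n_ben = 0
    for data, is_malign in files:
        if is_malign:
            n_mal += 1
            mal.update(ngrams(data, n))
        else:
            n_ben += 1
            ben.update(ngrams(data, n))

    def gap(g):
        # Difference in per-file frequency (assumed scoring rule).
        return mal[g] / max(n_mal, 1) - ben[g] / max(n_ben, 1)

    ranked = sorted(set(mal) | set(ben), key=lambda g: (-gap(g), g))
    return [g for g in ranked[:top_k] if gap(g) > 0]

def train(samples, n=2, top_k=3):
    """Build one selected-feature list per category."""
    return {cat: select_features(files, n, top_k)
            for cat, files in partition(samples).items()}

def classify(models, category, data, n=2):
    """Flag a file as malign if it contains any selected feature of its category."""
    feats = ngrams(data, n)
    return any(g in feats for g in models.get(category, []))

# Hypothetical training data: (category, file bytes, known-malign flag).
samples = [
    ("pdf", b"AAXX", True), ("pdf", b"BBXX", True), ("pdf", b"AABB", False),
    ("exe", b"ZZQQ", True), ("exe", b"QQRR", False),
]
models = train(samples)
print(classify(models, "pdf", b"CCXX"))  # contains the malign-only n-gram b"XX"
```

This sketch keeps only malign-indicative n-grams, a simplification of "most effective at distinguishing"; a fuller version would feed the selected features to a trained classifier per category.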

130 Citations

16 Claims

1. A computer-implemented method for improved zero-day malware detection comprising:
- receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
  
  analyzing, using the one or more computer processors, a training file from the set of training files to determine features of the training file, wherein the analyzing determines n-gram features;
  
  tagging, using the one or more computer processors, the determined features of the training file with qualified meta-features (QMF) tags, wherein the tagging includes:
  
  extracting one of the determined n-gram features from the training file;
  
  identifying a location of the extracted n-gram feature in the training file;
  
  determining an appropriate QMF tag of the extracted n-gram feature based on the identified location;
  
  applying the determined QMF tag to the extracted n-gram feature; and
  
  repeating the extracting, identifying, determining and applying for the remaining determined n-gram features of the training file;
  
  repeating the analyzing and tagging for remaining training files in the set of training files; and
  
  building, using the one or more computer processors, a model identifying n-gram features indicative of a malign file using the QMF-tagged n-gram features, wherein the model is capable of being used to detect malign files.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1 wherein the analyzing further comprises determining offsets for the determined features, wherein the offsets indicate the location of the determined features in the training file.
  - 3. The method of claim 2 further comprising generating, using the one or more computer processors, a mapping table that maps ranges of feature offsets to sections of the training file.
  - 4. The method of claim 3 wherein the sections of the file include header section, executable code section, and a data section.
  - 5. The method of claim 3 wherein the determining an appropriate QMF tag determines the appropriate QMF tag using the mapping table and the QMF tag indicates the file section of the extracted feature.
  - 6. The method of claim 1 further comprising:
    - receiving, using the one or more computer processors, one or more target, unknown computer files for classification; and
      
      classifying, using the one or more computer processors, the one or more target, unknown computer files as malign or benign using the model.
  - 7. The method of claim 6 wherein the classifying includes extracting features of the one or more target, unknown files and tagging the extracted features with QMF tags.
  - 8. The method of claim 7 wherein the classifying classifies the one or more target, unknown files as malign based on QMF-tagged features of the one or more target, unknown files matching QMF-tagged features from the training files.
  - 9. A non-transitory computer readable medium including instructions thereon for performing the method for improved zero-day malware detection of claim 1.
  - 10. A system for improved zero-day malware detection comprising:
    - a processor for executing instructions; and
      
      a memory that includes instructions thereon that when executed perform the method of claim 1.
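
The tagging flow of claims 1 through 8 (n-gram extraction with offsets, a mapping table from offset ranges to file sections, location-based QMF tags, and matching against the model) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the fixed `MAPPING_TABLE`, tagging each n-gram by its start offset, the "seen in malign but never in benign" model, and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical mapping table: (start_offset, section), sorted by start offset.
# A real table would be derived per file from the file format's own headers.
MAPPING_TABLE = [(0, "header"), (4, "code"), (10, "data")]

def ngrams_with_offsets(data: bytes, n: int = 2):
    """Yield (offset, n-gram) for every n-byte window in data."""
    for i in range(len(data) - n + 1):
        yield i, data[i:i + n]

def qmf_tag(offset: int, table=MAPPING_TABLE) -> str:
    """Return the section whose offset range contains `offset`."""
    starts = [start for start, _ in table]
    return table[bisect_right(starts, offset) - 1][1]

def tag_file(data: bytes, n: int = 2, table=MAPPING_TABLE) -> set:
    """Return QMF-tagged n-grams as (section, n-gram) pairs.

    Assumption: an n-gram is tagged by the section of its first byte.
    """
    return {(qmf_tag(off, table), gram)
            for off, gram in ngrams_with_offsets(data, n)}

def build_model(training, n: int = 2) -> set:
    """Keep tagged n-grams seen in malign files and in no benign file."""
    malign, benign = set(), set()
    for data, is_malign in training:
        (malign if is_malign else benign).update(tag_file(data, n))
    return malign - benign

def classify(model: set, data: bytes, n: int = 2, threshold: int = 1) -> bool:
    """Flag a target file as malign if enough tagged n-grams match the model."""
    return len(tag_file(data, n) & model) >= threshold

# Hypothetical files laid out as 4 header bytes, 6 code bytes, 4 data bytes.
mal = b"MZ00" + b"\xcc" * 6 + b"text"
ben = b"MZ00" + b"abcdef" + b"\xcc" * 4
model = build_model([(mal, True), (ben, False)])
print(("code", b"\xcc\xcc") in model, ("data", b"\xcc\xcc") in model)
```

The point of the tag is visible in the example: the same n-gram counts as a malign indicator in the code section but not in the data section, where a benign file also contains it.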

11. A computer-implemented method for improved zero-day malware detection comprising:
- receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
  
  analyzing, using the one or more computer processors, the set of training files to determine features of the training files, wherein the analyzing determines n-gram features;
  
  receiving, using the one or more computer processors, a feature set description that includes a semantic label for each attribute class present in the training files and a set of corresponding attributes that make up the attribute class;
  
  generating, using the one or more computer processors, a plurality of attribute class-specific feature vectors (FVs) for the training files using the determined n-gram features and the feature set description, wherein the FVs are vectors of n-gram features present in malign files of the attribute class;
  
  concatenating, using the one or more computer processors, the plurality of attribute class-specific FVs into an extended feature vector (EFV) for the training files; and
  
  generating, using the one or more computer processors, a target file classifier based on the EFV using a plurality of classifier algorithms.
- Dependent Claims (12, 13, 14, 15, 16)
  - 12. The method of claim 11 wherein the analyzing the set of training files includes extracting determined features from the training files.
  - 13. The method of claim 11 further comprising:
    - receiving, using the one or more computer processors, a target, unknown computer file;
      
      analyzing, using the one or more computer processors, the target, unknown computer file to determine features of the target, unknown file;
      
      generating, using the one or more computer processors, a plurality of attribute class-specific FVs of the target, unknown computer file using the determined features of the target, unknown file;
      
      concatenating, using the one or more computer processors, the plurality of attribute class-specific FVs of the target, unknown computer file into an EFV for the target, unknown computer file; and
      
      classifying, using the one or more computer processors, the target, unknown computer file as malign or benign by applying the target file classifier to the EFV of the target, unknown computer file.
  - 14. The method of claim 11 further comprising parsing, using the one or more computer processors, the feature set description and defining a data structure that holds the attribute classes and sets of corresponding attributes as key-value pairs.
  - 15. A non-transitory computer readable medium including instructions thereon for performing the method for improved zero-day malware detection of claim 11.
  - 16. A system for improved zero-day malware detection comprising:
    - a processor for executing instructions; and
      
      a memory that includes instructions thereon that when executed perform the method of claim 11.
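
The feature-vector construction in claims 11 through 14 can be sketched in Python. The claims do not fix a format for the feature set description, so the `label: attr, attr` text layout, the use of hex-encoded byte n-grams as attributes, the binary vector encoding, and all names below are assumptions; the final classifier-training step is omitted.

```python
def parse_feature_set_description(text: str) -> dict:
    """Parse 'label: attr, attr' lines into {label: [attr, ...]} key-value pairs."""
    classes = {}
    for line in text.strip().splitlines():
        label, _, attrs = line.partition(":")
        classes[label.strip()] = [a.strip() for a in attrs.split(",") if a.strip()]
    return classes

def ngrams(data: bytes, n: int = 2) -> set:
    """Return the distinct byte n-grams of data as hex strings."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}

def class_feature_vector(file_ngrams: set, attributes: list) -> list:
    """Binary FV for one attribute class: 1 where the file contains the n-gram."""
    return [1 if a in file_ngrams else 0 for a in attributes]

def extended_feature_vector(data: bytes, classes: dict, n: int = 2) -> list:
    """Concatenate the per-class FVs, in description order, into one EFV."""
    grams = ngrams(data, n)
    efv = []
    for label in classes:  # dicts preserve insertion order
        efv.extend(class_feature_vector(grams, classes[label]))
    return efv

# Hypothetical description: two attribute classes, each listing hex 2-grams.
DESCRIPTION = """
shellcode: 9090, cccc
script: 3c73, 6576
"""
classes = parse_feature_set_description(DESCRIPTION)
print(extended_feature_vector(b"\x90\x90<script>", classes))
```

Because every file yields an EFV of the same length and ordering, the vectors for training files and for a target, unknown file can be passed to the same classifier.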

Current Assignee: BluVector, Inc. (Comcast Corporation)
Original Assignee: Northrop Grumman Systems Corporation (Northrop Grumman Corporation)
Inventors: Steiner, Donald; Avasarala, Bhargav R.; Bose, Brock D.; Day, John C.
Primary Examiner(s): Cervetti, David Garcia
Application Number: US14/038,682
Publication Number: US 20140090061A1
Time in Patent Office: 908 Days
Field of Search: 726/24
US Class Current: 1/1
CPC Class Codes:
G06F 21/56   Computer malware detection ...
G06F 21/564   by virus signature recognition
G06F 21/566   Dynamic detection, i.e. det...
G06F 2221/034   Test or assess a computer o...
