System and method for automated machine-learning, zero-day malware detection

US 9,665,713 B2
Filed: 03/21/2016
Issued: 05/30/2017
Est. Priority Date: 09/26/2012
Status: Active Grant

Abstract

Improved systems and methods for automated machine-learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category-specific classifier based on the evaluated features. Embodiments also include a system and a computer-readable medium with instructions for executing the above method.
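The partitioning and feature-identification steps summarized in the abstract can be illustrated with a short Python sketch. It groups labeled training files by file type and collects the byte n-grams present in each file. The magic-byte table, the helper names `guess_category`, `byte_ngrams`, and `partition`, and the choice of n = 4 are assumptions made for illustration only; the patent text here does not prescribe them.

```python
from collections import defaultdict

# Hypothetical magic-byte prefixes used to guess a file's category (assumption).
MAGIC = {
    b"MZ": "executable",
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0": "ole_document",  # legacy MS Office container
}


def guess_category(data: bytes) -> str:
    """Return a category name based on the leading bytes of the file."""
    for prefix, category in MAGIC.items():
        if data.startswith(prefix):
            return category
    return "unknown"


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct n-byte sequences that occur in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def partition(training_files, n: int = 4) -> dict:
    """Group (data, label) pairs by category, keeping each file's n-gram set.

    Labels are expected to be "malign" or "benign".
    """
    categories = defaultdict(list)
    for data, label in training_files:
        categories[guess_category(data)].append((byte_ngrams(data, n), label))
    return dict(categories)


# Tiny made-up inputs, not real samples.
samples = [(b"MZ\x90\x00\x03", "benign"), (b"%PDF-1.4", "malign")]
parts = partition(samples)
```

Each category in `parts` then holds only same-type files, so a separate classifier can be trained per category.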

13 Claims

1. A computer-implemented method for improved zero-day malware detection comprising:
- receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
- partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
- training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
  - selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
  - identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
  - evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
  - building a category-specific classifier based on the evaluated features.

2. The method of claim 1 wherein the training category-specific classifiers further comprises repeating the selecting, identifying, evaluating and building for each of the plurality of categories of training files.

3. The method of claim 2 further comprising building, using the one or more computer processors, a composite classifier by combining the category-specific classifier of each category of training files.

4. The method of claim 1 wherein the identifying identifies n-grams that are found in the training files.

5. The method of claim 1 wherein the identifying identifies n-grams in system calls or execution traces generated by execution of the training files.

6. The method of claim 5 wherein the identifying further comprises extracting the identified n-grams.

7. The method of claim 1 wherein the categories include one or more categories chosen from an executable file category, a MS Word™ file category, a MS Excel™ file category and a PDF file category.

8. The method of claim 1 wherein the partitioning of the training files includes determining the file type of each training file, from the one or more categories of training files, and dividing the training files into groups of same-type files.

9. The method of claim 1 wherein the partitioning further includes creating a category for each new type of file encountered in the set of training files.

10. The method of claim 1 further comprising:
- receiving, using the one or more computer processors, one or more target, unknown files for classification;
- initializing, using the one or more computer processors, the composite classifier; and
- classifying, using the one or more computer processors, the one or more target, unknown files as malign or benign using the composite classifier.

11. The method of claim 10 wherein the initializing comprises:
- constructing, using the one or more computer processors, a map that connects file categories with category-specific classifiers;
- categorizing, using the one or more computer processors, each of the one or more target, unknown files; and
- determining, using the one or more computer processors, using the map which category-specific classifier to apply to each of the one or more unknown, target files in the classifying.
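
The composite classifier of claims 3, 10 and 11 can be pictured as a lookup from file category to category-specific classifier. The Python sketch below is one possible reading, not the patented implementation: the class name `CompositeClassifier`, the toy categorizer, the toy per-category classifiers, and the `"unknown"` result for files with no matching classifier are all assumptions introduced for illustration.

```python
from typing import Callable, Dict

Classifier = Callable[[bytes], str]  # returns "malign" or "benign"


class CompositeClassifier:
    """Routes each target file to the classifier trained for its category."""

    def __init__(self, categorize: Callable[[bytes], str],
                 classifiers: Dict[str, Classifier]) -> None:
        self._categorize = categorize
        # The map that connects file categories with category-specific classifiers.
        self._by_category = dict(classifiers)

    def classify(self, data: bytes) -> str:
        category = self._categorize(data)
        classifier = self._by_category.get(category)
        if classifier is None:
            # Assumption: a file with no matching classifier is reported as "unknown".
            return "unknown"
        return classifier(data)


# Toy stand-ins (hypothetical) for a categorizer and trained classifiers.
def toy_categorize(data: bytes) -> str:
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"MZ"):
        return "executable"
    return "other"


def toy_pdf_classifier(data: bytes) -> str:
    return "malign" if b"/JavaScript" in data else "benign"


def toy_exe_classifier(data: bytes) -> str:
    return "malign" if b"\xde\xad\xbe\xef" in data else "benign"


composite = CompositeClassifier(
    toy_categorize,
    {"pdf": toy_pdf_classifier, "executable": toy_exe_classifier},
)
targets = [b"%PDF-1.4 /JavaScript", b"MZ\x90\x00", b"plain text"]
results = [composite.classify(t) for t in targets]
```

Keeping the routing in a plain dictionary means a new file category only requires registering one more entry; the per-category classifiers stay independent of each other.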

12. A non-transitory computer readable medium including instructions thereon for performing a method for improved zero-day malware detection by:
- receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
- partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
- training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
  - selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
  - identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
  - evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
  - building a category-specific classifier based on the evaluated features.
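
The evaluating step recited above ranks identified n-gram features by how well they separate malign from benign files. The claims do not name a scoring measure, so the Python sketch below uses information gain purely as an assumed example; the function names and the four toy samples are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, feature) -> float:
    """Reduction in label entropy from splitting on presence of the feature.

    samples is a non-empty list of (ngram_set, label) pairs.
    """
    labels = [label for _, label in samples]
    present = [label for grams, label in samples if feature in grams]
    absent = [label for grams, label in samples if feature not in grams]
    n = len(samples)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_features(samples, k: int) -> list:
    """Return the k n-grams with the highest information gain."""
    vocabulary = set().union(*(grams for grams, _ in samples))
    # Ties are broken by byte order so the result is deterministic.
    return sorted(vocabulary,
                  key=lambda f: (-information_gain(samples, f), f))[:k]


# Made-up n-gram sets for one file category.
samples = [
    ({b"AA", b"BB"}, "malign"),
    ({b"AA", b"CC"}, "malign"),
    ({b"CC", b"DD"}, "benign"),
    ({b"DD"}, "benign"),
]
selected = top_features(samples, 2)
```

In this toy data `b"AA"` appears only in malign files and `b"DD"` only in benign files, so both separate the classes perfectly, while `b"CC"` appears equally in both and carries no information.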

13. A system for improved zero-day malware detection comprising:
- a computer including the one or more computer processors for executing instructions; and
- a memory, wherein the memory includes instructions for improved zero-day malware detection by:
  - receiving, at a computer that includes one or more processors and memory, a set of training files which are each known to be either malign or benign, wherein the training files comprise one or more types of computer files;
  - partitioning, using the one or more computer processors, the set of training files into a plurality of categories wherein the categories are based on a type of file in each category; and
  - training, using the one or more computer processors, category-specific classifiers that distinguish between malign and benign files in a category of files, wherein the training comprises:
    - selecting one of the plurality of categories of training files, wherein each of the one or more categories corresponds to a type of file;
    - identifying features present in the training files in the selected category of training files, wherein the identifying identifies n-gram features and the n-gram features include n-bytes of code;
    - evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files; and
    - building a category-specific classifier based on the evaluated features.
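
The final building step turns the selected features into a category-specific classifier. The claims leave the learning algorithm open, so the following Python sketch assumes a Bernoulli naive Bayes model with add-one smoothing simply because it is compact; `train_naive_bayes`, `predict`, and the toy samples are hypothetical and not taken from the patent.

```python
import math

CLASSES = ("malign", "benign")


def train_naive_bayes(samples, features) -> dict:
    """Fit per-class priors and per-feature presence probabilities.

    samples is a list of (ngram_set, label) pairs for one file category;
    features is the list of n-grams kept after feature evaluation.
    """
    counts = {c: 0 for c in CLASSES}
    present = {c: {f: 0 for f in features} for c in CLASSES}
    for grams, label in samples:
        counts[label] += 1
        for f in features:
            if f in grams:
                present[label][f] += 1
    total = len(samples)
    model = {}
    for c in CLASSES:
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        log_prior = math.log((counts[c] + 1) / (total + 2))
        probs = {f: (present[c][f] + 1) / (counts[c] + 2) for f in features}
        model[c] = (log_prior, probs)
    return model


def predict(model: dict, grams: set) -> str:
    """Return the class with the highest log-posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if f in grams else 1.0 - p) for f, p in probs.items()
        )
    return max(model, key=score)


# Made-up n-gram sets and selected features for one file category.
samples = [
    ({b"AA", b"BB"}, "malign"),
    ({b"AA", b"CC"}, "malign"),
    ({b"CC", b"DD"}, "benign"),
    ({b"DD"}, "benign"),
]
model = train_naive_bayes(samples, [b"AA", b"DD"])
```

A previously unseen file is scored from its own n-gram set, for example `predict(model, {b"AA", b"ZZ"})`, and n-grams outside the selected features are ignored.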

Current Assignee: BluVector, Inc. (Comcast Corporation)
Original Assignee: BluVector, Inc. (Comcast Corporation)
Inventors: Avasarala, Bhargav R.; Bose, Brock D.; Day, John C.; Steiner, Donald
Primary Examiner(s): CERVETTI, DAVID GARCIA
Application Number: US15/076,073
Publication Number: US 20160203318A1
Time in Patent Office: 435 Days
Field of Search: 726/1, 726/22, 726/23, 726/24, 726/26, 726/32

CPC Class Codes
G06F 21/56   Computer malware detection ...
G06F 21/564   by virus signature recognition
G06F 21/566   Dynamic detection, i.e. det...
G06F 2221/034   Test or assess a computer o...
