Static anomaly-based detection of malware files

US 10,089,467 B1
Filed: 05/23/2017
Issued: 10/02/2018
Est. Priority Date: 05/23/2017
Status: Active Grant


7 Assignments

0 Petitions

Abstract

A protection application detects and remediates malicious files on a client. The protection application trains models using known samples of static clean files, and the models characterize features of the clean files. A model may be selected based on metadata obtained from a target file. By processing features of the clean files and features of the target file, the model may generate an anomaly score indicating a level of dissimilarity between the target file and the sample. The protection application compares the anomaly score to one or more threshold scores to classify the target file. Additionally, the target file may be provided to a security server to check against a whitelist or blacklist for classification. Responsive to a classification as malicious, the protection application remediates the target file on the client.
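The abstract describes training one model per subclass on clean files and picking the model from the target file's metadata. The sketch below illustrates that flow under several assumptions. Files are assumed to be already reduced to numeric feature vectors. The subclass is decided by a hypothetical `source_url` metadata key, echoing the file-source subclasses of claim 17. Each model is a simple per-feature mean and standard deviation, scored with a mean absolute z-score. The patent text here does not specify the features or the scoring function, so every name below is illustrative.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence


@dataclass
class CleanFileModel:
    """Summarizes one subclass's clean training files by per-feature mean and spread."""
    means: List[float]
    stdevs: List[float]

    @classmethod
    def fit(cls, clean_features: Sequence[Sequence[float]]) -> "CleanFileModel":
        columns = list(zip(*clean_features))
        # A zero spread is replaced by 1.0 so constant features do not divide by zero.
        return cls([mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns])

    def anomaly_score(self, features: Sequence[float]) -> float:
        # Mean absolute z-score: 0 for a file equal to the clean average, larger when more dissimilar.
        return mean(abs(x - m) / s for x, m, s in zip(features, self.means, self.stdevs))


def subclass_from_metadata(metadata: Dict[str, str]) -> str:
    # Hypothetical rule: the subclass is the file source (claim 17 names online-download vs local-disk sources).
    return "online_download" if metadata.get("source_url") else "local_disk"


def train_models(training_sets: Dict[str, List[List[float]]]) -> Dict[str, CleanFileModel]:
    # One model per subclass, each fit only on clean files of that subclass.
    return {subclass: CleanFileModel.fit(rows) for subclass, rows in training_sets.items()}


def score_file(models: Dict[str, CleanFileModel], metadata: Dict[str, str],
               features: Sequence[float]) -> float:
    # Select the model from the metadata-derived subclass, then score the file against it.
    return models[subclass_from_metadata(metadata)].anomaly_score(features)
```

Keeping a separate model per subclass means a file is only compared against clean files from a similar population, which is the point of selecting the model by subclass.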

17 Claims

1. A method for detecting anomalous files, the method comprising:
- obtaining a file on a client for classification;
  
  obtaining metadata associated with the file;
  
  determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
  
  selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
  
  generating, by a processor, an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
  
  comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
  
  classifying the file as anomalous based on the anomaly score; and
  
  remediating the file by the client responsive to the classification of the file.
- Dependent Claims (2, 3, 4, 5, 6, 7, 16, 17)
- - 2. The method of claim 1, wherein obtaining the file on the client comprises:
    - applying a filter to a plurality of files to filter out files having characteristics indicative of clean files; and
      
      selecting the file from the plurality of files on the client responsive to the file passing through the filter.
  - 3. The method of claim 2, further comprising:
    - selecting the filter based on the subclass of the file.
  - 4. The method of claim 1, wherein generating the anomaly score further comprises:
    - deriving the features of the file;
      
      determining distances between the features of the file and the plurality of features of the sample of clean files; and
      
      generating the anomaly score as a combination of the distances.
  - 5. The method of claim 1, wherein classifying the file as anomalous comprises:
    - determining that the anomaly score is greater than a threshold score;
      
      responsive to the determination, providing the file to a security server for comparison against a whitelist of known clean files; and
      
      classifying the file as anomalous responsive to receiving an indication from the security server that the file is not on the whitelist.
  - 6. The method of claim 1, wherein classifying the file as anomalous comprises:
    - determining that the anomaly score is less than a threshold score;
      
      responsive to the determination, providing the file to a security server for comparison against a blacklist of known malware; and
      
      classifying the file as anomalous responsive to receiving an indication from the security server that the file is on the blacklist.
  - 7. The method of claim 1, wherein classifying the file as anomalous comprises:
    - responsive to determining that the anomaly score is greater than the center threshold score and less than the upper threshold score, providing the file to a security server for comparison against a whitelist of known clean files;
      
      responsive to determining that the anomaly score is less than the center threshold score and greater than the lower threshold score, providing the file to the security server for comparison against a blacklist of known malware files; and
      
      classifying the file as anomalous responsive to determining that (i) the anomaly score is greater than the upper threshold score, (ii) the anomaly score is between the center threshold score and the upper threshold score and the file is not on the whitelist, or (iii) the anomaly score is between the center threshold score and the lower threshold score and the file is on the blacklist.
  - 16. The method of claim 3, wherein each of the plurality of different subclasses indicates a file source, and wherein each of the plurality of models is derived from a training set of clean files from the file source indicated by the associated subclass.
  - 17. The method of claim 1, wherein a first model of the plurality of models is derived from a first training set of clean files downloaded from an online service file source, and wherein a second model of the plurality of models is derived from a second training set of clean files obtained from a local disk file source.
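Claim 7 splits the score range into bands using the lower, center, and upper thresholds, consulting a whitelist above the center threshold and a blacklist below it. A minimal Python sketch of that decision logic follows. The `on_whitelist` and `on_blacklist` callables are hypothetical stand-ins for the security-server lookups. The handling of scores at or below the lower threshold, or exactly equal to a threshold, is an assumption because the claim does not address those cases.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ANOMALOUS = "anomalous"
    NOT_ANOMALOUS = "not_anomalous"


def classify(score: float, lower: float, center: float, upper: float,
             on_whitelist: Callable[[], bool], on_blacklist: Callable[[], bool]) -> Verdict:
    """Three-threshold decision following the structure of claim 7 (illustrative sketch)."""
    if not lower < center < upper:
        raise ValueError("expected lower < center < upper")
    if score > upper:
        # (i) Very dissimilar from clean files: anomalous without a server lookup.
        return Verdict.ANOMALOUS
    if center < score < upper:
        # (ii) Moderately dissimilar: anomalous unless the server whitelists the file.
        return Verdict.NOT_ANOMALOUS if on_whitelist() else Verdict.ANOMALOUS
    if lower < score < center:
        # (iii) Mildly dissimilar: anomalous only if the server blacklists the file.
        return Verdict.ANOMALOUS if on_blacklist() else Verdict.NOT_ANOMALOUS
    # Assumption: scores at or below the lower threshold, and exact ties, are not flagged.
    return Verdict.NOT_ANOMALOUS
```

Passing the lookups as callables means the security server is only queried when a file lands in a band that needs it.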

8. A non-transitory computer-readable storage medium storing instructions for detecting anomalous files, the instructions when executed by a processor causing the processor to perform steps including:
- obtaining a file on a client for classification;
  
  obtaining metadata associated with the file;
  
  determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
  
  selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
  
  generating an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
  
  comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
  
  classifying the file as anomalous based on the anomaly score; and
  
  remediating the file by the client responsive to the classification of the file.
- Dependent Claims (9, 10, 11, 12)
- - 9. The non-transitory computer-readable storage medium of claim 8, wherein obtaining the file on the client comprises:
    - applying a filter to a plurality of files to filter out files having characteristics indicative of clean files; and
      
      selecting the file from the plurality of files on the client responsive to the file passing through the filter.
  - 10. The non-transitory computer-readable storage medium of claim 8, wherein generating the anomaly score further comprises:
    - deriving the features of the file;
      
      determining distances between the features of the file and the plurality of features of the sample of clean files; and
      
      generating the anomaly score as a combination of the distances.
  - 11. The non-transitory computer-readable storage medium of claim 8, wherein classifying the file as anomalous comprises:
    - determining that the anomaly score is greater than a threshold score;
      
      responsive to the determination, providing the file to a security server for comparison against a whitelist of known clean files; and
      
      classifying the file as anomalous responsive to receiving an indication from the security server that the file is not on the whitelist.
  - 12. The non-transitory computer-readable storage medium of claim 8, wherein classifying the file as anomalous comprises:
    - determining that the anomaly score is less than a threshold score;
      
      responsive to the determination, providing the file to a security server for comparison against a blacklist of known malware; and
      
      classifying the file as anomalous responsive to receiving an indication from the security server that the file is on the blacklist.
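Claims 4, 10, and 15 generate the anomaly score as "a combination of the distances" between the file's features and the clean samples, without fixing the distance metric or the combination. One common choice, assumed here rather than taken from the patent, is Euclidean distance combined as the mean over the k nearest clean samples (a k-nearest-neighbor anomaly score). The function name and the default `k=3` are illustrative.

```python
import math
from typing import Sequence


def knn_anomaly_score(file_features: Sequence[float],
                      clean_samples: Sequence[Sequence[float]], k: int = 3) -> float:
    """Mean Euclidean distance from the file to its k nearest clean samples (assumed scoring)."""
    if not clean_samples:
        raise ValueError("model has no clean samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Distance from the file's features to every clean sample in the model.
    distances = sorted(math.dist(file_features, sample) for sample in clean_samples)
    # Combine the k smallest distances into one score; fewer are used if the model is small.
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```

Averaging only the nearest neighbors keeps a file that closely resembles a few clean samples from being penalized for being far from unrelated ones.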

13. A computing system comprising:
- a processor; and
  
  a non-transitory computer-readable storage medium storing instructions for generating information for detecting anomalous files, the instructions when executed by the processor causing the processor to perform steps including:
  
  obtaining a file on a client for classification;
  
  obtaining metadata associated with the file;
  
  determining, based on the metadata, a subclass of the file selected from a plurality of subclasses;
  
  selecting a model of a plurality of models based on the subclass of the file, wherein the selected model characterizes a plurality of features of a sample of clean files that are each associated with the subclass, wherein each of the plurality of models is derived from a training set of clean files belonging to a particular subclass and wherein different ones of the plurality of models are associated with different subclasses;
  
  generating an anomaly score of the file by applying the file to the selected model, the anomaly score indicating a level of dissimilarity between features of the file and the plurality of features of the sample of clean files of the selected model;
  
  comparing the anomaly score against at least one of a lower threshold score, a center threshold score, and an upper threshold score;
  
  classifying the file as anomalous based on the anomaly score; and
  
  remediating the file by the client responsive to the classification of the file.
- Dependent Claims (14, 15)
- - 14. The system of claim 13, wherein obtaining the file on the client comprises:
    - applying a filter to a plurality of files to filter out files having characteristics indicative of clean files; and
      
      selecting the file from the plurality of files on the client responsive to the file passing through the filter.
  - 15. The system of claim 13, wherein generating the anomaly score further comprises:
    - deriving the features of the file;
      
      determining distances between the features of the file and the plurality of features of the sample of clean files; and
      
      generating the anomaly score as a combination of the distances.
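Claims 2, 9, and 14 pre-filter candidate files so that those with characteristics indicative of clean files never reach the model, and claim 3 selects the filter by subclass. The sketch below assumes two hypothetical clean-file indicators (a trusted publisher signature and location in a system directory) and hypothetical subclass names; the claims do not name specific characteristics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FileInfo:
    path: str
    subclass: str
    signed_by_trusted_publisher: bool  # hypothetical clean-file indicator
    in_system_directory: bool          # hypothetical clean-file indicator


# A filter returns True when a file looks clean and can be skipped.
LooksClean = Callable[[FileInfo], bool]

# Hypothetical per-subclass filters (claim 3 selects the filter based on the subclass).
FILTERS: Dict[str, LooksClean] = {
    "online_download": lambda f: f.signed_by_trusted_publisher,
    "local_disk": lambda f: f.signed_by_trusted_publisher or f.in_system_directory,
}


def files_to_classify(files: Iterable[FileInfo]) -> List[FileInfo]:
    """Keep only files that pass through their subclass's filter (i.e. do not look clean)."""
    selected = []
    for f in files:
        # Assumption: a subclass without a filter lets every file through.
        looks_clean = FILTERS.get(f.subclass, lambda _: False)
        if not looks_clean(f):
            selected.append(f)
    return selected
```

Filtering before scoring spares the model and the security server from files that are already very likely clean.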

Current Assignee: Malwarebytes Corporate Holdco Incorporated
Original Assignee: Malwarebytes Inc.
Inventors: Hartnett, Andrew Thomas; Swanson, Douglas Stuart
Primary Examiner(s): Perungavoor, Venkat
Application Number: US15/603,337
Time in Patent Office: 497 Days
Field of Search: None

CPC Class Codes

G06F 21/566   Dynamic detection, i.e. det...

G06F 21/568   eliminating virus, restorin...

G06F 2221/034   Test or assess a computer o...

G06N 20/00   Machine learning