Systems and methods for detecting malware

US 10,007,786 B1
Filed: 11/28/2015
Issued: 06/26/2018
Est. Priority Date: 11/28/2015
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A computer-implemented method for detecting malware may include (1) identifying a behavioral trace of a program, the behavioral trace including a sequence of runtime behaviors exhibited by the program, (2) dividing the behavioral trace to identify a plurality of n-grams within the behavioral trace, each runtime behavior within the sequence of runtime behaviors corresponding to an n-gram token, (3) analyzing the plurality of n-grams to generate a feature vector of the behavioral trace, and (4) classifying the program based at least in part on the feature vector of the behavioral trace to determine whether the program is malicious. Various other methods, systems, and computer-readable media are also disclosed.
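A minimal end-to-end sketch of steps (1) through (4). It rests on assumptions the abstract does not fix. A trace is a list of string behavior labels held in a hypothetical in-memory log. N-grams come from an overlapping sliding window. The feature vector has one dimension per n-gram in a small hand-picked vocabulary and holds occurrence counts. The classifier is a linear score with hand-set, not learned, weights. All function names, behavior labels, and program names are hypothetical. The issued claims also require a many-to-one (non-injective surjective) mapping of n-grams to dimensions, and this simplest version does not model it.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def identify_trace(trace_log: Dict[str, List[str]], program: str) -> List[str]:
    """Step (1): look up the recorded runtime behaviors of a program (hypothetical log)."""
    return trace_log[program]


def divide(trace: Sequence[str], n: int) -> List[NGram]:
    """Step (2): overlapping windows of length n; each behavior is one n-gram token."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def analyze(grams: List[NGram], vocabulary: List[NGram]) -> List[float]:
    """Step (3): one dimension per vocabulary n-gram, holding its occurrence count."""
    counts = Counter(grams)
    return [float(counts[g]) for g in vocabulary]


def classify(vector: List[float], weights: List[float], bias: float) -> bool:
    """Step (4): toy linear classifier; a positive score is read as 'malicious'."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return score > 0.0


if __name__ == "__main__":
    log = {
        "dropper.exe": ["open_file", "write_registry", "connect_socket",
                        "write_registry", "connect_socket"],
        "editor.exe": ["open_file", "read_file", "close_file"],
    }
    vocab = [("write_registry", "connect_socket"), ("open_file", "read_file")]
    weights, bias = [1.0, -1.0], -0.5  # hand-picked for illustration only
    for program in log:
        vec = analyze(divide(identify_trace(log, program), 2), vocab)
        print(program, vec, classify(vec, weights, bias))
    # prints:
    # dropper.exe [2.0, 0.0] True
    # editor.exe [0.0, 1.0] False
```

In practice the weights would come from a trained model rather than being set by hand. The linear form is used here only because it is the shortest classifier that makes step (4) concrete.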

20 Claims

1. A computer-implemented method for detecting malware, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
   - identifying a behavioral trace of a program, the behavioral trace comprising a sequence of runtime behaviors exhibited by the program;
   - dividing the behavioral trace to identify a plurality of n-grams within the behavioral trace, each runtime behavior within the sequence of runtime behaviors corresponding to an n-gram token;
   - analyzing the plurality of n-grams to generate a feature vector of the behavioral trace comprising:
     - applying, for each given n-gram in the plurality of n-grams, a feature function to the behavioral trace that describes an occurrence characteristic of the given n-gram within the behavioral trace; and
     - including a result of the feature function in the feature vector; and
   - classifying the program based at least in part on the feature vector of the behavioral trace to determine whether the program is malicious;

   wherein:
   - the feature vector comprises a plurality of dimensions, each n-gram within the plurality of n-grams corresponding to a dimension within the plurality of dimensions;
   - the plurality of n-grams map to the plurality of dimensions according to a non-injective surjection; and
   - including the result of the feature function in the feature vector comprises aggregating a subset of outputs of the feature function derived from a subset of the plurality of n-grams into a value and assigning the value to a dimension within the plurality of dimensions according to the non-injective surjection.
2. The computer-implemented method of claim 1, wherein the feature function comprises a boolean function that outputs a predetermined boolean output for the given n-gram when the given n-gram was observed within the behavioral trace.
3. The computer-implemented method of claim 1, wherein the feature function comprises a frequency function that outputs a value for the given n-gram that indicates a number of times the given n-gram was observed within the behavioral trace.
4. The computer-implemented method of claim 1, wherein the feature function comprises a density function that outputs a value for the given n-gram that indicates a relative frequency with which the given n-gram was observed within the behavioral trace.
5. The computer-implemented method of claim 1, wherein identifying the plurality of n-grams within the behavioral trace comprises identifying the plurality of n-grams within a substring of the behavioral trace.
6. The computer-implemented method of claim 5, wherein identifying the plurality of n-grams within the substring of the behavioral trace comprises identifying the plurality of n-grams within a prefix of the behavioral trace.
7. The computer-implemented method of claim 5, wherein identifying the plurality of n-grams within the substring of the behavioral trace comprises dividing the behavioral trace into a plurality of fixed-length substrings and identifying the plurality of n-grams within a fixed-length substring within the plurality of fixed-length substrings.
8. The computer-implemented method of claim 1, wherein:
   - generating the feature vector of the behavioral trace comprises generating a plurality of feature vectors of the behavioral trace, the feature vectors within the plurality of feature vectors differing by at least one of:
     - feature functions applied to n-grams sampled from the behavioral trace to generate respective feature vectors;
     - subsets of n-grams selected from the behavioral trace to generate respective feature vectors; and
   - classifying the program based at least in part on the feature vector of the behavioral trace comprises submitting each of the plurality of feature vectors to a machine learning classifier.
9. The computer-implemented method of claim 1, wherein the sequence of runtime behaviors specifies a contextual runtime condition under which at least one runtime behavior was observed.
10. The computer-implemented method of claim 1, further comprising determining the program is malware based on the classification of the program.
11. The computer-implemented method of claim 10, further comprising protecting the computing device from the malware.

12. A system for detecting malware, the system comprising:
   - an identification module, stored in memory, that identifies a behavioral trace of a program, the behavioral trace comprising a sequence of runtime behaviors exhibited by the program;
   - a division module, stored in memory, that divides the behavioral trace to identify a plurality of n-grams within the behavioral trace, each runtime behavior within the sequence of runtime behaviors corresponding to an n-gram token;
   - an analysis module, stored in memory, that analyzes the plurality of n-grams to generate a feature vector of the behavioral trace comprising:
     - applying, for each given n-gram in the plurality of n-grams, a feature function to the behavioral trace that describes an occurrence characteristic of the given n-gram within the behavioral trace; and
     - including a result of the feature function in the feature vector;

     wherein:
     - the feature vector comprises a plurality of dimensions, each n-gram within the plurality of n-grams corresponding to a dimension within the plurality of dimensions;
     - the plurality of n-grams map to the plurality of dimensions according to a non-injective surjection; and
     - including the result of the feature function in the feature vector comprises aggregating a subset of outputs of the feature function derived from a subset of the plurality of n-grams into a value and assigning the value to a dimension within the plurality of dimensions according to the non-injective surjection;
   - a classification module, stored in memory, that classifies the program based at least in part on the feature vector of the behavioral trace to determine whether the program is malicious; and
   - at least one physical processor configured to execute the identification module, the division module, the analysis module, and the classification module.
13. The system of claim 12, wherein the feature function comprises a boolean function that outputs a predetermined boolean output for the given n-gram when the given n-gram was observed within the behavioral trace.
14. The system of claim 12, wherein the feature function comprises a frequency function that outputs a value for the given n-gram that indicates a number of times the given n-gram was observed within the behavioral trace.
15. The system of claim 12, wherein the feature function comprises a density function that outputs a value for the given n-gram that indicates a relative frequency with which the given n-gram was observed within the behavioral trace.
16. The system of claim 12, wherein the classification module further determines the program is malware based on the classification of the program.
17. The system of claim 16, wherein the classification module further protects the system from the malware.

18. A non-transitory computer-readable medium comprising one or more computer-readable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
   - identify a behavioral trace of a program, the behavioral trace comprising a sequence of runtime behaviors exhibited by the program;
   - divide the behavioral trace to identify a plurality of n-grams within the behavioral trace, each runtime behavior within the sequence of runtime behaviors corresponding to an n-gram token;
   - analyze the plurality of n-grams to generate a feature vector of the behavioral trace comprising:
     - applying, for each given n-gram in the plurality of n-grams, a feature function to the behavioral trace that describes an occurrence characteristic of the given n-gram within the behavioral trace;
     - including a result of the feature function in the feature vector; and
     - classifying the program based at least in part on the feature vector of the behavioral trace to determine whether the program is malicious;

   wherein:
   - the feature vector comprises a plurality of dimensions, each n-gram within the plurality of n-grams corresponding to a dimension within the plurality of dimensions;
   - the plurality of n-grams map to the plurality of dimensions according to a non-injective surjection; and
   - including the result of the feature function in the feature vector comprises aggregating a subset of outputs of the feature function derived from a subset of the plurality of n-grams into a value and assigning the value to a dimension within the plurality of dimensions according to the non-injective surjection.
19. The non-transitory computer-readable medium of claim 18, wherein the one or more computer-readable instructions further cause the computing device to determine the program is malware based on the classification of the program.
20. The non-transitory computer-readable medium of claim 19, further comprising protecting the computing device from the malware.
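The claimed feature functions, the non-injective surjection with aggregation, and the substring options can be sketched together. This covers claims 2-4 and 13-15, the "wherein" clauses of claims 1, 12, and 18, and claims 5-7. Several assumptions apply. The many-to-one mapping is realized with feature hashing (the "hashing trick"), swapped in as one plausible construction because the claims require a non-injective surjection but do not say how it is built. Hashing makes every dimension reachable in practice, not by construction. Aggregation is done by summing, though the claims only say outputs are "aggregated into a value", so a maximum would fit equally well. All function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Sequence, Tuple

NGram = Tuple[str, ...]
# A feature function sees one n-gram, the trace's n-gram counts, and the total n-gram count.
FeatureFn = Callable[[NGram, Counter, int], float]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Overlapping n-grams (sliding window of length n) over a trace of behavior tokens."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def boolean_feature(gram: NGram, counts: Counter, total: int) -> float:
    """Claims 2/13: 1.0 if the n-gram was observed in the trace, else 0.0."""
    return 1.0 if counts[gram] > 0 else 0.0


def frequency_feature(gram: NGram, counts: Counter, total: int) -> float:
    """Claims 3/14: number of times the n-gram was observed."""
    return float(counts[gram])


def density_feature(gram: NGram, counts: Counter, total: int) -> float:
    """Claims 4/15: relative frequency of the n-gram among all n-grams in the trace."""
    return counts[gram] / total if total else 0.0


def dimension_of(gram: NGram, dims: int) -> int:
    """Hypothetical many-to-one mapping from n-grams to dimensions via feature hashing.

    A stable digest (not Python's per-process randomized hash()) keeps the mapping
    identical across runs, so the same n-gram always lands in the same dimension.
    """
    digest = hashlib.sha256("\x1f".join(gram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dims


def feature_vector(trace: Sequence[str], n: int, dims: int,
                   feature: FeatureFn = frequency_feature) -> List[float]:
    """Apply `feature` to each distinct n-gram, summing outputs that share a dimension."""
    grams = ngrams(trace, n)
    counts = Counter(grams)
    total = len(grams)
    vector = [0.0] * dims
    for gram in counts:
        vector[dimension_of(gram, dims)] += feature(gram, counts, total)
    return vector


def prefix(trace: Sequence[str], length: int) -> Sequence[str]:
    """Claim 6: restrict analysis to the first `length` behaviors."""
    return trace[:length]


def fixed_length_substrings(trace: Sequence[str], length: int) -> List[Sequence[str]]:
    """Claim 7: non-overlapping substrings of exactly `length` behaviors (remainder dropped)."""
    return [trace[i:i + length] for i in range(0, len(trace) - length + 1, length)]


if __name__ == "__main__":
    trace = ["open_file", "write_registry", "open_file", "write_registry", "connect_socket"]
    print(feature_vector(trace, n=2, dims=4))  # frequency, aggregated per dimension
    print(feature_vector(prefix(trace, 3), n=2, dims=4, feature=density_feature))
    for chunk in fixed_length_substrings(trace, 2):
        print(chunk, feature_vector(chunk, n=2, dims=4, feature=boolean_feature))
```

Which dimension each n-gram lands in depends on the digest, so the printed vectors are not predictable by hand. Their sums are predictable. With summation, the frequency vector sums to the number of n-grams, the density vector sums to 1, and the boolean vector sums to the number of distinct n-grams, whatever the collisions. Computing several vectors from one trace with different feature functions or substrings is one reading of the "plurality of feature vectors" in claim 8.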

Current Assignee: Gen Digital Inc.
Original Assignee: Symantec Corporation (NortonLifeLock Inc.)
Inventors: Bhatkar, Sandeep; Parikh, Jugal; Nachenberg, Carey
Primary Examiner(s): Dada, Beemnet
Assistant Examiner(s): Gundry, Stephen

Application Number: US14/953,305
Time in Patent Office: 941 days

CPC Class Codes:
- G06F 21/561: Virus type analysis
- G06F 21/566: Dynamic detection, i.e. det...
- G06N 20/00: Machine learning
