Determining malware based on signal tokens

US 9,798,981 B2
Filed: 08/27/2013
Issued: 10/24/2017
Est. Priority Date: 07/31/2013
Status: Expired due to Fees

8 Assignments

0 Petitions

Abstract

Example embodiments disclosed herein relate to determining malware. A set of tokens is generated from an application under test. A set of signal tokens is generated from the set of tokens. A likelihood of malware is determined for the application under test based on the signal tokens and a signal token database.

20 Citations

17 Claims

1. A computing device comprising:
- a memory and at least one hardware processor to execute a plurality of modules including:
  
  a static code analysis module to generate a set of tokens from an application under test according to obfuscation tolerant rules, wherein each token of the set of tokens is generated upon a hit to one of the obfuscation tolerant rules;
  
  a signal generation module to generate a plurality of signal tokens from the set of tokens using a set of grouping rules, wherein each signal token is generated from a grouping of multiple tokens based on a grouping rule; and
  
  a classification module to perform a Bayes classification to compare the plurality of signal tokens with a signal token database to determine a likelihood of whether malware is included in the application under test.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The computing device of claim 1, wherein the static code analysis further includes at least two of dataflow sources, structural rules, manifest Extensible Markup Language, semantic analysis, and control flow.
  - 3. The computing device of claim 1, wherein the signal token database includes a second plurality of signal tokens generated from a machine learning analysis of a set of known malware application code and a set of clean application code.
  - 4. The computing device of claim 3, wherein the second plurality of signal tokens is generated from groupings of tokens from a second set of tokens, wherein the second set of tokens is generated by a tokenization of the set of known malware application code and the set of known clean application code.
  - 5. The computing device of claim 4, wherein the second plurality of signal tokens is generated using the set of grouping rules.
  - 6. The computing device of claim 1, wherein the grouping rules are based on at least two of a null dereference, resource leak, dead code, path manipulation, query string injection, command injection, resource injection, or denial of service.
  - 7. The computing device of claim 3, wherein the signal tokens of the signal token database are associated with respective malware likeliness values based on the machine learning analysis.
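
The first two modules of claim 1 (token generation on rule hits, then grouping into signal tokens) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the rule names, the regular expressions, and the grouping definitions do not come from the patent, and real obfuscation tolerant rules would come from a static analyzer rather than text patterns. The sketch also records each token once, even if its rule hits several times.

```python
import re
from typing import Dict, FrozenSet, Set

# Hypothetical obfuscation tolerant rules: each one matches a platform API call,
# which identifier renaming in the application's own code would not change.
TOKEN_RULES: Dict[str, re.Pattern] = {
    "SEND_SMS": re.compile(r"\bsendTextMessage\s*\("),
    "READ_DEVICE_ID": re.compile(r"\bgetDeviceId\s*\("),
    "OPEN_URL": re.compile(r"\bopenConnection\s*\("),
}

# Hypothetical grouping rules: a signal token is produced only when every
# member token of the group is present.
GROUPING_RULES: Dict[str, FrozenSet[str]] = {
    "ID_EXFILTRATION": frozenset({"READ_DEVICE_ID", "OPEN_URL"}),
    "ID_OVER_SMS": frozenset({"READ_DEVICE_ID", "SEND_SMS"}),
}


def generate_tokens(source: str) -> Set[str]:
    """Return one token for each rule that hits the source text."""
    return {name for name, pattern in TOKEN_RULES.items() if pattern.search(source)}


def generate_signal_tokens(tokens: Set[str]) -> Set[str]:
    """Return a signal token for each grouping rule whose members are all present."""
    return {name for name, members in GROUPING_RULES.items() if members <= tokens}


sample = "String a = tm.getDeviceId(); url.openConnection();"
tokens = generate_tokens(sample)
signals = generate_signal_tokens(tokens)
print(sorted(tokens))   # ['OPEN_URL', 'READ_DEVICE_ID']
print(sorted(signals))  # ['ID_EXFILTRATION']
```

Grouping on multiple tokens means a single common API call does not produce a signal by itself; only the combination does.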

8. A non-transitory machine-readable storage medium storing instructions that, if executed by at least one hardware processor of a device, cause the device to:
- generate a set of tokens from an application under test according to obfuscation tolerant rules, wherein each token of the set of tokens is generated upon a hit to one of the obfuscation tolerant rules;
  
  generate a plurality of signal tokens from the set of tokens using a set of grouping rules, wherein each signal token is generated from a grouping of multiple tokens based on a grouping rule;
  
  use a Bayes classification technique to analyze the plurality of signal tokens with a signal token database including a second plurality of signal tokens to determine a likelihood that the application under test is malware, wherein each of the second plurality of signal tokens is preprocessed to have a likeliness of malware based on a training set; and
  
  determine that the application under test is malware if the likelihood is above a threshold level.
- Dependent claims (9, 10, 16, 17):
  - 9. The non-transitory machine-readable storage medium of claim 8, wherein the second plurality of signal tokens is based on groupings of other tokens generated by tokenization of a first set of known malware application code and a second set of known clean application code as the training set.
  - 10. The non-transitory machine-readable storage medium of claim 8, wherein the grouping rules are based on at least two of a null dereference, resource leak, dead code, path manipulation, query string injection, command injection, resource injection, and denial of service.
  - 16. The non-transitory machine-readable storage medium of claim 8, wherein the second plurality of signal tokens is generated from a machine learning analysis of a set of known malware application code and a set of clean application code.
  - 17. The non-transitory machine-readable storage medium of claim 8, wherein the signal tokens of the signal token database are associated with respective malware likeliness values based on the machine learning analysis.
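
The classification and threshold steps of claim 8 can be sketched as follows. This is one possible reading, not the patented implementation: it assumes each database entry stores a per-signal likeliness of malware estimated under equal class priors, and it combines entries under a naive independence assumption by summing log-odds. The function names, the database contents, and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, Iterable


def malware_likelihood(signals: Iterable[str], database: Dict[str, float]) -> float:
    """Combine per-signal likeliness values with a naive Bayes style product.

    Assumes each database value is P(malware | signal) estimated with equal
    class priors, so summing log-odds is equivalent to
    prod(p) / (prod(p) + prod(1 - p)).
    """
    log_odds = 0.0
    for signal in signals:
        p = database.get(signal)
        if p is None:
            continue  # signals missing from the database contribute no evidence
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp so the logarithm stays finite
        log_odds += math.log(p / (1.0 - p))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def is_malware(signals: Iterable[str], database: Dict[str, float],
               threshold: float = 0.9) -> bool:
    """Flag the application when the combined likelihood exceeds the threshold."""
    return malware_likelihood(signals, database) > threshold


# Hypothetical database entries, for illustration only.
database = {"ID_EXFILTRATION": 0.9, "ID_OVER_SMS": 0.8}
score = malware_likelihood(["ID_EXFILTRATION", "ID_OVER_SMS"], database)
print(round(score, 3))                           # 0.973
print(is_malware(["ID_OVER_SMS"], database))     # False
```

With no known signals the score stays at 0.5, which is below the example threshold, so an application with no matches is not flagged.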

11. A method comprising:
- generating, by a hardware processor, a set of tokens from an application under test according to obfuscation tolerant rules, wherein each token of the set of tokens is generated upon a hit to one of the obfuscation tolerant rules;
  
  generating, by the hardware processor, a plurality of signal tokens from the set of tokens using a set of grouping rules, wherein each signal token is generated from a grouping of multiple tokens based on a grouping rule;
  
  using, by the hardware processor, a Bayesian technique to analyze the plurality of signal tokens with a signal token database including a second plurality of signal tokens to determine a likelihood that the application under test includes malware, wherein each of the second plurality of signal tokens is preprocessed to have a likeliness of malware based on a training set; and
  
  determining, by the hardware processor, that the application under test is malware if the likelihood is above a threshold level.
- Dependent claims (12, 13, 14, 15):
  - 12. The method of claim 11, wherein the second plurality of signal tokens is based on groupings of other tokens generated by the tokenization of a first set of known malware application code and a second set of known clean application code.
  - 13. The method of claim 11, wherein the grouping rules are based on at least two of a null dereference, resource leak, dead code, path manipulation, query string injection, command injection, resource injection, and denial of service.
  - 14. The method of claim 11, wherein the second plurality of signal tokens is generated from a machine learning analysis of a set of known malware application code and a set of clean application code.
  - 15. The method of claim 14, wherein the signal tokens of the signal token database are associated with respective malware likeliness values based on the machine learning analysis.
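
Claims 12 through 15 describe a signal token database whose entries carry likeliness values learned from known malware and known clean code. The claims do not specify the learning procedure, so the sketch below swaps in a simple frequency estimate with add-one smoothing as an assumed stand-in. The function name and the toy training data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def build_signal_database(malware_samples: Sequence[Set[str]],
                          clean_samples: Sequence[Set[str]]) -> Dict[str, float]:
    """Estimate a malware likeliness value for each signal token.

    Each sample is the set of signal tokens produced for one application.
    The value is P(signal | malware) / (P(signal | malware) + P(signal | clean)),
    which corresponds to P(malware | signal) under equal class priors.
    """
    malware_counts = Counter(s for sample in malware_samples for s in sample)
    clean_counts = Counter(s for sample in clean_samples for s in sample)
    database: Dict[str, float] = {}
    for signal in set(malware_counts) | set(clean_counts):
        # Add-one smoothing keeps unseen combinations away from 0 and 1.
        p_malware = (malware_counts[signal] + 1) / (len(malware_samples) + 2)
        p_clean = (clean_counts[signal] + 1) / (len(clean_samples) + 2)
        database[signal] = p_malware / (p_malware + p_clean)
    return database


# Toy training set: signal "A" appears only in malware, "B" in both classes.
malware_training = [{"A", "B"}, {"A"}]
clean_training = [{"B"}, set()]
db = build_signal_database(malware_training, clean_training)
print(db["A"], db["B"])  # 0.75 0.5
```

A signal seen equally often in both classes ends up at 0.5 and therefore adds no evidence in either direction when scores are later combined.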

Current Assignee: Micro Focus LLC (Open Text Corporation)
Original Assignee: EntIT Software LLC (Open Text Corporation)
Inventors: Hsueh, Frank Chijeen; Kamani, Sejal Pranlal
Primary Examiner(s): Lee, Jason
Application Number: US14/787,852
Publication Number: US 20160094574A1
Time in Patent Office: 1,519 Days
Field of Search: 726/23, 726/24, 726/25

CPC Class Codes
- G06F 21/566   Dynamic detection, i.e. det...
- G06N 7/01   Probabilistic graphical mod...
- H04L 63/1416   Event detection, e.g. attac...
- H04L 63/1433   Vulnerability analysis
