Recurrent neural networks for malware analysis
First Claim
1. A computer-implemented method comprising:
receiving or accessing executable code comprising instructions;
disassembling the executable code to generate a trace of the instructions;
applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector;
generating a concatenation of the feature vector with hand-engineered features extracted from the executable code;
determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and
disallowing, based on the determining, the code from executing;
wherein the classifier is different from the RNN.
Abstract
Using a recurrent neural network (RNN) that has been trained to a satisfactory level of performance, highly discriminative features can be extracted by running a sample through the RNN, and then extracting a final hidden state h_i, where i is the number of instructions of the sample. This resulting feature vector may then be concatenated with the other hand-engineered features, and a larger classifier may then be trained on hand-engineered as well as automatically determined features. Related apparatus, systems, techniques and articles are also described.
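The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a plain Elman-style RNN with random weights standing in for trained ones, a logistic-regression model as the separate classifier, and a 0.5 blocking threshold. The opcode vocabulary, the trace, the hand-engineered feature values, and all function names are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix; a stand-in for trained RNN weights (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_final_hidden_state(trace, vocab, w_in, w_rec):
    """Run a plain Elman RNN over the instruction trace and return the
    hidden state produced after the last instruction."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    for mnemonic in trace:
        x = vocab[mnemonic]  # one-hot input: selects column x of w_in
        h = [
            math.tanh(w_in[j][x] + sum(w_rec[j][k] * h[k] for k in range(hidden_size)))
            for j in range(hidden_size)
        ]
    return h


def malware_likelihood(features, weights, bias):
    """Separate logistic classifier applied to the concatenated features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
vocab = {"push": 0, "mov": 1, "call": 2, "xor": 3, "ret": 4}  # hypothetical opcodes
hidden_size = 4
w_in = make_matrix(hidden_size, len(vocab), rng)
w_rec = make_matrix(hidden_size, hidden_size, rng)

trace = ["push", "mov", "xor", "call", "ret"]  # hypothetical disassembly trace
hand_engineered = [0.73, 1.0, 12.0]            # hypothetical static features

rnn_features = rnn_final_hidden_state(trace, vocab, w_in, w_rec)
combined = rnn_features + hand_engineered      # concatenation of both feature sets

clf_weights = [rng.uniform(-0.5, 0.5) for _ in combined]  # untrained, illustration only
score = malware_likelihood(combined, clf_weights, bias=0.0)
blocked = score >= 0.5                         # hypothetical threshold for disallowing execution
```

In a real system both the RNN weights and the classifier weights would be learned from labeled samples; the sketch only shows how the final hidden state and the hand-engineered features end up in one vector that a second, distinct model scores.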
20 Claims
1. A computer-implemented method comprising:
receiving or accessing executable code comprising instructions;
disassembling the executable code to generate a trace of the instructions;
applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector;
generating a concatenation of the feature vector with hand-engineered features extracted from the executable code;
determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and
disallowing, based on the determining, the code from executing;
wherein the classifier is different from the RNN.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more data processors having memory storing instructions, which when executed result in operations comprising:
receiving or accessing executable code comprising instructions;
disassembling the executable code to generate a trace of the instructions;
applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector;
generating a concatenation of the feature vector with hand-engineered features extracted from the executable code;
determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and
disallowing, based on the determining, the code from executing;
wherein the classifier is different from the RNN.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer readable storage medium storing one or more programs configured to be executed by one or more data processors, the one or more programs comprising instructions, the instructions comprising:
receiving executable code;
disassembling the executable code;
generating a hidden state for each of a plurality of instructions by applying a recurrent neural network (RNN) to the disassembled executable code to generate a feature vector; and
determining, using a classifier, a likelihood that the executable code comprises malicious code based on the feature vector;
wherein the classifier is different from the RNN.
Dependent claims: 18, 19, 20
Specification