Recurrent neural networks for malware analysis
Abstract
Using a recurrent neural network (RNN) that has been trained to a satisfactory level of performance, highly discriminative features can be extracted by running a sample through the RNN and then extracting a final hidden state h_i, where i is the number of instructions of the sample. The resulting feature vector may then be concatenated with other hand-engineered features, and a larger classifier may then be trained on both the hand-engineered and the automatically determined features. Related apparatus, systems, techniques and articles are also described.
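The concatenation step in the abstract can be illustrated with a minimal sketch. It assumes the final hidden states have already been extracted, and it uses a small logistic-regression model as a stand-in for the "larger classifier", which the abstract does not specify. The function names (`concat_features`, `train_logistic`, `predict`), the two hand-engineered features, and all numeric values are hypothetical toy data chosen only for illustration.

```python
import math

def concat_features(h_final, hand_features):
    """Join an RNN final hidden state with hand-engineered features."""
    return list(h_final) + list(hand_features)

def train_logistic(rows, labels, epochs=100, lr=0.5):
    """Fit a logistic-regression classifier with plain stochastic gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 (likely malicious) or 0 (likely benign) for one feature vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Toy values only: stand-ins for final hidden states h_i of four samples.
rnn_states = [[0.9, -0.2], [0.8, 0.1], [-0.7, 0.3], [-0.6, -0.4]]
# Toy values only: hypothetical hand-engineered features (a flag and a ratio).
hand_features = [[1.0, 0.7], [0.0, 0.9], [1.0, 0.2], [0.0, 0.1]]
labels = [1, 1, 0, 0]  # 1 = malicious, 0 = benign (made up for the example)

rows = [concat_features(h, f) for h, f in zip(rnn_states, hand_features)]
w, b = train_logistic(rows, labels)
predictions = [predict(w, b, x) for x in rows]
```

Each row passed to the classifier holds the automatically determined features first, followed by the hand-engineered ones, so the downstream model sees both kinds of feature side by side.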
20 Claims
1. A method comprising:
- receiving or accessing data encapsulating a sample of at least a portion of one or more files;
- feeding at least a portion of the received or accessed data as a time-based sequence into a recurrent neural network (RNN) trained using historical data;
- extracting, by the RNN, a final hidden state h_i in a hidden layer of the RNN in which i is a number of elements of the sample; and
- determining, using the RNN and the final hidden state, whether at least a portion of the sample is likely to comprise malicious code.
Dependent claims: 2-18.
19. A system comprising:
- at least one programmable data processor; and
- memory storing instructions which, when executed by the at least one programmable data processor, result in operations comprising:
  - receiving or accessing data encapsulating a sample of at least a portion of one or more files;
  - feeding at least a portion of the received or accessed data as a time-based sequence into a recurrent neural network (RNN) trained using historical data;
  - extracting, by the RNN, a final hidden state h_i in a hidden layer of the RNN in which i is a number of elements of the sample; and
  - determining, using the RNN and the final hidden state, whether at least a portion of the sample is likely to comprise malicious code.
20. A non-transitory computer program product storing instructions which, when executed by at least one programmable data processor forming part of at least one computing device, result in operations comprising:
- receiving or accessing data encapsulating a sample of at least a portion of one or more files;
- feeding at least a portion of the received or accessed data as a time-based sequence into a recurrent neural network (RNN) trained using historical data;
- extracting, by the RNN, a final hidden state h_i in a hidden layer of the RNN in which i is a number of elements of the sample; and
- determining, using the RNN and the final hidden state, whether at least a portion of the sample is likely to comprise malicious code.
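The operations recited in the independent claims (feeding a sample as a time-based sequence, extracting the final hidden state, and determining likely maliciousness) can be sketched roughly as follows. This is not the patented implementation. It assumes a simple Elman-style recurrence with a tanh activation, treats each byte of the sample as one element, and uses randomly initialised weights in place of weights trained on historical data. The names `init_rnn`, `final_hidden_state`, `malicious_score`, the hidden size of 8, and the 0.5 threshold are all hypothetical.

```python
import math
import random

def init_rnn(input_size, hidden_size, seed=0):
    """Create random weights (a stand-in for weights trained on historical data)."""
    rng = random.Random(seed)

    def mat(n_rows, n_cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)] for _ in range(n_rows)]

    return {
        "W_xh": mat(hidden_size, input_size),   # input-to-hidden weights
        "W_hh": mat(hidden_size, hidden_size),  # hidden-to-hidden weights
        "b_h": [0.0] * hidden_size,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
        "b_out": 0.0,
    }

def final_hidden_state(params, sample):
    """Feed the sample one byte per time step; return h_i, where i = len(sample)."""
    n = len(params["b_h"])
    h = [0.0] * n  # h_0
    for byte in sample:
        # With a one-hot input, W_xh times x is just column `byte` of W_xh.
        h = [
            math.tanh(
                params["W_xh"][j][byte]
                + sum(params["W_hh"][j][k] * h[k] for k in range(n))
                + params["b_h"][j]
            )
            for j in range(n)
        ]
    return h

def malicious_score(params, h_final):
    """Map the final hidden state to a score in (0, 1) with a sigmoid output unit."""
    z = sum(w * v for w, v in zip(params["w_out"], h_final)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-z))

params = init_rnn(input_size=256, hidden_size=8)
sample = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10, 0xC3])  # toy byte sequence
h_i = final_hidden_state(params, sample)
score = malicious_score(params, h_i)
is_likely_malicious = score >= 0.5  # hypothetical decision threshold
```

Because the weights here are untrained, the score carries no real meaning. The sketch only shows the data flow: the hidden state is updated once per element, and the state after the last element is the feature vector used for the determination.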
Specification