Generation and use of trained file classifiers for malware detection
Abstract
A method includes training a file classifier using, as input, one or more n-gram feature vectors derived from a plurality of binary files, where the one or more n-gram vectors represent occurrences of character pairs in the printable characters within each file or in characters representing the informational entropy sequence of that file. Another method includes generating, by the file classifier, output including classification data associated with a file based on the one or more n-gram vectors, where the classification data indicates whether the file includes malware.
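As a rough illustration of the character-pair features described in the abstract, the following Python sketch counts bigrams over a file's printable characters. The `skip` parameter covers both variants recited in the claims: `skip=0` pairs adjacent characters (zero-skip) and `skip>0` pairs non-adjacent ones. The function names, the 0x20-0x7E definition of "printable", and the use of raw counts are assumptions made for illustration; the source does not specify them.

```python
from collections import Counter


def printable_chars(data: bytes) -> str:
    """Keep only bytes in the printable ASCII range 0x20-0x7E (assumed definition)."""
    return "".join(chr(b) for b in data if 0x20 <= b <= 0x7E)


def skip_bigram_counts(text: str, skip: int = 0) -> Counter:
    """Count character pairs separated by `skip` intervening characters.

    skip=0 pairs adjacent characters (a zero-skip bigram);
    skip=1 pairs characters with one character between them, and so on.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    step = skip + 1
    return Counter(text[i] + text[i + step] for i in range(len(text) - step))


# Hypothetical file contents: non-printable bytes are dropped before counting.
sample = b"\x00MZ\x90abcab\xff"
text = printable_chars(sample)
zero_skip = skip_bigram_counts(text, skip=0)  # adjacent pairs
one_skip = skip_bigram_counts(text, skip=1)   # pairs with one character skipped
```

In practice the counts would likely be mapped to a fixed-length vector (for example by hashing each pair to an index) before being passed to a classifier, but that step is left out here because the source does not describe it.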
20 Claims
1. A computing device comprising:
a memory configured to store instructions to execute a trained file classifier; and
a processor configured to execute the instructions from the memory to perform operations comprising:
receiving, via a network from a remote computing device, a feature vector representing a file stored in a memory of the remote computing device, the feature vector including:
a zero-skip n-gram indicating occurrences of adjacent characters in printable characters representing the file,
a skip n-gram indicating occurrences of non-adjacent characters in the printable characters representing the file; and
an n-gram indicating occurrences of groups of entropy indicators in a set of entropy indicators derived from file entropy data for the file, each entropy indicator of the set of entropy indicators having a value representing entropy of a corresponding chunk of the file;
generating, by the trained file classifier, classification data associated with the file based on the feature vector, the classification data indicating whether the file includes malware; and
transmitting the classification data to the remote computing device via the network, wherein access to the file or execution of the file at the remote computing device is restricted responsive to the classification data indicating that the file includes malware.
Dependent claims: 2, 3, 4, 5, 6
7. A method comprising:
receiving, via a network from a remote computing device, a feature vector representing a file stored in a memory of the remote computing device, the feature vector including:
a zero-skip n-gram indicating occurrences of adjacent characters in printable characters representing the file,
a skip n-gram indicating occurrences of non-adjacent characters in the printable characters representing the file; and
an n-gram indicating occurrences of groups of entropy indicators in a set of entropy indicators derived from file entropy data for the file, each entropy indicator of the set of entropy indicators having a value representing entropy of a corresponding chunk of the file;
generating, by a trained file classifier, classification data associated with the file based on the feature vector, the classification data indicating whether the file includes malware; and
transmitting the classification data to the remote computing device via the network, wherein access to the file or execution of the file at the remote computing device is restricted responsive to the classification data indicating that the file includes malware.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
16. A computer-readable storage device storing instructions that, when executed, cause a computer to perform operations comprising:
receiving, via a network from a remote computing device, a feature vector representing a file stored in a memory of the remote computing device, the feature vector including:
a zero-skip n-gram indicating occurrences of adjacent characters in printable characters representing the file,
a skip n-gram indicating occurrences of non-adjacent characters in the printable characters representing the file; and
an n-gram indicating occurrences of groups of entropy indicators in a set of entropy indicators derived from file entropy data for the file, each entropy indicator of the set of entropy indicators having a value representing entropy of a corresponding chunk of the file;
generating, by a trained file classifier, output including classification data associated with the file based on the feature vector, the classification data indicating whether the file includes malware; and
transmitting the classification data to the remote computing device via the network, wherein access to the file or execution of the file at the remote computing device is restricted responsive to the classification data indicating that the file includes malware.
Dependent claims: 17, 18, 19, 20
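The entropy-indicator n-gram recited in each independent claim can be pictured with a short Python sketch: split the file into chunks, compute the entropy of each chunk, map each value to a symbol, and count groups of consecutive symbols. This is only one possible reading. The 256-byte chunk size, the eight equal-width bins, the letter symbols, and all function names are assumptions for illustration and do not come from the claims.

```python
import math
from collections import Counter


def chunk_entropy(chunk: bytes) -> float:
    """Shannon entropy of a chunk in bits per byte (between 0 and 8)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def entropy_indicators(data: bytes, chunk_size: int = 256, bins: int = 8) -> str:
    """Map each chunk's entropy to one letter ('a' = lowest bin).

    chunk_size and bins are assumed values; bins must not exceed 26.
    """
    symbols = []
    for start in range(0, len(data), chunk_size):
        h = chunk_entropy(data[start:start + chunk_size])
        index = min(int(h / 8.0 * bins), bins - 1)  # clamp h == 8.0 into the top bin
        symbols.append(chr(ord("a") + index))
    return "".join(symbols)


def indicator_ngrams(indicators: str, n: int = 2) -> Counter:
    """Count groups of n consecutive entropy indicators."""
    return Counter(indicators[i:i + n] for i in range(len(indicators) - n + 1))


# Hypothetical file: a constant chunk, a chunk with every byte value, a constant chunk.
data = bytes(256) + bytes(range(256)) + bytes(256)
indicators = entropy_indicators(data)
grams = indicator_ngrams(indicators)
```

Here the constant chunks fall in the lowest bin and the chunk containing every byte value once falls in the highest, so the indicator sequence alternates between the two extremes.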
Specification