System and method to extract and utilize disassembly features to classify software intent
Abstract
A system and method operable to identify malicious software by extracting one or more features disassembled from software suspected to be malicious software and employing one or more of those features in a machine-learning algorithm to classify such software.
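The flow described in the abstract and claims (build a model from labeled malicious and benign samples, extract features from a disassembled unknown sample, compare them to the model, classify) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes disassembly has already produced a list of instruction mnemonics, it uses per-instruction ratios as the features (claim 13 mentions "a count or ratio associated with particular instructions"), and it substitutes a simple nearest-mean comparison for the unspecified machine-learning algorithm. The names `TRACKED`, `extract_features`, `train`, and `classify`, and the toy training data, are hypothetical.

```python
from collections import Counter

# Hypothetical feature set: mnemonics whose relative frequency is tracked.
TRACKED = ("mov", "call", "jmp", "xor", "push")


def extract_features(mnemonics):
    """Return the ratio of each tracked mnemonic to all instructions."""
    counts = Counter(mnemonics)
    total = len(mnemonics) or 1  # avoid dividing by zero on empty input
    return [counts[m] / total for m in TRACKED]


def train(labeled_samples):
    """Build a model holding the mean feature vector for each label."""
    sums, sizes = {}, Counter()
    for label, mnemonics in labeled_samples:
        feats = extract_features(mnemonics)
        acc = sums.setdefault(label, [0.0] * len(TRACKED))
        for i, value in enumerate(feats):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(model, mnemonics):
    """Return the label whose mean vector is closest (squared Euclidean)."""
    feats = extract_features(mnemonics)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(feats, model[label]))

    return min(model, key=distance)


# Toy training data (assumption): mnemonic lists standing in for disassembly.
training = [
    ("malicious", ["xor", "xor", "jmp", "call", "xor", "jmp"]),
    ("malicious", ["xor", "jmp", "xor", "push", "jmp"]),
    ("benign", ["mov", "mov", "push", "call", "mov", "ret"]),
    ("benign", ["push", "mov", "call", "mov", "ret"]),
]
model = train(training)
verdict = classify(model, ["xor", "jmp", "xor", "xor", "call"])
```

Here the model keeps one statistic per labeled class (the mean ratio vector), loosely mirroring the claim language about a model that maintains statistics for each type of sample. A real system would obtain the mnemonics from a disassembly tool and would likely use richer structural features such as code blocks, function boundaries, and stack frames.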
21 Claims
1. A method to extract and utilize disassembly features to classify an intent of a software program, the method comprising:
generating a model based, at least in part, on features associated with at least (i) one or more samples from labeled malicious software, and (ii) one or more samples from labeled benign software extracted from training files, the model to maintain statistics associated with each particular type of sample; and
classifying an unknown sample being a software program in accordance with the model being utilized by a classifier, the classifying of the software program comprises
disassembling the unknown sample being a software program selectable via a user interface, the disassembling includes parsing the software program, identifying machine code instructions within the parsed software program, and analyzing a structure of the software program by identifying at least one of code blocks, function boundaries, and stack frames, wherein at least one or more of the identified code blocks, function boundaries or stack frames corresponding to at least one feature of the unknown sample;
analyzing the at least one feature by a machine-learning algorithm operating in accordance with the model by comparing the at least one feature to features contained in the model, the machine-learning algorithm being executed by a hardware processor; and
classifying the software program based on a result yielded from the analyzing of the at least one feature.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system operable to extract and utilize disassembly features to classify software intent of a software program, the system comprising:
a model generation tool operable to generate a model based, at least in part, on features associated with at least (i) one or more samples from labeled malicious software, and (ii) one or more samples from labeled benign software extracted from training files, the model to maintain statistics associated with each particular type of sample;
a disassembly tool operable to (i) at least partially disassemble an unknown sample being a software program selectable via a user interface, the disassembling includes parsing the software program, identifying machine code instructions within the parsed software program, and analyzing a structure of the software program by at least identifying one or more of code blocks, function boundaries or stack frames, wherein one or more of the identified code blocks, function boundaries or stack frames corresponds to at least one feature of the unknown sample;
an extractor operable to extract the at least one feature by at least extracting the statistics associated with the software program, the statistics include at least a count or ratio associated with particular instructions;
a processor operable to process the at least one feature using a machine-learning algorithm operating in accordance with the model by comparing the at least one feature to features contained in the model, the machine-learning algorithm being executed by the processor; and
a classifier operable to classify, in accordance with the model, the software program based on a result yielded from the processing of the at least one feature.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
21. A method to extract and utilize disassembly features to classify an intent of a software program, the method comprising:
generating a model based, at least in part, on features associated with at least (i) one or more samples from labeled malicious software, and (ii) one or more samples from labeled benign software extracted from training files;
classifying an unknown sample being a software program in accordance with the model being utilized by a classifier, the classifying of the software program comprises disassembling the unknown sample being a software program, the disassembling includes parsing the software program, identifying machine code instructions within the parsed software program, and analyzing a structure of the software program by identifying at least one of code blocks, function boundaries and stack frames, wherein at least one or more of the identified code blocks, function boundaries or stack frames corresponding to at least one feature of the unknown sample;
disassembling, at least partially using a disassembly tool, the unknown sample being a software program selectable via a user interface, the disassembling includes parsing the software program, identifying machine code instructions within the parsed software program, and analyzing a structure of the software program by identifying at least one of code blocks, function boundaries and stack frames, wherein at least one or more of the identified code blocks, function boundaries or stack frames corresponding to at least one feature of the unknown sample
analyzing the at least one feature by a machine-learning algorithm operating in accordance with the model by comparing the at least one feature to features contained in the model, the machine-learning algorithm being executed by a hardware processor; and
classifying the software program based on a result yielded from the analyzing of the at least one feature.
Specification