Method and system for classification of software using characteristics and combinations of such characteristics
First Claim
1. A computer program product embodied in a non-transitory computer readable medium that, when executing on one or more computers, performs the steps of:
identifying a functional code block that performs a particular function within executable code;
transforming the functional code block into two or more generic code representations of its functionality by tokenizing the functional code block into a first generic code representation wherein tokenizing includes converting at least one variable to a predefined generic code uniquely representing the at least one variable, and wherein tokenizing excludes instruction codes and by tokenizing the function code block into a second generic code representation with one or more flags and statistical information;
selecting one of the two or more generic code representations as the generic code representation for further analysis based upon a type of file being analyzed;
comparing the generic code representation with a previously characterized malicious code representation; and
in response to a positive correlation from the comparison, identifying the executable code as containing malicious code.
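The transforming and selecting steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the script-like token grammar, the `VARn` and `NUM` generic codes, the keyword set treated as "instruction codes", the particular flags and statistics, and the file-type rule in `select_representation` are all hypothetical choices made for the example. Every identifier is treated as a variable for simplicity.

```python
import re
from collections import Counter

# Hypothetical "instruction codes" excluded from the first representation.
INSTRUCTIONS = {"var", "if", "else", "for", "while", "return", "function"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize_generic(block: str) -> list[str]:
    """First representation: each distinct variable maps to a unique
    generic code (VAR0, VAR1, ...); instruction codes are dropped."""
    names: dict[str, str] = {}
    out: list[str] = []
    for tok in TOKEN_RE.findall(block):
        if tok in INSTRUCTIONS:
            continue  # assumed reading of "excludes instruction codes"
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            tok = names.setdefault(tok, f"VAR{len(names)}")
        elif tok.isdigit():
            tok = "NUM"
        out.append(tok)
    return out


def summarize(block: str) -> dict:
    """Second representation: a few flags plus simple statistics."""
    toks = TOKEN_RE.findall(block)
    counts = Counter(toks)
    return {
        "flags": {
            "uses_eval": "eval" in counts,
            "has_loop": any(k in counts for k in ("for", "while")),
        },
        "stats": {"token_count": len(toks), "distinct_tokens": len(counts)},
    }


def select_representation(file_type: str, block: str):
    """Assumed rule: script files use the token stream, others the summary."""
    if file_type in {"js", "vbs"}:
        return tokenize_generic(block)
    return summarize(block)


a = tokenize_generic("var a = b + 1; var c = a + b;")
b = tokenize_generic("var x = y + 7; var z = x + y;")
print(a == b)  # renamed variables and changed constants give the same tokens
```

Because variable names and literal values are replaced by generic codes, two blocks that differ only in naming or constants reduce to the same token sequence, which is what makes the later comparison step resistant to trivial obfuscation.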
Abstract
In embodiments of the present invention, improved capabilities are described for identifying a functional code block that performs a particular function within executable code; transforming the functional code block into a generic code representation of its functionality by tokenizing or refactoring the functional code block, or the like; comparing the generic code representation with a previously characterized malicious code representation; and, in response to a positive correlation from the comparison, identifying the executable code as containing malicious code.
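The comparison step can be illustrated with a small similarity check. The text above does not say how a "positive correlation" is computed, so this sketch substitutes a common stand-in: Jaccard similarity over token trigrams, with an arbitrary threshold of 0.8. The function names, the threshold, and the sample token sequences are all assumptions made for illustration.

```python
def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of all contiguous n-token windows in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: list[str], b: list[str], n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets (0.0 if either is empty)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def is_malicious(candidate: list[str], known_bad: list[list[str]],
                 threshold: float = 0.8) -> bool:
    """Flag the candidate if it correlates with any stored representation."""
    return any(similarity(candidate, sig) >= threshold for sig in known_bad)


# Hypothetical previously characterized representation.
signature = ["VAR0", "=", "VAR1", "(", "VAR2", ")", ";"]

print(is_malicious(["VAR0", "=", "VAR1", "(", "VAR2", ")", ";"], [signature]))
print(is_malicious(["VAR0", "=", "NUM", ";"], [signature]))
```

A set-based score like this tolerates small insertions or deletions in the candidate, whereas an exact hash match would not; which trade-off is appropriate depends on the acceptable false-positive rate.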
24 Claims
1. Recited in full under First Claim above. Claims 2 to 13 depend on claim 1.
14. A computer program product embodied in a non-transitory computer readable medium that, when executing on one or more computers, performs the steps of:
identifying a functional code block that performs a particular function within executable code;
transforming the functional code block into two or more generic code representations of its functionality including a first generic code representation obtained by refactoring the functional code block and converting at least one variable to a predefined generic code uniquely representing the at least one variable, and further including a second generic code representation having one or more flags and statistics;
selecting one of the two or more generic code representations as the generic code representation for further analysis based upon a type of file being analyzed;
comparing the generic code representation with a previously characterized malicious code representation; and
in response to a positive correlation from the comparison, identifying the executable code as containing malicious code.
Claims 15 to 24 depend on claim 14.
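Claim 14 obtains the first representation by refactoring rather than tokenizing. The claim does not define refactoring, so the following is only one possible reading: parse the code into a syntax tree, rename every variable to a generic code in order of first appearance, and regenerate the source in a canonical layout. The sketch assumes the analyzed code is Python (so the standard `ast` module can be used, with `ast.unparse` requiring Python 3.9 or later); the `GenericNamer` class and the `VARn` naming scheme are hypothetical.

```python
import ast


class GenericNamer(ast.NodeTransformer):
    """Rename each distinct name to VAR0, VAR1, ... in visiting order.

    Simplification: built-in and function names are renamed too.
    """

    def __init__(self) -> None:
        self.names: dict[str, str] = {}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        generic = self.names.setdefault(node.id, f"VAR{len(self.names)}")
        return ast.copy_location(ast.Name(id=generic, ctx=node.ctx), node)


def refactor(source: str) -> str:
    """Return a canonical form: comments and spacing are normalized by the
    parse/unparse round trip, and variable names are made generic."""
    tree = GenericNamer().visit(ast.parse(source))
    return ast.unparse(tree)


print(refactor("total = price*qty  # cost"))
print(refactor("t=p * q") == refactor("total = price*qty"))
```

Working on the syntax tree rather than the raw text means differences in comments, whitespace, and naming all disappear before comparison, at the cost of needing a parser for each supported language.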
Specification