Systems and methods for automated generation of generic signatures used to detect polymorphic malware
First Claim
1. A computer-implemented method for automated generation of generic signatures used to detect polymorphic malware, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
clustering a set of polymorphic file samples that share a set of static attributes in common with one another;
computing a distance of the polymorphic file samples from a centroid that represents a reference data point with respect to the set of polymorphic file samples, wherein computing the distance comprises:
computing, based at least in part on certain static attributes of the polymorphic file samples, a plurality of vectors that represent data points with respect to the centroid;
calculating an average of the vectors;
determining that the distance is below a certain threshold by determining that the average of the vectors is within a certain numerical value of the centroid;
upon determining that the distance is below the certain threshold:
identifying, within the set of static attributes shared in common by the polymorphic file samples, a subset of static attributes whose values are identical across all of the polymorphic file samples;
generating a generic file-classification signature from the subset of static attributes.
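The distance test recited in the claim (vectors derived from static attributes, an average of those vectors, and a comparison of that average against the centroid) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the attribute names in `FEATURES`, the use of Euclidean distance, and the helper names are hypothetical and are not taken from the patent, which does not name a metric or particular attributes.

```python
from math import sqrt
from typing import List, Mapping, Sequence

# Hypothetical numeric static attributes; the claim does not name any.
FEATURES = ("file_size", "section_count", "import_count")


def to_vector(sample: Mapping[str, float],
              features: Sequence[str] = FEATURES) -> List[float]:
    """Turn one file sample's static attributes into a numeric vector."""
    return [float(sample[name]) for name in features]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric, chosen for illustration)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_is_tight(samples: Sequence[Mapping[str, float]],
                     centroid: Sequence[float],
                     threshold: float) -> bool:
    """Return True when the average sample vector lies within
    `threshold` of the centroid."""
    vectors = [to_vector(sample) for sample in samples]
    average = mean_vector(vectors)
    return euclidean(average, centroid) < threshold
```

With two samples whose vectors are `[100, 4, 10]` and `[104, 4, 12]`, the average is `[102, 4, 11]`; against a centroid of `[102, 4, 10]` the distance is 1.0, so the check passes for a threshold of 2.0 and fails for 0.5.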
Abstract
The disclosed computer-implemented method for automated generation of generic signatures used to detect polymorphic malware may include (1) clustering a set of polymorphic file samples that share a set of static attributes in common with one another, (2) computing a distance of the polymorphic file samples from a centroid that represents a reference data point with respect to the set of polymorphic file samples, (3) determining that the distance of the polymorphic file samples from the centroid is below a certain threshold, and then upon determining that the distance is below the certain threshold, (4) identifying, within the set of static attributes shared in common by the polymorphic file samples, a subset of static attributes whose values are identical across all of the polymorphic file samples, and (5) generating a generic file-classification signature from the subset of static attributes. Various other methods, systems, and computer-readable media are also disclosed.
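Steps (4) and (5) of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: samples are modeled as plain dictionaries of static attributes, the signature is modeled as the dictionary of attributes whose values are identical across every sample, and the function names `invariant_attributes` and `matches` are hypothetical.

```python
from typing import Any, Dict, Mapping, Sequence


def invariant_attributes(samples: Sequence[Mapping[str, Any]]) -> Dict[str, Any]:
    """Return the attributes whose values are identical across all samples.

    The result stands in for a generic signature (an assumed representation).
    """
    if not samples:
        return {}
    first, rest = samples[0], samples[1:]
    return {
        name: value
        for name, value in first.items()
        if all(name in other and other[name] == value for other in rest)
    }


def matches(signature: Mapping[str, Any], candidate: Mapping[str, Any]) -> bool:
    """Return True when the candidate carries every attribute value in the
    signature. An empty signature matches nothing, to avoid flagging all files."""
    if not signature:
        return False
    return all(
        name in candidate and candidate[name] == value
        for name, value in signature.items()
    )
```

For a cluster in which every sample has `section_count` 4 and `entry_section` `".text"` but the samples differ in `file_size`, the sketch keeps only the first two attributes, so a new variant with a different size still matches while a file with a different section count does not.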
17 Claims
1. A computer-implemented method for automated generation of generic signatures used to detect polymorphic malware, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
clustering a set of polymorphic file samples that share a set of static attributes in common with one another;
computing a distance of the polymorphic file samples from a centroid that represents a reference data point with respect to the set of polymorphic file samples, wherein computing the distance comprises:
computing, based at least in part on certain static attributes of the polymorphic file samples, a plurality of vectors that represent data points with respect to the centroid;
calculating an average of the vectors;
determining that the distance is below a certain threshold by determining that the average of the vectors is within a certain numerical value of the centroid;
upon determining that the distance is below the certain threshold:
identifying, within the set of static attributes shared in common by the polymorphic file samples, a subset of static attributes whose values are identical across all of the polymorphic file samples;
generating a generic file-classification signature from the subset of static attributes.
Dependent Claims (2, 3, 4, 5, 6)
7. A system for automated generation of generic signatures used to detect polymorphic malware, the system comprising:
a clustering module, stored in memory, that clusters a set of polymorphic file samples that share a set of static attributes in common with one another;
a computation module, stored in memory, that computes a distance of the polymorphic file samples from a centroid that represents a reference data point with respect to the set of polymorphic file samples, wherein computing the distance comprises:
computing, based at least in part on certain static attributes of the polymorphic file samples, a plurality of vectors that represent data points with respect to the centroid;
calculating an average of the vectors;
a determination module, stored in memory, that determines that the distance is below a certain threshold by determining that the average of the vectors is within a certain numerical value of the centroid;
an identification module, stored in memory, that identifies, within the set of static attributes shared in common by the polymorphic file samples, a subset of static attributes whose values are identical across all of the polymorphic file samples;
a generation module, stored in memory, that generates a generic file-classification signature from the subset of static attributes;
at least one physical processor that executes the clustering module, the computation module, the determination module, the identification module, and the generation module.
Dependent Claims (8, 9, 10, 11, 12)
13. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
cluster a set of polymorphic file samples that share a set of static attributes in common with one another;
compute a distance of the polymorphic file samples from a centroid that represents a reference data point with respect to the set of polymorphic file samples, wherein computing the distance comprises:
computing, based at least in part on certain static attributes of the polymorphic file samples, a plurality of vectors that represent data points with respect to the centroid;
calculating an average of the vectors;
determine that the distance is below a certain threshold by determining that the average of the vectors is within a certain numerical value of the centroid;
upon determining that the distance is below the certain threshold:
identify, within the set of static attributes shared in common by the polymorphic file samples, a subset of static attributes whose values are identical across all of the polymorphic file samples;
generate a generic file-classification signature from the subset of static attributes.
Dependent Claims (14, 15, 16, 17)
Specification