Fuzzy hash of behavioral results
First Claim
1. A computerized method for classifying objects in a malware system, comprising:
receiving, by a malicious content detection (MCD) system from a client device, an object to be classified;
detecting behaviors of the received object, wherein the behaviors are detected after processing the received object;
generating a fuzzy hash for the received object based on the detected behaviors, the generating of the fuzzy hash comprises (i) obtaining a reduced amount of data associated with the detected behaviors by retaining a portion of the data associated with the detected behaviors that corresponds to one or more operations conducted during processing of the received object, and removing metadata associated with the one or more operations conducted during the processing of the received object, the metadata including at least one or more identifiers of processes called during the processing of the received object, and (ii) performing a hash operation on the reduced amount of data associated with the detected behaviors;
comparing the fuzzy hash for the received object with a fuzzy hash of an object in a preexisting cluster to generate a similarity measure;
associating the received object with the preexisting cluster in response to determining that the similarity measure is above a predefined threshold value;
creating a new cluster for the received object in response to determining that the similarity measure is below the predefined threshold value; and
reporting, by the MCD system, results of either (i) the associating of the received object with the preexisting cluster or (ii) the creating of the new cluster.
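The two-part hash generation recited in the claim, (i) reducing the behavior data by dropping process identifiers and (ii) hashing what remains, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `(process_id, operation, target)` record layout, the function names, and the shingle-based fingerprint, which is a simple stand-in for a fuzzy hash and is not stated by the claim to be the hash operation actually used.

```python
import hashlib

def reduce_behaviors(behaviors):
    """Keep each operation and its target; drop the process identifier.

    `behaviors` is a list of hypothetical (process_id, operation, target)
    tuples; this record layout is an assumption, not part of the claim.
    """
    return [f"{operation} {target}" for _pid, operation, target in behaviors]

def fuzzy_hash(operations, k=2):
    """Stand-in fuzzy hash: the set of hashed k-operation shingles.

    Similar operation sequences share most shingles, so their
    fingerprints overlap heavily even when they are not identical.
    """
    if len(operations) < k:
        shingles = [tuple(operations)]
    else:
        shingles = [tuple(operations[i:i + k])
                    for i in range(len(operations) - k + 1)]
    return frozenset(
        hashlib.sha256("\n".join(s).encode()).hexdigest()[:8]
        for s in shingles
    )

def similarity(a, b):
    """Jaccard similarity of two fingerprints, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two runs with the same operations but different process identifiers.
run_a = [(101, "CreateFile", r"C:\temp\a.tmp"),
         (101, "WriteFile", r"C:\temp\a.tmp"),
         (202, "RegSetValue", r"HKCU\Run"),
         (202, "Connect", "10.0.0.5:443")]
run_b = [(777, op, target) for _pid, op, target in run_a]
# A third run that differs only in its final operation.
run_c = run_a[:3] + [(202, "Connect", "10.0.0.9:443")]

hash_a = fuzzy_hash(reduce_behaviors(run_a))
hash_b = fuzzy_hash(reduce_behaviors(run_b))
hash_c = fuzzy_hash(reduce_behaviors(run_c))
```

Because the process identifiers are removed before hashing, `run_a` and `run_b` produce the same fingerprint, while `run_c` yields a partial match rather than a complete mismatch, which is the property that distinguishes a fuzzy hash from a conventional cryptographic hash of the whole record.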
Abstract
A computerized method is described in which a received object is analyzed by a malicious content detection (MCD) system to determine whether the object is malware or non-malware. The analysis may include the generation of a fuzzy hash based on a collection of behaviors for the received object. The fuzzy hash may be used by the MCD system to determine the similarity of the received object with one or more objects in previously classified/analyzed clusters. Upon detection of a “similar” object, the suspect object may be associated with the cluster and classified based on information attached to the cluster. This similarity matching provides 1) greater flexibility in analyzing potential malware objects, which may share multiple characteristics and behaviors but are also slightly different from previously classified objects and 2) a more efficient technique for classifying/assigning attributes to objects.
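The cluster matching described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Cluster` class, the `classify` function, the "unknown" label for new clusters, the 0.7 threshold, and the use of Jaccard similarity over set-valued fingerprints are all hypothetical. The source does not say how a similarity exactly equal to the threshold is handled; this sketch treats it as a non-match.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    label: str                                   # information attached to the cluster
    members: list = field(default_factory=list)  # fingerprints of member objects

def jaccard(a, b):
    """Assumed similarity measure over set-valued fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def classify(fingerprint, clusters, threshold=0.7):
    """Associate the fingerprint with the best-matching cluster, or create one.

    Returns (cluster, created), where `created` is True when no existing
    cluster scored above the threshold and a new cluster was added.
    """
    best, best_score = None, 0.0
    for cluster in clusters:
        for member in cluster.members:
            score = jaccard(fingerprint, member)
            if score > best_score:
                best, best_score = cluster, score

    if best is not None and best_score > threshold:
        best.members.append(fingerprint)         # object inherits best.label
        return best, False

    new_cluster = Cluster(label="unknown", members=[fingerprint])
    clusters.append(new_cluster)
    return new_cluster, True

# One previously classified cluster holding a single fingerprint.
clusters = [Cluster(label="malware", members=[frozenset({1, 2, 3, 4})])]

# Shares 4 of 5 elements with the known member: similarity 0.8.
near, near_created = classify(frozenset({1, 2, 3, 4, 5}), clusters)

# Shares nothing with any known member: similarity 0.0.
far, far_created = classify(frozenset({9, 10}), clusters)
```

In this sketch the near-duplicate object joins the existing cluster and takes on its "malware" label without being analyzed from scratch, while the unrelated object seeds a new cluster that can be labeled later.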
26 Claims
12. A non-transitory storage medium including instructions that, when executed by one or more hardware processors, performs a plurality of operations, comprising:
detecting behaviors of a received object, wherein the behaviors are detected after processing the received object;
generating a fuzzy hash for the received object based on the detected behaviors, the generating of the fuzzy hash comprises (i) obtaining a reduced amount of data associated with the detected behaviors by retaining a portion of the data associated with the detected behaviors that corresponds to one or more operations conducted during processing of the received object, and removing metadata associated with the one or more operations conducted during the processing of the received object, the metadata including at least one or more identifiers of processes called during the processing of the received object metadata, and (ii) performing a hash operation on the reduced amount of data associated with the detected behaviors;
comparing the fuzzy hash for the received object with a fuzzy hash of an object in a preexisting cluster to generate a similarity measure;
associating the received object with the preexisting cluster in response to determining that the similarity measure is above a predefined threshold value;
creating a new cluster for the received object in response to determining that the similarity measure is below the predefined threshold value; and
reporting results of either (i) the associating of the received object with the preexisting cluster or (ii) the creating of the new cluster.
22. A system comprising:
one or more hardware processors;
a memory including one or more software modules that, when executed by the one or more hardware processors:
detect behaviors of a received object, wherein the behaviors are detected after processing the received object;
generate a fuzzy hash for the received object based on a portion of the detected behaviors, the generating of the fuzzy hash comprises (i) obtaining a reduced amount of data associated with the detected behaviors by retaining a portion of the data associated with the detected behaviors that corresponds to one or more operations conducted during processing of the received object, and removing metadata associated with the one or more operations conducted during the processing of the received object, the metadata including at least one or more identifiers of processes called during the processing of the received object, and (ii) performing a hash operation on the reduced amount of data associated with the detected behaviors;
compare the fuzzy hash for the received object with a fuzzy hash of an object in a preexisting cluster to generate a similarity measure;
associate the received object with the preexisting cluster in response to determining that the similarity measure is above a predefined threshold value;
create a new cluster for the received object in response to determining that the similarity measure is below the predefined threshold value; and
report results of either (i) an association of the received object with the preexisting cluster or (ii) a creation of the new cluster.
Specification