Method, apparatus and terminal for detecting a malware file
First Claim
1. A method for detecting a malware file, comprising:
acquiring a file to be inspected;
determining an information entropy vector of the file by dividing the file into a predetermined number of segments;
obtaining an information entropy value for each of the segments; and
setting the number of the segments as a dimension of the information entropy vector, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments; and
inspecting, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model, wherein the inspection model is obtained by:
acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories include malware file categories and non-malware file categories;
labeling the acquired training files with security category labels according to the known security categories;
determining the information entropy vectors of the training files; and
training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, wherein the training and outputting the inspection model comprises:
obtaining a subset of files from the training files as first files;
performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
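The entropy-vector step recited above can be illustrated with a minimal Python sketch. The claim does not say how the per-segment entropy is computed or how segment boundaries are chosen, so the sketch assumes byte-level Shannon entropy (in bits per byte) and near-equal contiguous segments; the function names `shannon_entropy` and `entropy_vector` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_vector(data: bytes, num_segments: int) -> list[float]:
    """Return one entropy value per segment; the vector dimension equals num_segments.

    Assumption: segments are contiguous and as close to equal length as possible.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    n = len(data)
    # Segment i covers bytes [bounds[i], bounds[i + 1]).
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [
        shannon_entropy(data[bounds[i]:bounds[i + 1]])
        for i in range(num_segments)
    ]
```

For example, `entropy_vector(open(path, "rb").read(), 16)` would yield a 16-dimensional vector for the file at `path`, with each component lying between 0 and 8 bits per byte.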
Abstract
The present application discloses a method, an apparatus and a terminal for detecting a malware file. One embodiment of the method comprises: obtaining a file to be inspected; determining an entropy vector of the file; and inspecting the entropy vector of the file using a trained inspection model to determine whether the file is a malware file, wherein the file type of the file is identical to the file type corresponding to the inspection model. This embodiment extracts the entropy vector of the file and determines from that vector whether the file is a malware file. It thereby addresses technical problems existing in the art, such as the low speed, poor capacity and low efficiency of detecting and destroying malware files, and improves the efficiency of detecting and destroying them.
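The requirement that the file type of the inspected file match the file type of the inspection model can be sketched as a lookup keyed by file type. This is only an illustration: the registry `models`, the function `inspect`, the file-type keys and the toy threshold rules standing in for trained models are all assumptions, not taken from the application.

```python
from typing import Callable, Dict, Sequence

# A model maps an entropy vector to True ("malware") or False ("non-malware").
Model = Callable[[Sequence[float]], bool]


def inspect(vector: Sequence[float], file_type: str, models: Dict[str, Model]) -> bool:
    """Inspect an entropy vector using the model trained for the same file type."""
    model = models.get(file_type)
    if model is None:
        raise LookupError(f"no inspection model for file type {file_type!r}")
    return model(vector)


# Toy stand-ins for trained models; the thresholds are arbitrary illustrations.
models: Dict[str, Model] = {
    "pe": lambda v: max(v) > 7.0,
    "pdf": lambda v: sum(v) / len(v) > 7.5,
}
```

Keeping one model per file type means a vector is never scored by a model trained on files of a different type.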
5 Claims
1. A method for detecting a malware file, comprising:
acquiring a file to be inspected;
determining an information entropy vector of the file by dividing the file into a predetermined number of segments;
obtaining an information entropy value for each of the segments; and
setting the number of the segments as a dimension of the information entropy vector, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments; and
inspecting, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model, wherein the inspection model is obtained by:
acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories include malware file categories and non-malware file categories;
labeling the acquired training files with security category labels according to the known security categories;
determining the information entropy vectors of the training files; and
training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, wherein the training and outputting the inspection model comprises:
obtaining a subset of files from the training files as first files;
performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
Dependent claims: 2, 3
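The train-then-correct loop recited in claim 1 can be sketched as follows. The claim does not name a learning algorithm, so this sketch swaps in a simple perceptron as a stand-in: the initial model comes from one pass over the first files, and each correction is another pass over all training files. Measuring the misjudgment rate on the full training set, the learning rate, the `max_rounds` safety cap, and all function names are assumptions; the separate feature classification step is not modeled.

```python
from typing import List, Sequence, Tuple

# (entropy vector, label) where label 1 = malware and 0 = non-malware.
Sample = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], bias: float, vector: Sequence[float]) -> int:
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score > 0 else 0


def misjudgment_rate(weights: Sequence[float], bias: float,
                     samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label differs from the known label."""
    wrong = sum(1 for v, y in samples if predict(weights, bias, v) != y)
    return wrong / len(samples)


def perceptron_pass(weights: Sequence[float], bias: float,
                    samples: Sequence[Sample], lr: float = 0.1):
    """One pass of perceptron updates over the samples (assumed learning step)."""
    weights = list(weights)
    for v, y in samples:
        err = y - predict(weights, bias, v)
        if err:
            weights = [w + lr * err * x for w, x in zip(weights, v)]
            bias += lr * err
    return weights, bias


def train(samples: Sequence[Sample], first_count: int, threshold: float,
          max_rounds: int = 10_000) -> Tuple[List[float], float]:
    dim = len(samples[0][0])
    first_files = samples[:first_count]
    # Initial inspection model, learned from the first files only.
    weights, bias = perceptron_pass([0.0] * dim, 0.0, first_files)
    rounds = 0
    # Keep correcting while the misjudgment rate is not below the threshold.
    while misjudgment_rate(weights, bias, samples) >= threshold:
        if rounds == max_rounds:  # safety cap; not part of the claim
            raise RuntimeError("misjudgment rate did not fall below the threshold")
        weights, bias = perceptron_pass(weights, bias, samples)
        rounds += 1
    return weights, bias
```

If the initial model already meets the threshold, the loop body never runs and the initial model is returned unchanged, which mirrors the first output branch of the claim.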
4. An apparatus for detecting a malware file, the apparatus comprising:
a processor;
a memory storing computer-readable instructions;
wherein, when the computer-readable instructions are executed by the processor, the processor is operably configured to:
acquire a file to be inspected;
determine an information entropy vector of the file:
divide the file into a predetermined number of segments;
obtain an information entropy value for each of the segments; and
set the number of the segments as a dimension of an information entropy vector of the file, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments; and
inspect, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model, wherein the inspection model is obtained by:
acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories comprise malware file categories and non-malware file categories;
labeling the acquired training files with security category labels according to the known security categories;
determining the information entropy vectors of the training files; and
training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, wherein the training and outputting the inspection model comprises:
obtaining a subset of files from the training files as first files;
performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
5. A non-transitory computer storage medium storing computer-readable instructions, wherein, when the computer-readable instructions are executed by a processor, the processor is operably configured to:
obtain a file to be inspected;
determine an information entropy vector of the file; and
inspect, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model;
wherein, in order to determine the information entropy vector of the file, the processor is configured to:
divide the file into a predetermined number of segments;
obtain an information entropy value for each of the segments; and
set the number of the segments as a dimension of the information entropy vector, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments,
wherein the inspection model is obtained by:
acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories include malware file categories and non-malware file categories;
labeling the acquired training files with security category labels according to the known security categories;
determining the information entropy vectors of the training files; and
training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, wherein the training and outputting the inspection model comprises:
obtaining a subset of files from the training files as first files;
performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
Specification