Method, apparatus and terminal for detecting a malware file

US 10,176,323 B2
Filed: 12/31/2015
Issued: 01/08/2019
Est. Priority Date: 06/30/2015
Status: Active Grant

Abstract

The present application discloses a method, an apparatus and a terminal for detecting a malware file. One embodiment of the method comprises: obtaining a file to be inspected; determining an entropy vector of the file; and inspecting the entropy vector of the file using a trained inspection model to determine if the file is a malware file, wherein a file type of the file is identical to the file type corresponding to the inspection model. This embodiment extracts the entropy vector of the file and determines if the file is a malware file based on that entropy vector. Technical problems existing in the art, such as the low speed, poor capacity and low efficiency of detecting and destroying malware files, are thereby addressed, and the efficiency of detecting and destroying malware files is enhanced.
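
The entropy-vector step described in the abstract can be illustrated with a minimal sketch. It assumes byte-level Shannon entropy and equal-sized contiguous segments; the record does not fix either choice, and the function names `shannon_entropy` and `entropy_vector` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input).

    Assumption: entropy is computed over byte values; the claims only
    speak of an "information entropy value" per segment.
    """
    if not data:
        return 0.0
    total = len(data)
    probs = [count / total for count in Counter(data).values()]
    return sum(-p * math.log2(p) for p in probs)


def entropy_vector(data: bytes, num_segments: int) -> list[float]:
    """Divide data into num_segments contiguous segments and return one
    entropy value per segment, so the vector dimension equals num_segments."""
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    n = len(data)
    # Integer boundaries spread any remainder bytes across the segments.
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [shannon_entropy(data[bounds[i]:bounds[i + 1]])
            for i in range(num_segments)]
```

Calling `entropy_vector(file_bytes, 16)` on the contents of a file would yield a 16-dimensional vector, one component per segment. Packed or encrypted regions tend to show values near the 8-bit maximum, which is the kind of signal a trained model could pick up.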

5 Claims

1. A method for detecting a malware file, comprising:
- acquiring a file to be inspected;
  
  determining an information entropy vector of the file by dividing the file into a predetermined number of segments;
  
  obtaining an information entropy value for each of the segments; and
  
  setting the number of the segments as a dimension of the information entropy vector, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments; and
  
  inspecting, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model, wherein the inspection model is obtained by:
  
  acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories include malware file categories and non-malware file categories;
  
  labeling the acquired training files with security category labels according to the known security categories;
  
  determining the information entropy vectors of the training files; and
  
  training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, the training and outputting the inspection model comprises:
  
  obtaining a subset of files from the training files as first files;
  
  performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
  
  obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
  
  determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
  
  if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
  
  stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
  - 2. The method of claim 1, wherein the determining if the misjudgment rate of the inspection model is below the predetermined threshold value comprises:
    - obtaining a second subset of files from the training files as second files;
      
      inspecting information entropy vectors of the second files using the inspection model to be tested;
      
      determining the misjudgment rate based on the inspected information entropy vectors and the security category labels of the second files; and
      
      comparing the determined misjudgment rate with the predetermined threshold value to determine if the misjudgment rate is below the predetermined threshold value, wherein the second files and the first files are mutually exclusive.
  - 3. The method of claim 2, wherein the correcting the initial inspection model or a present corrected inspection model comprises at least one of:
    - increasing a number of the first files and obtaining the corrected inspection model by a further learning operation; and
      
      adjusting a dimension of the information entropy vectors and obtaining the corrected inspection model by a further learning operation.
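
The train, test and correct loop of claims 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not name a learning algorithm, so a nearest-centroid classifier stands in for the "learning operation", and only the "increasing a number of the first files" correction of claim 3 is shown. All function and parameter names are hypothetical.

```python
import random

Vector = list[float]
Sample = tuple[Vector, int]  # (entropy vector, label); 1 = malware, 0 = non-malware
Model = dict[int, Vector]    # label -> centroid of that label's vectors


def train_centroids(samples: list[Sample]) -> Model:
    """Stand-in learning operation: one mean vector (centroid) per label."""
    sums: dict[int, Vector] = {}
    counts: dict[int, int] = {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model: Model, vec: Vector) -> int:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(model[lbl], vec)))


def misjudgment_rate(model: Model, samples: list[Sample]) -> float:
    """Fraction of samples whose predicted label differs from the known label."""
    wrong = sum(1 for vec, label in samples if predict(model, vec) != label)
    return wrong / len(samples)


def train_inspection_model(training: list[Sample], threshold: float,
                           first_size: int, step: int,
                           seed: int = 0) -> tuple[Model, float]:
    """Train on the 'first files', test on the disjoint 'second files', and
    enlarge the first subset until the misjudgment rate is below threshold."""
    if step <= 0:
        raise ValueError("step must be positive")
    pool = list(training)
    random.Random(seed).shuffle(pool)
    while 0 < first_size < len(pool):
        # Slicing keeps the two subsets mutually exclusive, as in claim 2.
        first, second = pool[:first_size], pool[first_size:]
        model = train_centroids(first)
        rate = misjudgment_rate(model, second)
        if rate < threshold:
            return model, rate
        first_size += step  # correction: more first files, then learn again
    # Unlike the claim language, this sketch gives up when files run out.
    raise RuntimeError("no model met the misjudgment threshold with the available files")
```

The other correction named in claim 3, adjusting the dimension of the entropy vectors, would require recomputing every vector with a different segment count before retraining, and is left out here for brevity.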

4. An apparatus for detecting a malware file, the apparatus comprising:
- a processor;
  
  a memory storing computer-readable instructions;
  
  wherein, when the computer-readable instructions are executed by the processor, the processor is operably configured to:
  
  acquire a file to be inspected;
  
  determine an information entropy vector of the file; divide the file into a predetermined number of segments;
  
  obtain an information entropy value for each of the segments; and
  
  set the number of the segments as a dimension of an information entropy vector of the file, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments; and
  
  inspect, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model, wherein the inspection model is obtained by:
  
  acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories comprise malware file categories and non-malware file categories;
  
  labeling the acquired training files with security category labels according to the known security categories;
  
  determining the information entropy vectors of the training files; and
  
  training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, the training and outputting the inspection model comprises:
  
  obtaining a subset of files from the training files as first files;
  
  performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
  
  obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
  
  determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
  
  if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
  
  stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
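
For reference, the per-segment entropy and the resulting vector can be written out as below. This assumes byte-level Shannon entropy, which the claims do not spell out; the symbols are introduced here for illustration only.

```latex
% Assumption: byte-level Shannon entropy; the claims do not fix the formula.
% n       : predetermined number of segments (the vector dimension)
% N_j     : length of segment j in bytes
% c_{j,b} : number of bytes in segment j with value b
p_{j,b} = \frac{c_{j,b}}{N_j}, \qquad
H_j = -\sum_{b=0}^{255} p_{j,b} \log_2 p_{j,b}
      \quad (\text{with } 0 \log_2 0 := 0), \qquad
\mathbf{v} = (H_1, H_2, \ldots, H_n), \qquad 0 \le H_j \le 8 .
```

Each segment contributes exactly one component, which matches the claim language that the number of segments is the dimension of the vector.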

5. A non-transitory computer storage medium storing computer-readable instructions, wherein, when the computer-readable instructions are executed by a processor, the processor is operably configured to:
- obtain a file to be inspected, determine an information entropy vector of the file; and
  
  inspect, using a trained inspection model, the determined information entropy vector of the file to ascertain whether the file is a malware file, wherein a file type of the file is identical to a model file type corresponding to the inspection model;
  
  where, in order to determine the information entropy vector of the file, the processor is configured to:
  
  divide the file into a predetermined number of segments;
  
  obtain an information entropy value for each of the segments; and
  
  set the number of the segments as a dimension of the information entropy vector, wherein each of the segments corresponds to one direction of the information entropy vector, and the information entropy vector of the file is determined based on the information entropy value of each of the segments, wherein the inspection model is obtained by:
  
  acquiring a plurality of files with an identical file type and known security categories as training files, wherein the security categories include malware file categories and non-malware file categories;
  
  labeling the acquired training files with security category labels according to the known security categories;
  
  determining the information entropy vectors of the training files; and
  
  training and outputting the inspection model based on the determined information entropy vectors and the security category labels of the training files, the training and outputting the inspection model comprises:
  
  obtaining a subset of files from the training files as first files;
  
  performing a feature classification to the information entropy vectors of the first files, resulting in a classification outcome; and
  
  obtaining an initial inspection model by a learning operation based on the classification outcome and the security category labels of the first files;
  
  determining if a misjudgment rate of the initial inspection model is below a predetermined threshold value and outputting the initial inspection model as the trained inspection model when the misjudgment rate of the initial inspection model is below a predetermined threshold;
  
  if the misjudgment rate is not below the predetermined threshold value, repeating a step of generating a corrected inspection model by correcting the initial inspection model or a present corrected inspection model until the misjudgment rate of the corrected inspection model is below the predetermined threshold value; and
  
  stopping the repeating, and outputting the corrected inspection model as the trained inspection model when the misjudgment rate of the corrected inspection model is below the predetermined threshold value.
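
All three independent claims require that the inspected file's type be identical to the file type the inspection model was trained on. One way to enforce that at inspection time is a per-type model lookup, sketched below. The `InspectorRegistry` class and the callable model interface are hypothetical and do not come from the patent.

```python
from typing import Callable

Vector = list[float]
# Assumption: a trained model is any callable returning True for "malware".
Model = Callable[[Vector], bool]


class InspectorRegistry:
    """Maps a file type to the inspection model trained on files of that type."""

    def __init__(self) -> None:
        self._models: dict[str, Model] = {}

    def register(self, file_type: str, model: Model) -> None:
        """Associate a trained model with the file type it was trained on."""
        self._models[file_type.lower()] = model

    def inspect(self, file_type: str, vector: Vector) -> bool:
        """Inspect an entropy vector with the model for the same file type.

        Raises LookupError rather than falling back to a model trained on
        another type, since the claims require the two types to be identical.
        """
        model = self._models.get(file_type.lower())
        if model is None:
            raise LookupError(f"no inspection model trained for file type {file_type!r}")
        return model(vector)
```

Refusing to inspect when no matching model exists is a design choice of this sketch; a deployment could equally route such files to a different detection path.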

Current Assignee: Beijing Baidu Netcom Science and Technology Company Limited (Baidu Incorporated)
Original Assignee: Iyuntian Co., Ltd.
Inventors: Zhang, Zhuang; Zhao, Changkun; Cao, Liang; Dong, Zhiqiang
Primary Examiner(s): Pham, Luu T
Assistant Examiner(s): Malinowski, Walter J
Application Number: US14/985,944
Publication Number: US 20170004306A1
Time in Patent Office: 1,104 Days
Field of Search: 726/22-24

CPC Class Codes:
G06F 21/56   Computer malware detection ...
G06F 21/562   Static detection
G06N 20/00   Machine learning
G06N 99/00   Subject matter not provided...
G10L 2019/0014   Selection criteria for dist...
H04L 63/1416   Event detection, e.g. attac...
