Decision tree induction that is sensitive to attribute computational complexity
First Claim
1. A computer-implemented method for constructing a decision tree for classifying computer files based on the computational complexities of attributes of the files, comprising:
- creating a plurality of attribute vectors for a plurality of training files, each training file classified as legitimate or malware, and each attribute vector comprising values of a predetermined set of attributes for an associated training file;
determining a computational complexity score for each attribute in the predetermined set of attributes, the computational complexity score measuring a cost associated with determining a value of an associated attribute for a training file;
recursively growing the decision tree based on the plurality of attribute vectors and the computational complexity scores; and
providing the recursively grown decision tree to a client system, the client system adapted to traverse the decision tree to classify a target file at the client system as legitimate or malware.
2 Assignments
0 Petitions
Accused Products
Abstract
A decision tree for classifying computer files is constructed. Computational complexities of a set of candidate attributes are determined. A set of attribute vectors are created for a set of training files with known classification. A node is created to represent the set. A weighted impurity reduction score is calculated for each candidate attribute based on the computational complexity of the attribute. If a stopping criterion is satisfied then the node is set as a leaf node. Otherwise the node is set as a branch node and the attribute with the highest weighted impurity reduction score is selected as the splitting attribute for the branch node. The set of attribute vectors are split into subsets based on their attribute values of the splitting attribute. The above process is repeated for each subset. The tree is then pruned based on the computational complexities of the splitting attributes.
-
Citations
20 Claims
-
1. A computer-implemented method for constructing a decision tree for classifying computer files based on the computational complexities of attributes of the files, comprising:
-
creating a plurality of attribute vectors for a plurality of training files, each training file classified as legitimate or malware, and each attribute vector comprising values of a predetermined set of attributes for an associated training file; determining a computational complexity score for each attribute in the predetermined set of attributes, the computational complexity score measuring a cost associated with determining a value of an associated attribute for a training file; recursively growing the decision tree based on the plurality of attribute vectors and the computational complexity scores; and providing the recursively grown decision tree to a client system, the client system adapted to traverse the decision tree to classify a target file at the client system as legitimate or malware. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer system for constructing a decision tree for classifying computer files that takes into account computational complexities of attributes of the files, comprising:
-
a computer-readable storage medium storing executable computer program code; and a processor for executing the executable computer program code, wherein the computer program code is executable to perform steps comprising; creating a plurality of attribute vectors for a plurality of training files, each training file classified as legitimate or malware, and each attribute vector comprising values of a predetermined set of attributes for an associated training file; determining a computational complexity score for each attribute in the predetermined set of attributes, the computational complexity score measuring a cost associated with determining a value of an associated attribute for a training file; recursively growing the decision tree based on the plurality of attribute vectors and the computational complexity scores; and providing the recursively grown decision tree to a client system, the client system adapted to traverse the decision tree to classify a target file at the client system as legitimate or malware. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer-implemented method of detecting malware at a client, comprising:
-
receiving, at the client, a decision tree from a remote security system via a network, the decision tree constructed using steps comprising; creating a plurality of attribute vectors for a plurality of training files, each training file classified as legitimate or malware, and each attribute vector comprising values of a predetermined set of attributes for an associated training file; determining a computational complexity score for each attribute in the predetermined set of attributes, the computational complexity score measuring a cost associated with determining a value of an associated attribute for a training file; and recursively growing a decision tree based on the plurality of attribute vectors and the computational complexity scores, traversal of the grown decision tree indicating whether a target file is legitimate or malware; identifying a target file at the client; and classifying, using the decision tree, the identified target file as legitimate or malware. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification