MATRIX FACTORIZATION FOR AUTOMATED MALWARE DETECTION
First Claim
1. A malware detection system comprising:
at least one processor;
a feature identifier configured to generate a matrix of files and associated machines having a plurality of features associated with the files and machines;
a malware database comprising files of known malware and a plurality of features associated with the known malware;
a comparison engine configured to identify for a file a number of other files that are similar to the file from the matrix of files and the malware database and to score the file based on a closeness of the other files to the file; and
malware classification component configured to identify potential malware based on the score of the file.
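The comparison engine in claim 1 scores a file by how close its most similar files are. A minimal sketch of that idea follows, assuming Euclidean distance, k = 3 neighbors, and an inverse-distance weighting; none of these choices is stated in the claim, and the names `knn_malware_score` and `known` are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_malware_score(vector, labeled, k=3):
    """Score a file in [0, 1] from its k closest labeled files.

    `labeled` is a list of (feature_vector, is_malware) pairs. Closer
    neighbors get more weight (assumed weighting: 1 / (1 + distance)).
    """
    if not labeled:
        raise ValueError("need at least one labeled file")
    nearest = sorted(labeled, key=lambda item: euclidean(vector, item[0]))[:k]
    weights = [(1.0 / (1.0 + euclidean(vector, v)), bad) for v, bad in nearest]
    total = sum(w for w, _ in weights)
    # Fraction of the neighbor weight that comes from known malware.
    return sum(w for w, bad in weights if bad) / total


# Toy data: two known malware files and two known benign files.
known = [
    ([1.0, 0.9, 0.0], True),
    ([0.9, 1.0, 0.1], True),
    ([0.0, 0.1, 1.0], False),
    ([0.1, 0.0, 0.9], False),
]
suspicious = knn_malware_score([0.95, 0.95, 0.05], known)
harmless = knn_malware_score([0.05, 0.05, 0.95], known)
```

A file near the known malware vectors receives a score above 0.5, and a file near the benign vectors receives a score below 0.5.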
Abstract
Disclosed herein is a system and method for automatically identifying potential malware files or benign files among files that are not known to be malware. Vectors of selected features of the files are compared, by vector distance, to the vectors of both known malware files and known benign files. Based on the distance measures, a malware score is obtained for the unknown file. If the malware score exceeds a threshold, a researcher may be notified of the potential malware, or the file may be automatically classified as malware if the score is sufficiently high.
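The abstract describes two outcomes above a threshold: notifying a researcher, or classifying the file automatically when the score is higher still. The sketch below illustrates one way to do that, assuming a score built from the distance to the nearest known malware vector and the nearest known benign vector. The scoring formula, the threshold values 0.6 and 0.9, and the function names are illustrative assumptions, not taken from the patent.

```python
import math


def malware_score(vector, malware_vectors, benign_vectors):
    """Return a score in [0, 1]; higher means closer to known malware.

    Assumed formula: d_benign / (d_malware + d_benign), using the nearest
    vector of each kind.
    """
    d_mal = min(math.dist(vector, m) for m in malware_vectors)
    d_ben = min(math.dist(vector, b) for b in benign_vectors)
    if d_mal + d_ben == 0:
        return 0.5  # identical to both a malware and a benign vector
    return d_ben / (d_mal + d_ben)


def triage(score, review_threshold=0.6, auto_threshold=0.9):
    """Map a score to an action using two hypothetical thresholds."""
    if score > auto_threshold:
        return "classify_as_malware"
    if score > review_threshold:
        return "notify_researcher"
    return "no_action"


malware_vectors = [[1.0, 1.0]]
benign_vectors = [[0.0, 0.0]]
for unknown in ([1.0, 1.0], [0.7, 0.7], [0.1, 0.0]):
    score = malware_score(unknown, malware_vectors, benign_vectors)
    print(unknown, round(score, 2), triage(score))
```

Keeping the two thresholds separate lets borderline files go to a human reviewer while only very high scores trigger automatic classification.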
20 Claims
14. A method for identifying unknown malware files from a plurality of files comprising:
receiving a plurality of files from a plurality of machines each file and each machine having a plurality of features;
building a multidimensional matrix associating the plurality of files with the plurality of machines;
identifying from the multidimensional matrix a group of features that are most informative in describing a file in the matrix wherein the group of features is a fixed number of features and comprises a subset of the plurality of features;
determining a malware score for at least one file in the plurality of files; and
determining if the at least one file is potential malware by comparing the malware score for the at least one file against a threshold malware score;
wherein the preceding steps are performed by at least one processor.
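The matrix-building step of claim 14, together with the matrix factorization named in the title, can be illustrated as follows. This is a rough sketch under several assumptions: the matrix is a 0/1 record of which file was seen on which machine, the factorization is a plain stochastic-gradient fit of rank 2, and the resulting per-file factors stand in for the fixed-size group of features. The claim itself does not specify any of this, and `build_matrix`, `factorize`, and `squared_error` are hypothetical helpers.

```python
import random


def build_matrix(sightings, files, machines):
    """Rows are files, columns are machines; 1.0 where the file was seen."""
    seen = set(sightings)
    return [[1.0 if (f, m) in seen else 0.0 for m in machines] for f in files]


def factorize(matrix, rank=2, steps=1000, lr=0.05, reg=0.01, seed=0):
    """Approximate matrix[i][j] by dot(F[i], M[j]) using simple SGD."""
    rng = random.Random(seed)
    F = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in matrix]
    M = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in matrix[0]]
    for _ in range(steps):
        for i, row in enumerate(matrix):
            for j, target in enumerate(row):
                err = target - sum(a * b for a, b in zip(F[i], M[j]))
                for k in range(rank):
                    f_ik, m_jk = F[i][k], M[j][k]
                    # Gradient step with a small L2 penalty on both factors.
                    F[i][k] += lr * (err * m_jk - reg * f_ik)
                    M[j][k] += lr * (err * f_ik - reg * m_jk)
    return F, M


def squared_error(matrix, F, M):
    """Total squared reconstruction error of the factorization."""
    return sum(
        (matrix[i][j] - sum(a * b for a, b in zip(F[i], M[j]))) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


files = ["f1", "f2", "f3", "f4"]
machines = ["m1", "m2", "m3"]
sightings = [("f1", "m1"), ("f1", "m2"), ("f2", "m1"), ("f2", "m2"),
             ("f3", "m3"), ("f4", "m3")]
matrix = build_matrix(sightings, files, machines)
file_factors, machine_factors = factorize(matrix)
```

Each row of `file_factors` is a fixed-length vector for one file, which could then be compared by distance against vectors of known malware.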
20. A computer readable storage device having computer readable instructions that when executed cause at least one computing device to:
receive a plurality of files from a plurality of machines each file and each machine having a plurality of features;
build a multidimensional matrix associating the plurality of files with the plurality of machines;
identify from the multidimensional matrix a group of features that are most informative in describing a machine in the matrix wherein the group of features is a fixed number of features and comprises a subset of the plurality of features;
determine a vector distance for files on the at least one machine with corresponding vectors for a plurality of known malware files in a malware database wherein the vector and corresponding vectors are based on the group of features;
determine a malware score for at least one machine in the plurality of machines by adding the determined distance; and
determine if the at least one machine is compromised by malware by comparing the malware score for the at least one malware against a threshold malware score;
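The machine-level scoring in claim 20 adds up per-file distances to known malware vectors and compares the sum against a threshold. The sketch below follows that wording literally, with assumptions that should be read as such: each file contributes its distance to the nearest known malware vector, and a machine is treated as compromised when the summed distance is at or below the threshold (the claim does not state the direction of the comparison). The names and the threshold value are hypothetical.

```python
import math


def machine_score(file_vectors, malware_vectors):
    """Sum, over a machine's files, the distance to the nearest malware vector."""
    return sum(
        min(math.dist(vec, mal) for mal in malware_vectors)
        for vec in file_vectors
    )


def is_compromised(score, threshold):
    """Assumed direction: a small total distance means close to malware."""
    return score <= threshold


malware_vectors = [[1.0, 1.0]]
machine_a = [[1.0, 1.0], [0.9, 1.0]]   # files that resemble known malware
machine_b = [[0.0, 0.0], [0.0, 0.1]]   # files far from known malware

score_a = machine_score(machine_a, malware_vectors)
score_b = machine_score(machine_b, malware_vectors)
print(is_compromised(score_a, threshold=0.5))
print(is_compromised(score_b, threshold=0.5))
```

Because a plain sum grows with the number of files on a machine, a practical system would likely need some normalization; the claim text as excerpted here leaves that open.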
Specification