Systems and methods for file classification
Abstract
A computer-implemented method for file classification may include (1) identifying, by a computer security system, a cluster of files that co-occur with each other according to a statistical analysis, (2) identifying ground truth files to which the computer security system has previously assigned a security score, (3) determining that a file in the cluster of files shares an item of file metadata with another file in the ground truth files, (4) assigning a security score to the file in the cluster of files based on a security score of the other file in the ground truth files that shares the item of file metadata, and (5) assigning an overall security score to the entire cluster of files based on the security score assigned to the file in the cluster. Various other methods, systems, and computer-readable media are also disclosed.
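Steps (2) through (5) of the abstract can be illustrated with a minimal sketch. The abstract does not say how scores are combined, so the code assumes a score in the range 0.0 (malicious) to 1.0 (trusted) and uses a simple mean at both the file level and the cluster level. The names `FileRecord`, `shares_item`, `score_file`, and `score_cluster` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class FileRecord:
    file_id: str
    metadata: dict                  # e.g. {"signer": "Acme"} (field names are assumed)
    score: Optional[float] = None   # None until a security score is assigned


def shares_item(a, b, fields):
    """True if both files carry the same value for at least one of the fields."""
    return any(
        f in a.metadata and f in b.metadata and a.metadata[f] == b.metadata[f]
        for f in fields
    )


def score_file(candidate, ground_truth, fields):
    """Mean score of the ground truth files that share a metadata item, else None.

    Averaging is an assumption; the source only says the score is "based on"
    the score of the matching ground truth file.
    """
    matches = [gt.score for gt in ground_truth if shares_item(candidate, gt, fields)]
    return mean(matches) if matches else None


def score_cluster(cluster, ground_truth, fields):
    """Score each unscored file, then return an overall score for the cluster."""
    for f in cluster:
        if f.score is None:
            f.score = score_file(f, ground_truth, fields)
    scored = [f.score for f in cluster if f.score is not None]
    # Assumed aggregation: mean of the scores that could be assigned.
    return mean(scored) if scored else None
```

In this sketch a cluster in which only one file matches a ground truth file still receives an overall score, which mirrors the idea of propagating one known score to files that co-occur with it.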
20 Claims
1. A computer-implemented method for file classification, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying, by a computer security system, a cluster of files that co-occur with each other according to a statistical analysis that detects instances of application packages that install multiple files associated with an application on a single machine;
identifying ground truth files to which the computer security system has previously assigned a security score;
determining that a file in the cluster of files shares an item of file metadata with at least one other file in the ground truth files;
assigning a security score to the file in the cluster of files based at least in part on a security score of the other file in the ground truth files that shares the item of file metadata;
assigning an overall security score to the entire cluster of files based at least in part on the security score assigned to the file in the cluster;
checking, prior to determining that the file in the cluster of files shares the item of file metadata with the other file, a field of file metadata that corresponds to the item of file metadata for accuracy in detecting security threats by checking for a threshold level of at least one of false positives and false negatives; and
determining that the field of file metadata passes the checking for accuracy.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
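The accuracy check on a metadata field recited in claim 1 could look roughly like the following. The claim does not say how false positives and false negatives are measured, so this sketch assumes a leave-one-out test over already-labeled files: each file's label is predicted from the majority label of the other files that share its value for the field. The function names and the default thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def field_error_rates(field, labeled_files):
    """Leave-one-out estimate of (fp_rate, fn_rate, evaluated) for one field.

    labeled_files is a list of (metadata_dict, is_malicious) pairs.
    """
    # Count malicious and benign files for every value the field takes.
    counts = defaultdict(Counter)
    for metadata, is_malicious in labeled_files:
        if field in metadata:
            counts[metadata[field]][is_malicious] += 1

    fp = fn = benign = malicious = 0
    for metadata, is_malicious in labeled_files:
        if field not in metadata:
            continue
        peers = counts[metadata[field]].copy()
        peers[is_malicious] -= 1            # hold out the file under test
        if peers[True] == peers[False]:
            continue                        # no peers, or a tie: no prediction
        predicted_malicious = peers[True] > peers[False]
        if is_malicious:
            malicious += 1
            fn += not predicted_malicious   # missed threat
        else:
            benign += 1
            fp += predicted_malicious       # benign file flagged
    fp_rate = fp / benign if benign else 0.0
    fn_rate = fn / malicious if malicious else 0.0
    return fp_rate, fn_rate, benign + malicious


def field_passes(field, labeled_files, max_fp_rate=0.01, max_fn_rate=0.05):
    """True if the field stays within both (assumed) error thresholds."""
    fp_rate, fn_rate, evaluated = field_error_rates(field, labeled_files)
    # A field that produced no predictions at all cannot be called accurate.
    return evaluated > 0 and fp_rate <= max_fp_rate and fn_rate <= max_fn_rate
```

Under these assumptions a field such as a code-signing identity would tend to pass, while a field such as a generic file name, shared by benign and malicious files alike, would tend to fail and be excluded before any metadata matching takes place.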
11. A system for file classification, the system comprising:
an identification module, stored in memory, that:
    identifies a cluster of files that co-occur with each other according to a statistical analysis that detects instances of application packages that install multiple files associated with an application on a single machine; and
    identifies ground truth files to which the system has previously assigned a security score;
a determination module, stored in memory, that determines that a file in the cluster of files shares an item of file metadata with at least one other file in the ground truth files;
an assignment module, stored in memory, that:
    assigns a security score to the file in the cluster of files based at least in part on a security score of the other file in the ground truth files that shares the item of file metadata; and
    assigns an overall security score to the entire cluster of files based at least in part on the security score assigned to the file in the cluster;
wherein the determination module further:
    checks, prior to determining that the file in the cluster of files shares the item of file metadata with the other file, a field of file metadata that corresponds to the item of file metadata for accuracy in detecting security threats by checking for a threshold level of at least one of false positives and false negatives; and
    determines that the field of file metadata passes the checking for accuracy; and
at least one physical processor configured to execute the identification module, the determination module, and the assignment module.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
20. A non-transitory computer-readable medium comprising one or more computer-readable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
identify, by a computer security system, a cluster of files that co-occur with each other according to a statistical analysis that detects instances of application packages that install multiple files associated with an application on a single machine;
identify ground truth files to which the computer security system has previously assigned a security score;
determine that a file in the cluster of files shares an item of file metadata with at least one other file in the ground truth files;
assign a security score to the file in the cluster of files based at least in part on a security score of the other file in the ground truth files that shares the item of file metadata;
assign an overall security score to the entire cluster of files based at least in part on the security score assigned to the file in the cluster;
check, prior to determining that the file in the cluster of files shares the item of file metadata with the other file, a field of file metadata that corresponds to the item of file metadata for accuracy in detecting security threats by checking for a threshold level of at least one of false positives and false negatives; and
determine that the field of file metadata passes the checking for accuracy.
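The cluster-identification step shared by the independent claims names only "a statistical analysis" of co-occurrence. One possible reading, offered purely as an assumption, links two files when the sets of machines on which they appear overlap strongly (Jaccard similarity) and treats each connected group of linked files as a cluster. The function names and the 0.8 threshold below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of machine identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(machines_by_file, min_similarity=0.8):
    """Group files that tend to appear on the same machines.

    machines_by_file maps a file identifier to the set of machines on which
    that file was observed. Returns a list of clusters (sets of file ids)
    containing at least two files each.
    """
    parent = {f: f for f in machines_by_file}

    def find(f):
        # Union-find root lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    # Link every pair of files whose machine sets overlap strongly enough.
    for a, b in combinations(machines_by_file, 2):
        if jaccard(machines_by_file[a], machines_by_file[b]) >= min_similarity:
            parent[find(a)] = find(b)

    clusters = {}
    for f in machines_by_file:
        clusters.setdefault(find(f), set()).add(f)
    return [c for c in clusters.values() if len(c) > 1]
```

The pairwise comparison is quadratic in the number of files, so this form is suitable only for illustrating the idea on small inputs.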
Specification