Malware detection using file names
First Claim
1. A computer-implemented method of determining whether a computer file contains malicious software, comprising:
identifying a computer file stored on a plurality of different endpoints, the computer file having a plurality of different names on the endpoints;
analyzing the plurality of different names for the computer file to generate a score, the score indicating a confidence that the computer file contains malicious software, wherein the analysis comprises:
determining an amount of dissimilarity among the plurality of different names for the computer file by comparing pairs of different names for the computer file to determine dissimilarity of character strings forming the names in the pairs;
generating the score responsive to the amount of dissimilarity among the plurality of different names for the computer file, wherein a greater amount of dissimilarity correlates with a greater confidence that the computer file contains malicious software; and
weighting the score for age and/or prevalence of the computer file, wherein the age weight for the score is inversely proportional to a length of time that the computer file has been stored on an endpoint and the prevalence weight for the score is inversely proportional to a prevalence of the computer file among the plurality of different endpoints; and
determining whether the computer file contains malicious software responsive at least in part to the score.
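The claim requires comparing pairs of names "to determine dissimilarity of character strings" but does not fix a metric. A minimal sketch, assuming Levenshtein edit distance normalized by the longer name's length and averaged over every pair of distinct names; the function names are hypothetical, and any other string metric could be swapped in.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,                    # insertion
                previous[j] + 1,                       # deletion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def pair_dissimilarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1] by the longer name (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dissimilarity_score(names: list[str]) -> float:
    """Mean dissimilarity over all pairs of distinct names; higher means more suspicious."""
    unique = sorted(set(names))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0  # one name only: no evidence of renaming
    return sum(pair_dissimilarity(a, b) for a, b in pairs) / len(pairs)
```

Because the names are deduplicated first, the same name reported by many endpoints does not dilute the average; only genuinely different names are compared, as the claim recites.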
2 Assignments
0 Petitions
Abstract
Descriptions of files detected at endpoints are submitted to a security server. The descriptions describe the names of the files and unique identifiers of the files. The security server uses the unique identifiers to identify files having different names at different endpoints. For a given file having multiple names, the names are processed to account for name differences unlikely to have been caused by malware. The processed names for the file are analyzed to determine the amount of dissimilarity among the names. This analysis is used to generate a score indicating a confidence that the computer file contains malicious software, where a greater amount of dissimilarity among the names generally indicates a greater confidence that the computer file contains malicious software. The score is weighted based on file name frequency, the age of the file, and the prevalence of the file. The weighted score is used to determine whether the computer file contains malicious software.
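The abstract describes grouping reports by unique identifier and pre-processing names to discard differences "unlikely to have been caused by malware", without listing those differences. A minimal sketch, assuming the identifier is a content hash and that letter case and duplicate-download suffixes such as " (1)" or " - Copy" are benign; the `FileDescription` type and both normalization rules are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDescription:
    """One report from an endpoint (hypothetical shape)."""
    endpoint_id: str
    file_hash: str  # unique identifier of the file contents, e.g. a SHA-256 digest
    name: str


# Hypothetical rule: strip " (1)", " (2)", " - copy" just before the extension or at the end.
_DUPLICATE_SUFFIX = re.compile(r"(\s*\(\d+\)|\s*-\s*copy)+(?=\.[^.]+$|$)")


def normalize_name(name: str) -> str:
    """Remove name differences assumed to be benign (case, duplicate suffixes)."""
    return _DUPLICATE_SUFFIX.sub("", name.strip().lower())


def names_by_file(descriptions: list[FileDescription]) -> dict[str, set[str]]:
    """Group normalized names by the file's unique identifier."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for description in descriptions:
        grouped[description.file_hash].add(normalize_name(description.name))
    return dict(grouped)


def multi_named_files(descriptions: list[FileDescription]) -> dict[str, set[str]]:
    """Files that still carry more than one distinct name after normalization."""
    return {h: names for h, names in names_by_file(descriptions).items() if len(names) > 1}
```

Only the files returned by `multi_named_files` would go on to the dissimilarity analysis; a file renamed merely by case or by a browser's " (1)" suffix collapses to a single name.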
40 Citations
17 Claims
10. A computer system for determining whether a computer file contains malicious software, comprising:
a non-transitory computer-readable storage medium storing executable computer program instructions comprising instructions for:
identifying a computer file stored on a plurality of different endpoints, the computer file having a plurality of different names on the endpoints;
analyzing the plurality of different names for the computer file to generate a score, the score indicating a confidence that the computer file contains malicious software, wherein the analysis comprises:
determining an amount of dissimilarity among the plurality of different names for the computer file by comparing pairs of different names for the computer file to determine dissimilarity of character strings forming the names in the pairs;
generating the score responsive to the amount of dissimilarity among the plurality of different names for the computer file, wherein a greater amount of dissimilarity correlates with a greater confidence that the computer file contains malicious software; and
weighting the score for age and/or prevalence of the computer file, wherein the age weight for the score is inversely proportional to a length of time that the computer file has been stored on an endpoint and the prevalence weight for the score is inversely proportional to a prevalence of the computer file among the plurality of different endpoints; and
determining whether the computer file contains malicious software responsive at least in part to the score; and
a processor for executing the computer program instructions.
Dependent claims: 11, 12, 13, 14, 15.
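The weighting step only requires that the age weight be inversely proportional to how long the file has been on an endpoint and the prevalence weight inversely proportional to how many endpoints hold it. A minimal sketch, assuming the weights multiply the dissimilarity score, age is measured in days with a one-day floor to avoid division by zero, and a hypothetical decision threshold of 0.05; none of these specifics come from the claims.

```python
from typing import Optional


def age_weight(age_days: float, min_age_days: float = 1.0) -> float:
    """Inversely proportional to time on the endpoint (clamped below min_age_days)."""
    return 1.0 / max(age_days, min_age_days)


def prevalence_weight(endpoint_count: int) -> float:
    """Inversely proportional to the number of endpoints holding the file."""
    if endpoint_count < 1:
        raise ValueError("a reported file must be present on at least one endpoint")
    return 1.0 / endpoint_count


def weighted_score(score: float,
                   age_days: Optional[float] = None,
                   endpoint_count: Optional[int] = None) -> float:
    """Apply the age weight, the prevalence weight, or both ("and/or")."""
    weighted = score
    if age_days is not None:
        weighted *= age_weight(age_days)
    if endpoint_count is not None:
        weighted *= prevalence_weight(endpoint_count)
    return weighted


def looks_malicious(weighted: float, threshold: float = 0.05) -> bool:
    """Hypothetical decision rule on the weighted score."""
    return weighted >= threshold
```

With these choices a new, rarely seen file keeps most of its dissimilarity score, while an old file present on thousands of endpoints is driven toward zero, which matches the claimed direction of both weights. Any real threshold would need calibration against labeled data.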
16. A non-transitory computer-readable storage medium storing executable computer program instructions for determining whether a computer file contains malicious code, the instructions comprising instructions for:
submitting file descriptions describing attributes of computer files detected on an endpoint to a security server, the attributes comprising unique identifiers of the computer files and names of the computer files;
receiving, from the security server, an indication of whether a computer file contains malicious software, the indication determined responsive to an analysis of unique identifiers and names of computer files described by file descriptions submitted by a plurality of different endpoints, wherein the analysis comprises:
determining an amount of dissimilarity among a plurality of different names for the computer file by comparing pairs of different names for the computer file to determine dissimilarity of character strings forming the names in the pairs;
generating a score responsive to the amount of dissimilarity among the plurality of different names for the computer file, wherein a greater amount of dissimilarity correlates with a greater confidence that the computer file contains malicious software; and
weighting the score for age and/or prevalence of the computer file, wherein the age weight for the score is inversely proportional to a length of time that the computer file has been on an endpoint of the plurality of different endpoints and the prevalence weight for the score is inversely proportional to a prevalence of the computer file among the plurality of different endpoints, and the indication is determined responsive to the weighted score; and
responsive to the received indication indicating that the computer file contains malicious software, remediating the malicious software.
Dependent claims: 17.
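Claim 16 covers the endpoint side: describe local files by unique identifier and name, then remediate when the server flags one. A minimal sketch, assuming SHA-256 digests as the identifiers and quarantine (moving the file into a separate directory) as the remediation; the transport to the security server is left out, and `submission_payload`, `remediate`, and the flagged-hash set standing in for the server's indication are all hypothetical.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash file contents in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_files(root: Path) -> list[tuple[str, str, Path]]:
    """(hash, name, local path) for every file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            found.append((sha256_of(path), filename, path))
    return found


def submission_payload(descriptions: list[tuple[str, str, Path]]) -> list[dict]:
    """Only the identifier and the name are submitted; the local path stays on the endpoint."""
    return [{"file_hash": h, "name": name} for h, name, _path in descriptions]


def remediate(flagged_hashes: set[str], root: Path, quarantine_dir: Path) -> list[Path]:
    """Move every file whose hash was flagged into quarantine_dir (keep it outside root)."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for file_hash, name, path in describe_files(root):
        if file_hash in flagged_hashes:
            target = quarantine_dir / f"{file_hash[:12]}_{name}"
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Prefixing the quarantined name with part of the hash avoids collisions when the same malicious file was saved under several names, which is exactly the situation the method looks for.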
Specification