
Classifying malware by order of network behavior artifacts

  • US 9,489,514 B2
  • Filed: 10/06/2014
  • Issued: 11/08/2016
  • Est. Priority Date: 10/11/2013
  • Status: Active Grant
First Claim

1. A method of determining whether an executable file is malware by using network behavioral artifacts, the method comprising:

  • identifying a training corpus comprising a plurality of benign executable files and a plurality of malware executable files;

    associating, by an electronic hardware processor, each of a plurality of network behavioral artifacts with a respective character set;

    assigning, by an electronic hardware processor, each executable file from the training corpus a respective string of character sets, wherein each string of character sets represents temporally ordered network behavior artifacts of a respective executable file from the training corpus, whereby a plurality of strings of character sets is obtained;

    obtaining, by an electronic hardware processor, for each of the plurality of strings of character sets and for a fixed n>1, a respective set of contiguous substrings of length n;

    ordering, by an electronic hardware processor, a union of the respective sets of contiguous substrings of length n, whereby an ordered universe of contiguous substrings of length n is obtained;

    forming, for each executable file from the training corpus and by an electronic hardware processor, a respective feature vector, wherein each respective feature vector comprises a tally list comprising counts of contiguous substrings of length n in the respective set of contiguous n-grams for the respective executable file from the training corpus, whereby a plurality of feature vectors is obtained;

    classifying, by an electronic hardware processor, each respective feature vector of the plurality of feature vectors as associated with either a benign executable file or a malware executable file from the training corpus, whereby a set of classified feature vectors is obtained;

    training a machine learning system with the set of classified feature vectors, wherein the machine learning system comprises an electronic hardware processor;

    identifying an unknown executable file;

    generating, by an electronic hardware processor, a feature vector for the unknown executable file;

    submitting the feature vector for the unknown executable file to the machine learning system;

    obtaining, by an electronic hardware processor, a classification of the unknown executable file as one of likely benign and likely malware; and

    outputting, by an electronic hardware processor, the classification of the unknown executable file.
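
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the artifact names, the single-character codes in `ARTIFACT_CODES`, the toy traces, the choice n = 2, and the nearest-centroid classifier standing in for the unspecified "machine learning system" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mapping from network behavior artifacts to characters;
# the claim does not prescribe specific artifacts or codes.
ARTIFACT_CODES = {"dns_query": "D", "tcp_syn": "S", "http_get": "G", "http_post": "P"}

def encode(trace):
    """Turn a temporally ordered list of artifacts into a string."""
    return "".join(ARTIFACT_CODES[a] for a in trace)

def ngrams(s, n):
    """Return all contiguous substrings of length n, repeats included."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_universe(strings, n):
    """Ordered union of the n-grams seen across the training strings."""
    return sorted({g for s in strings for g in ngrams(s, n)})

def feature_vector(s, universe, n):
    """Tally how often each n-gram of the universe occurs in s."""
    counts = Counter(ngrams(s, n))
    return [counts[g] for g in universe]

def train_centroids(vectors, labels):
    """Stand-in learner (an assumption): one mean vector per class."""
    sums, sizes = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}

def classify(vec, centroids):
    """Return the label whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Toy training corpus: (ordered artifact trace, label). Invented data.
corpus = [
    (["dns_query", "tcp_syn", "http_get"], "benign"),
    (["dns_query", "tcp_syn", "http_get", "http_get"], "benign"),
    (["tcp_syn", "http_post", "http_post", "http_post"], "malware"),
    (["dns_query", "http_post", "http_post"], "malware"),
]

N = 2  # fixed n > 1
strings = [encode(trace) for trace, _ in corpus]
labels = [label for _, label in corpus]
universe = build_universe(strings, N)
vectors = [feature_vector(s, universe, N) for s in strings]
centroids = train_centroids(vectors, labels)

# Classify an unknown executable from its observed artifact trace.
unknown = encode(["dns_query", "http_post", "http_post", "http_post"])
verdict = classify(feature_vector(unknown, universe, N), centroids)
print("likely " + verdict)
```

Any n-gram of the unknown file that is absent from the training universe is simply ignored here, because the feature vector is indexed by the ordered universe built from the training corpus; the claim itself does not say how such n-grams should be handled.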
