Measuring confidence of file clustering and clustering based file classification

  • US 8,214,365 B1
  • Filed: 02/28/2011
  • Issued: 07/03/2012
  • Est. Priority Date: 02/28/2011
  • Status: Active Grant
First Claim

1. A computer implemented method for quantifying a confidence level in a quality of a cluster of samples, wherein the samples are clustered according to runtime behavior, the method comprising the steps of:

    determining, by at least one computer, a uniformity of the cluster, the uniformity of the cluster being determined as a function of at least a ratio of a most frequently occurring unique sample label present in the cluster to a total number of unique sample labels present in the cluster;

    assigning, by the at least one computer, a raw confidence value to the cluster, the raw confidence value being a function of the determined uniformity of the cluster;

    calculating, by the at least one computer, a confidence interval weight for the cluster, the confidence interval weight being calculated by using a confidence interval to determine reliability of the determined uniformity of the cluster;

    calculating, by the at least one computer, a trace length weight for the cluster, the trace length weight being calculated as a function of lengths of traces generated by the samples in the cluster;

    calculating, by the at least one computer, an n-gram weight for the cluster, the n-gram weight being calculated as a function of numbers of unique n-grams generated by the samples in the cluster;

    calculating, by the at least one computer, a compactness weight for the cluster, the compactness weight being calculated as a function of similarity of samples in the cluster to a point of reference;

    calculating, by the at least one computer, a cluster weight for the cluster, the cluster weight being calculated as a function of the confidence interval weight, the trace length weight, the n-gram weight and the compactness weight; and

    assigning, by the at least one computer, a cluster confidence measurement to the cluster, the cluster confidence measurement being a function of the cluster weight and the cluster raw confidence value.
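
The steps of claim 1 can be sketched in code to show how the quantities feed into one another. This is a minimal illustration under stated assumptions, not the patented implementation: the claim only requires each value to be "a function of" its inputs, so every concrete formula below is a hypothetical choice. In particular, the sketch assumes that uniformity is the share of samples carrying the most frequent label, that raw confidence equals uniformity, that the confidence interval is a Wilson score interval whose width is subtracted from 1, that the trace length and n-gram weights are means scaled against made-up targets, that compactness is the mean similarity to a reference point, and that the cluster weight is the product of the four weights. The names `Sample`, `saturating_mean`, `target_len` and `target_ngrams` are likewise invented for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    label: str             # known label, e.g. a family name (hypothetical field)
    trace: Sequence[str]   # runtime behavior trace, one event per element
    similarity: float      # similarity to the reference point, assumed in [0, 1]


def uniformity(samples: List[Sample]) -> float:
    # Assumed reading of the claim: share of samples with the most frequent label.
    counts = Counter(s.label for s in samples)
    return counts.most_common(1)[0][1] / len(samples)


def wilson_interval(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    # Wilson score interval for a proportion p observed over n samples.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def confidence_interval_weight(p: float, n: int) -> float:
    # Assumption: a narrower interval means a more reliable uniformity.
    low, high = wilson_interval(p, n)
    return 1.0 - (high - low)


def unique_ngrams(trace: Sequence[str], n: int = 3) -> int:
    # Number of distinct length-n windows in the trace (0 if the trace is shorter).
    return len({tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)})


def saturating_mean(values: List[float], target: float) -> float:
    # Assumption: scale the mean against a hypothetical target, capped at 1.
    return min(1.0, (sum(values) / len(values)) / target)


def cluster_confidence(samples: List[Sample],
                       target_len: float = 100.0,
                       target_ngrams: float = 50.0) -> float:
    u = uniformity(samples)
    raw_confidence = u  # assumption: raw confidence is the uniformity itself
    ci_w = confidence_interval_weight(u, len(samples))
    len_w = saturating_mean([len(s.trace) for s in samples], target_len)
    ngram_w = saturating_mean([unique_ngrams(s.trace) for s in samples],
                              target_ngrams)
    compact_w = sum(s.similarity for s in samples) / len(samples)
    cluster_weight = ci_w * len_w * ngram_w * compact_w  # assumption: product
    return raw_confidence * cluster_weight
```

Under these assumptions, a cluster of four samples with three sharing one label has a uniformity of 0.75, and each weight can only scale that raw value downward. A multiplicative cluster weight means that a single weak signal, such as very short traces, pulls the final measurement toward zero; an average or a minimum would be equally consistent with the claim language.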
