Measuring confidence of file clustering and clustering based file classification
Abstract
A uniformity of a cluster of samples is determined, and a corresponding raw confidence value is calculated. A confidence interval weight is calculated using a confidence interval to determine reliability of the uniformity. A trace length weight is calculated, as a function of traces of the samples. An n-gram weight is calculated, as a function of numbers of n-grams generated by the samples. A compactness weight is calculated, as a function of the similarity of the samples. A cluster weight is calculated as a function of the four above-described weights. A cluster confidence measurement is calculated as a function of the cluster weight and the raw confidence value. When a new sample is assigned to the cluster, an assignment confidence measurement is calculated, as a function of the cluster's confidence measurement and the sample's trace length, n-grams and similarity.
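The abstract says only that the cluster weight, the cluster confidence measurement and the assignment confidence measurement are each "a function of" their inputs. As a minimal sketch, assuming every input is normalized to [0, 1] and assuming a plain product as the combining function (both are illustrative assumptions, not taken from the text), the final steps could look like this. All function names are hypothetical.

```python
from math import prod


def cluster_weight(ci_w, trace_w, ngram_w, compact_w):
    """Assumed combination: product of the four weights, each in [0, 1]."""
    weights = (ci_w, trace_w, ngram_w, compact_w)
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return prod(weights)


def cluster_confidence(raw_confidence, weight):
    """Assumed form: the raw confidence value scaled by the cluster weight."""
    return raw_confidence * weight


def assignment_confidence(cluster_conf, sample_trace_w, sample_ngram_w, sample_similarity):
    """Assumed form: the cluster's confidence scaled by three per-sample factors
    (trace length, n-grams, similarity to the cluster), each in [0, 1]."""
    return cluster_conf * sample_trace_w * sample_ngram_w * sample_similarity


if __name__ == "__main__":
    w = cluster_weight(0.5, 1.0, 1.0, 0.5)
    conf = cluster_confidence(0.8, w)
    print(w, conf, assignment_confidence(conf, 1.0, 0.5, 1.0))
```

With a product, any single weak factor pulls the overall confidence down. A weighted mean would be more forgiving, and the text does not say which behavior is intended.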
19 Citations
20 Claims
1. A computer implemented method for quantifying a confidence level in a quality of a cluster of samples, wherein the samples are clustered according to runtime behavior, the method comprising the steps of:
determining, by at least one computer, a uniformity of the cluster, the uniformity of the cluster being determined as a function of at least a ratio of a most frequently occurring unique sample label present in the cluster to a total number of unique sample labels present in the cluster;
assigning, by the at least one computer, a raw confidence value to the cluster, the raw confidence value being a function of the determined uniformity of the cluster;
calculating, by the at least one computer, a confidence interval weight for the cluster, the confidence interval weight being calculated by using a confidence interval to determine reliability of the determined uniformity of the cluster;
calculating, by the at least one computer, a trace length weight for the cluster, the trace length weight being calculated as a function of lengths of traces generated by the samples in the cluster;
calculating, by the at least one computer, an n-gram weight for the cluster, the n-gram weight being calculated as a function of numbers of unique n-grams generated by the samples in the cluster;
calculating, by the at least one computer, a compactness weight for the cluster, the compactness weight being calculated as a function of similarity of samples in the cluster to a point of reference;
calculating, by the at least one computer, a cluster weight for the cluster, the cluster weight being calculated as a function of the confidence interval weight, the trace length weight, the n-gram weight and the compactness weight; and
assigning, by the at least one computer, a cluster confidence measurement to the cluster, the cluster confidence measurement being a function of the cluster weight and the cluster raw confidence value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
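The first three steps of claim 1 (uniformity, raw confidence value, confidence interval weight) can be illustrated with a short sketch. It rests on several assumptions not stated in the claim: uniformity is read as the fraction of labeled samples carrying the most common label, the raw confidence value is taken to equal the uniformity, the confidence interval is a Wilson score interval, and the weight is one minus the interval width. The function names are hypothetical.

```python
import math
from collections import Counter


def uniformity(labels):
    """Assumed reading: share of labeled samples that carry the most common label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


def raw_confidence(u):
    """Assumed mapping: the raw confidence value is the uniformity itself."""
    return u


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed over n samples."""
    if n == 0:
        return 0.0, 1.0
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def confidence_interval_weight(p, n, z=1.96):
    """Assumed weight: 1 minus the interval width, so narrow intervals score high."""
    low, high = wilson_interval(p, n, z)
    return 1.0 - (high - low)


if __name__ == "__main__":
    labels = ["trojan"] * 8 + ["worm"] * 2
    u = uniformity(labels)
    print(u, raw_confidence(u), confidence_interval_weight(u, len(labels)))
```

Under these assumptions, a cluster of ten samples and a cluster of a thousand samples with the same uniformity get the same raw confidence value, but the larger cluster gets a higher confidence interval weight because its interval is narrower.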
14. At least one non-transitory computer readable medium storing a computer program product for quantifying a confidence level in a quality of a cluster of samples, wherein the samples are clustered according to runtime behavior, the computer program product comprising:
program code for determining a uniformity of the cluster, the uniformity of the cluster being determined as a function of at least a ratio of a most frequently occurring unique sample label present in the cluster to a total number of unique sample labels present in the cluster;
program code for assigning a raw confidence value to the cluster, the raw confidence value being a function of the determined uniformity of the cluster;
program code for calculating a confidence interval weight for the cluster, the confidence interval weight being calculated by using a confidence interval to determine reliability of the determined uniformity of the cluster;
program code for calculating a trace length weight for the cluster, the trace length weight being calculated as a function of lengths of traces generated by the samples in the cluster;
program code for calculating an n-gram weight for the cluster, the n-gram weight being calculated as a function of numbers of unique n-grams generated by the samples in the cluster;
program code for calculating a compactness weight for the cluster, the compactness weight being calculated as a function of similarity of samples in the cluster to a point of reference;
program code for calculating a cluster weight for the cluster, the cluster weight being calculated as a function of the confidence interval weight, the trace length weight, the n-gram weight and the compactness weight; and
program code for assigning a cluster confidence measurement to the cluster, the cluster confidence measurement being a function of the cluster weight and the cluster raw confidence value.
Dependent claims: 15, 16, 17, 18, 19
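The trace length weight and the n-gram weight are defined only as functions of trace lengths and of unique n-gram counts. One possible sketch follows, assuming a trace is an ordered list of event names, the cluster-level statistic is a mean over samples, and the weight grows linearly up to a cap. The thresholds, the default n = 3 and all function names are illustrative assumptions.

```python
def unique_ngrams(trace, n=3):
    """Set of distinct n-grams (as tuples) in one trace of events."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def saturating_weight(value, threshold):
    """Assumed mapping: grows linearly with value and is capped at 1.0."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return min(1.0, value / threshold)


def trace_length_weight(traces, threshold=100):
    """Weight from the mean trace length across the cluster's samples."""
    if not traces:
        return 0.0
    mean_len = sum(len(t) for t in traces) / len(traces)
    return saturating_weight(mean_len, threshold)


def ngram_weight(traces, n=3, threshold=50):
    """Weight from the mean number of unique n-grams per sample."""
    if not traces:
        return 0.0
    mean_unique = sum(len(unique_ngrams(t, n)) for t in traces) / len(traces)
    return saturating_weight(mean_unique, threshold)


if __name__ == "__main__":
    trace = ["open", "read", "write", "open", "read", "write"]
    print(len(unique_ngrams(trace)))
    print(trace_length_weight([trace], threshold=12))
    print(ngram_weight([trace], n=3, threshold=6))
```

The idea behind this choice is that short traces, or traces with few distinct n-grams, carry little behavioral evidence, so clusters built from them are trusted less.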
20. A computer system for quantifying a confidence level in a quality of a cluster of samples, wherein the samples are clustered according to runtime behavior, the computer system comprising:
a processor;
computer memory;
a cluster uniformity determining module residing in the system memory, configured for determining a uniformity of the cluster, the uniformity of the cluster being determined as a function of at least a ratio of a most frequently occurring unique sample label present in the cluster to a total number of unique sample labels present in the cluster;
a raw confidence calculating module residing in the system memory, configured for assigning a raw confidence value to the cluster, the raw confidence value being a function of the determined uniformity of the cluster;
a confidence interval weight calculating module residing in the system memory, configured for calculating a confidence interval weight for the cluster, the confidence interval weight being calculated by using a confidence interval to determine reliability of the determined uniformity of the cluster;
a trace length weight calculating module residing in the system memory, configured for calculating a trace length weight for the cluster, the trace length weight being calculated as a function of lengths of traces generated by the samples in the cluster;
an n-gram weight calculating module residing in the system memory, configured for calculating an n-gram weight for the cluster, the n-gram weight being calculated as a function of numbers of unique n-grams generated by the samples in the cluster;
a compactness weight calculating module residing in the system memory, configured for calculating a compactness weight for the cluster, the compactness weight being calculated as a function of similarity of samples in the cluster to a point of reference;
a cluster weight calculating module residing in the system memory, configured for calculating a cluster weight for the cluster, the cluster weight being calculated as a function of the confidence interval weight, the trace length weight, the n-gram weight and the compactness weight; and
a cluster confidence measurement calculating module residing in the system memory, configured for assigning a cluster confidence measurement to the cluster, the cluster confidence measurement being a function of the cluster weight and the cluster raw confidence value.
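The compactness weight depends on the similarity of samples to "a point of reference", which the claims leave open. The sketch below assumes each sample is represented by a set of features (for example its unique n-grams), that similarity is Jaccard similarity, and that the point of reference defaults to the cluster medoid. These choices and the function names are illustrative assumptions only.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def medoid(feature_sets):
    """Member with the highest total similarity to all members of the cluster."""
    return max(feature_sets, key=lambda c: sum(jaccard(c, o) for o in feature_sets))


def compactness_weight(feature_sets, reference=None):
    """Assumed form: mean similarity of the samples to the point of reference."""
    if not feature_sets:
        return 0.0
    ref = medoid(feature_sets) if reference is None else reference
    return sum(jaccard(s, ref) for s in feature_sets) / len(feature_sets)


if __name__ == "__main__":
    cluster = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
    print(medoid(cluster), compactness_weight(cluster))
```

A tight cluster, in which every sample resembles the reference, scores close to 1.0, while a loose cluster scores lower and so reduces the overall cluster weight.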
Specification