Modeling goodware characteristics to reduce false positive malware signatures
Abstract
A set of likelihood values associated with a set of characteristics associated with the set of goodware entities is determined. The set of characteristics is stored in association with the set of likelihood values as a model. A set of relative information gain values associated with the characteristics of the set of characteristics is generated. One or more characteristics are removed from the model responsive to the relative information gain values associated with the one or more characteristics to produce a revised model.
17 Claims
1. A computer-implemented method of generating a model that specifies a likelihood of observing a characteristic in a set of goodware entities, the method comprising:
using a computer to perform steps comprising:
determining a set of likelihood values associated with a set of characteristics associated with the set of goodware entities, each likelihood value specifying a likelihood of observing a characteristic of the set of characteristics in the set of goodware entities, wherein determining the set of likelihood values comprises identifying a set of enumeration values indicating numbers of times characteristics of the set of characteristics are observed in the set of goodware entities, and the likelihood values are based on the set of enumeration values;
storing the set of characteristics in association with the set of likelihood values as a model;
generating a set of relative information gain values associated with the characteristics of the set of characteristics, wherein a relative information gain value describes an amount of information an associated characteristic adds to the model;
removing one or more characteristics from the model responsive to the relative information gain values associated with the one or more characteristics to produce a revised model; and
storing the revised model.
Dependent claims: 2, 3, 4, 5, 6, 7
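The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: characteristics are taken to be substrings of length 1 to `max_len`, likelihoods are normalised per substring length, and the relative information gain of a characteristic is assumed to be the absolute log-ratio between its observed likelihood and the likelihood predicted from its shorter parts. The claim itself only says that the value describes how much information a characteristic adds to the model. All function names and the `min_gain` parameter are hypothetical.

```python
import math
from collections import Counter


def count_characteristics(entities, max_len=2):
    """Enumeration values: how often each substring of length 1..max_len occurs."""
    counts = Counter()
    for data in entities:
        for n in range(1, max_len + 1):
            for i in range(len(data) - n + 1):
                counts[data[i:i + n]] += 1
    return counts


def likelihoods(counts):
    """Turn counts into likelihood values, normalised per characteristic length."""
    totals = Counter()
    for ch, c in counts.items():
        totals[len(ch)] += c
    return {ch: c / totals[len(ch)] for ch, c in counts.items()}


def relative_information_gain(model):
    """ASSUMED definition: |log2(P(ch) / (P(prefix) * P(last symbol)))|.

    A value near zero means the shorter characteristics already predict
    this one, so it adds little information to the model.
    """
    gains = {}
    for ch, p in model.items():
        if len(ch) < 2:
            continue  # single symbols are the baseline and get no gain value
        predicted = model[ch[:-1]] * model[ch[-1:]]
        gains[ch] = abs(math.log2(p / predicted))
    return gains


def prune(model, gains, min_gain):
    """Revised model: drop characteristics whose gain is below min_gain."""
    return {ch: p for ch, p in model.items()
            if gains.get(ch, math.inf) >= min_gain}


goodware = ["abab", "abab"]  # toy stand-ins for goodware entities
model = likelihoods(count_characteristics(goodware))
revised = prune(model, relative_information_gain(model), min_gain=1.0)
```

Single-symbol characteristics are always kept in this sketch because they serve as the baseline against which longer characteristics are compared; a different gain definition would lead to a different pruning rule.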
8. A non-transitory computer-readable storage medium comprising program code for generating a model that specifies a likelihood of observing a characteristic in a set of goodware entities, the program code comprising program code for:
determining a set of likelihood values associated with a set of characteristics associated with the set of goodware entities, each likelihood value specifying a likelihood of observing a characteristic of the set of characteristics in the set of goodware entities, wherein determining the set of likelihood values comprises identifying a set of enumeration values indicating numbers of times characteristics of the set of characteristics are observed in the set of goodware entities, and the likelihood values are based on the set of enumeration values;
storing the set of characteristics in association with the set of likelihood values as a model;
generating a set of relative information gain values associated with the characteristics of the set of characteristics, wherein a relative information gain value describes an amount of information an associated characteristic adds to the model;
removing one or more characteristics from the model responsive to the relative information gain values associated with the one or more characteristics to produce a revised model; and
storing the revised model.
Dependent claims: 9, 10, 11, 12, 13, 14
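The two storing steps (the model, then the revised model) could be sketched as a simple JSON round trip. The claims do not specify a storage format; the record layout, the file format, and the function names `save_model` and `load_model` below are all assumptions, and characteristics are assumed to be plain strings.

```python
import json


def save_model(model, path):
    """Store each characteristic in association with its likelihood value."""
    records = [{"characteristic": ch, "likelihood": p}
               for ch, p in sorted(model.items())]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)


def load_model(path):
    """Read the stored records back into a characteristic -> likelihood map."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return {rec["characteristic"]: rec["likelihood"] for rec in records}
```

Storing explicit records rather than a bare mapping keeps the association between a characteristic and its likelihood visible in the file; byte-sequence characteristics would need an encoding such as hexadecimal before being written as JSON.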
15. A computer system for generating a malware signature for detecting a malware entity, the system comprising:
a memory;
a processor;
a goodware model engine stored in the memory and executable by the processor to generate a model that specifies a likelihood of observing a characteristic in a goodware dataset comprising a set of goodware entities; and
a malware signature engine stored in the memory and executable by the processor to use the model to determine a likelihood that a characteristic derived from the malware entity is found in the goodware dataset and to generate a malware signature using the characteristic responsive to the likelihood that the characteristic derived from the malware entity is found in the goodware dataset being below a threshold.
Dependent claims: 16, 17
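The role of the malware signature engine in claim 15 can be sketched as a threshold test against the goodware model. This is an illustration under stated assumptions rather than the claimed system: candidate characteristics are assumed to be fixed-length substrings of the malware sample, a characteristic absent from the model is assumed to have likelihood 0.0 (a real system would probably smooth this), and the names `candidate_characteristics` and `signature_candidates` are hypothetical.

```python
def candidate_characteristics(sample, length):
    """Derive candidate characteristics: all substrings of the given length."""
    return {sample[i:i + length] for i in range(len(sample) - length + 1)}


def signature_candidates(model, sample, length, threshold):
    """Return characteristics whose goodware likelihood is below the threshold.

    Results are ordered from least to most likely to appear in goodware.
    Characteristics missing from the model are ASSUMED to have likelihood 0.0.
    """
    scored = [(model.get(ch, 0.0), ch)
              for ch in candidate_characteristics(sample, length)]
    return [ch for p, ch in sorted(scored) if p < threshold]


# Toy goodware model: characteristic -> likelihood of appearing in goodware.
goodware_model = {"ab": 0.6, "ba": 0.3, "xz": 0.001}
usable = signature_candidates(goodware_model, "abxz", length=2, threshold=0.01)
```

Only characteristics that are unlikely to occur in goodware pass the test, which is how the model helps reduce false positive signatures. The ordering by likelihood is a design choice of this sketch; the claim only requires the likelihood to be below a threshold.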
Specification