Method of constructing data classifiers and classifiers constructed according to the method
Abstract
A method of constructing data classifiers for cases where the number of features is much larger than the number of training examples and it is critical to avoid overtraining. The method is particularly useful in recognizing computer objects that contain, or are likely to contain, a computer virus. It provides a novel way of tolerating imperfectly classifiable data, of learning in time polynomial in the size of the input data, and of avoiding overtraining. Training is performed by constructing and solving a linear program (LP) or integer program (IP).
110 Citations
23 Claims
1. A computer implemented method for constructing a linear classifier of information, said method comprising the steps of:
a) providing a set of training examples, each of said training examples including an input vector and a classification, said input vector having components and said components being called “features”, said classification being positive or negative;
b) constructing a linear program consisting of a set of linear constraints and a linear objective function; and
c) using linear programming (LP) to solve said set of linear constraints and linear objective function to derive a classifier consisting of a weight vector, components of said weight vector being called “weights”, and an offset, a response of said classifier to a second input vector consisting of the dot product of said second input vector and the classifier's weight vector, plus the classifier's offset.
identifying in the providing step (a) zero or more of said features as excitatory, and identifying zero or more of said features as inhibitory; and
for each excitatory feature j, constraining in the constructing step (b) the corresponding weight, wj, to be at least 0, wj≧0, and for each inhibitory feature j, constraining the corresponding weight, wj, to be at most 0, wj≦0.
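These sign constraints translate directly into per-variable bounds in an LP solver. A minimal sketch (the helper name and the excitatory/inhibitory index sets are illustrative, not from the patent; the range limit follows the ±1 range of claims 9-10):

```python
def weight_bounds(n_features, excitatory, inhibitory, limit=1.0):
    """Per-weight LP bounds: excitatory -> wj >= 0, inhibitory -> wj <= 0,
    unlabeled -> sign-free within [-limit, limit]."""
    bounds = []
    for j in range(n_features):
        if j in excitatory:
            bounds.append((0.0, limit))      # wj >= 0
        elif j in inhibitory:
            bounds.append((-limit, 0.0))     # wj <= 0
        else:
            bounds.append((-limit, limit))   # sign unconstrained
    return bounds

# e.g. feature 0 excitatory, feature 2 inhibitory
bounds = weight_bounds(3, excitatory={0}, inhibitory={2})
```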
4. The method of claim 3, wherein at least one feature is neither identified as excitatory nor inhibitory, the constructing step (b) further comprising the step of duplicating each said feature to provide an identical pair of features, a first one of said pair being identified as excitatory and a second one of said pair being identified as inhibitory, whereby the dimension of each input vector is increased by 1 for each said feature.
5. The method of claim 3, wherein substrings known to be present in one or more computer viruses are excitatory features.
6. The method of claim 3, wherein substrings known to be present in documents pertaining to a selected subject define excitatory features and substrings known to be present in documents pertaining to a variety of subjects define features classified neither as excitatory nor as inhibitory.
7. The method of claim 1, wherein the objective function is to minimize the sum of absolute values of said weights.
8. The method of claim 1, wherein the linear constraints constrain each positive example's response to be at least 1 and constrain each negative example's response to be at most 0.
9. The method of claim 8, wherein the linear constraints further constrain each weight to a predetermined range.
10. The method of claim 9, wherein said range is −1 to 1.
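Claims 1 and 7-10 together specify a concrete linear program. A hedged sketch using scipy.optimize.linprog (the function name and toy data are illustrative; |wj| is linearized by the standard split w = u − v with u, v ≥ 0, which mirrors the feature-duplication trick of claim 4):

```python
import numpy as np
from scipy.optimize import linprog

def train_lp_classifier(X, y):
    """Minimize sum |wj| subject to: response >= 1 on positive examples,
    response <= 0 on negative examples, each weight in [-1, 1].
    Variables are u (n), v (n) with w = u - v, and a free offset b."""
    m, n = X.shape
    c = np.concatenate([np.ones(2 * n), [0.0]])   # cost on u, v; b costs 0
    A_ub, b_ub = [], []
    for xi, yi in zip(X, y):
        if yi > 0:   # w.xi + b >= 1  ->  -u.xi + v.xi - b <= -1
            A_ub.append(np.concatenate([-xi, xi, [-1.0]]))
            b_ub.append(-1.0)
        else:        # w.xi + b <= 0
            A_ub.append(np.concatenate([xi, -xi, [1.0]]))
            b_ub.append(0.0)
    bounds = [(0.0, 1.0)] * (2 * n) + [(None, None)]  # w in [-1, 1], b free
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    w = res.x[:n] - res.x[n:2 * n]
    return w, res.x[-1]

# toy AND-like data: only [1, 1] is a positive example
X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1, -1, -1, -1])
w, b = train_lp_classifier(X, y)
```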
11. The method of claim 1, wherein said constraints include constraining that each response is at least 1 minus a tolerance (xi) for ones of said training examples (i) having a positive said classification and at most −1 plus a tolerance (xi) for ones of said training examples (i) having a negative said classification and wherein each tolerance (xi) is constrained to be at least zero.
12. The method of claim 11, wherein said linear objective function minimizes a positive combination of absolute values of said weights plus a positive combination of said tolerances (xi).
13. The method of claim 11, wherein said linear objective function minimizes a sum of absolute values of said weights, a “false negative” parameter value p+ times the sum of tolerances for said training examples having a positive classification and a “false positive” parameter value p− times the sum of tolerances for said training examples having a negative classification.
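The tolerance machinery of claims 11-13 yields a soft-margin linear program. A hedged sketch (names, penalties, and toy data are illustrative, not from the patent):

```python
import numpy as np
from scipy.optimize import linprog

def train_soft_lp(X, y, p_pos=1.0, p_neg=1.0):
    """Per-example tolerance t_i >= 0; minimize
    sum |wj| + p_pos * (tolerances of positives) + p_neg * (of negatives),
    subject to w.xi + b >= 1 - t_i (positives), w.xi + b <= -1 + t_i (negatives).
    Variables: u (n), v (n) with w = u - v, offset b, tolerances t (m)."""
    m, n = X.shape
    c = np.concatenate([np.ones(2 * n), [0.0], np.where(y > 0, p_pos, p_neg)])
    A_ub, b_ub = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        t_row = np.zeros(m)
        t_row[i] = -1.0
        if yi > 0:   # -w.xi - b - t_i <= -1
            A_ub.append(np.concatenate([-xi, xi, [-1.0], t_row]))
        else:        #  w.xi + b - t_i <= -1
            A_ub.append(np.concatenate([xi, -xi, [1.0], t_row]))
        b_ub.append(-1.0)
    bounds = [(0.0, None)] * (2 * n) + [(None, None)] + [(0.0, None)] * m
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    w = res.x[:n] - res.x[n:2 * n]
    return w, res.x[2 * n], res.x[2 * n + 1:]

X = np.array([[2.0], [-2.0]])   # separable with margin, so tolerances stay 0
y = np.array([1, -1])
w, b, t = train_soft_lp(X, y)
```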
14. The method of claim 11, wherein said constraints constrain a positive combination of said tolerances (xi) to be at most a given value and said objective is to minimize a positive combination of the absolute values of said weights.
15. The method of claim 11, wherein said constraints constrain a positive combination of the absolute values of said weights to be at most a given value and said objective is to minimize a positive combination of said tolerances (xi).
16. The method of claim 1, wherein said constraints include constraints constraining said weights to a discrete domain.
17. The method of claim 16, wherein the discrete domain is integers within a preselected range.
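Claims 16-17 replace the LP with an integer program. One way to sketch this (assuming SciPy ≥ 1.9, whose HiGHS backend accepts an `integrality` array in `linprog`; the data and ranges are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# same toy data as the LP case; weights now forced to integers in [-2, 2]
X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1, -1, -1, -1])
m, n = X.shape
c = np.concatenate([np.ones(2 * n), [0.0]])       # minimize sum |wj|
A_ub, b_ub = [], []
for xi, yi in zip(X, y):
    if yi > 0:   # w.xi + b >= 1
        A_ub.append(np.concatenate([-xi, xi, [-1.0]]))
        b_ub.append(-1.0)
    else:        # w.xi + b <= 0
        A_ub.append(np.concatenate([xi, -xi, [1.0]]))
        b_ub.append(0.0)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 2)] * (2 * n) + [(-4, 4)],
              integrality=np.ones(2 * n + 1))      # all variables integer
w = res.x[:n] - res.x[n:2 * n]
offset = res.x[-1]
```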
18. The method of claim 1, wherein the solving step (c) is performed using a linear programming technique.
19. The method of claim 1, wherein the solving step (c) is performed using an approximate, heuristic optimization method.
20. The method of claim 1, further comprising the step of selecting a set of relevant features, consisting of those features j whose corresponding weights wj have an absolute value at least as large as a preselected value.
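The selection step of claim 20 is a simple magnitude filter on the learned weights. Sketch (the function name and cutoff value are illustrative):

```python
def relevant_features(weights, cutoff=0.01):
    """Claim 20: indices j whose |wj| is at least the preselected cutoff."""
    return [j for j, wj in enumerate(weights) if abs(wj) >= cutoff]

relevant = relevant_features([0.5, 0.0, -0.2], cutoff=0.1)
```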
21. A computer system for constructing a classifier comprising:
input means for providing a set of training examples, each of said training examples including an input vector and a classification, components of said input vector being called “features”, said classification being positive or negative;
processing means for constructing a linear program consisting of a set of linear constraints and a linear objective function, said processing means solving said set of linear constraints and linear objective function to derive a classifier consisting of a weight vector, components of said vector being called “weights”, and an offset;
a response of said classifier to a new input vector consisting of the dot product of said new input vector and the classifier's weight vector, plus the classifier's offset, the classification of said classifier to said new input vector consisting of “negative” if said response is no more than a first threshold, “positive” if said response is at least a second threshold, and “don't know” if said response lies between the first and second thresholds.
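The two-threshold decision rule of claim 21 can be sketched as follows (the threshold values are illustrative):

```python
def classify(weights, offset, x, first_threshold=0.0, second_threshold=1.0):
    """Dot product plus offset, then the three-way decision of claim 21."""
    response = sum(w * xi for w, xi in zip(weights, x)) + offset
    if response <= first_threshold:
        return "negative"
    if response >= second_threshold:
        return "positive"
    return "don't know"

label = classify([1.0, 1.0], -1.0, [0.75, 0.75])  # response 0.5: in between
```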
22. A detector of computer viruses implemented on a storage device readable by a computer and embodying a program of instructions executable by the computer, the detector incorporating a classifier constructed by the steps of:
providing a set of training examples, each of said training examples including an input vector and a classification, components of said input vector being called “features”, said classification being positive or negative, and identifying zero or more of said features as excitatory and identifying zero or more of said features as inhibitory, wherein substrings known to be present in one or more computer viruses are excitatory features;
constructing a linear program consisting of a set of linear constraints and a linear objective function, and for each excitatory feature j, constraining the corresponding weight, wj, to be at least 0, wj≧0, and for each inhibitory feature j, constraining the corresponding weight, wj, to be at most 0, wj≦0; and
solving said set of linear constraints and linear objective function to derive a classifier consisting of a weight vector, components of said weight vector being called “weights”, and an offset, a response of said classifier to a new input vector consisting of the dot product of said new input vector and the classifier's weight vector, plus the classifier's offset.
measuring the presence of each feature in said input datum to form a new input vector;
adding the classifier's offset to the dot product of the classifier's weight vector with said new input vector to form a response;
comparing said response with first and second thresholds; and
classifying said input datum as non-viral if said response is no more than the first threshold, classifying said input datum as viral if said response is at least the second threshold, and classifying said input datum as “don't know” if said response is between the first and second thresholds.
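The measuring step above, applied to the binary substring-presence features of claim 22, might look like this (a sketch; the byte strings are illustrative placeholders, not real virus signatures, and a detector could instead count occurrences):

```python
def feature_vector(datum: bytes, substrings):
    """Measure each feature: 1.0 if the substring occurs in the input
    datum, else 0.0, yielding the new input vector of claim 23."""
    return [1.0 if s in datum else 0.0 for s in substrings]

# hypothetical excitatory substrings; the first occurs, the second does not
vec = feature_vector(b"\x90\x90\xeb\xfe", [b"\xeb\xfe", b"\xcd\x21"])
```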
Specification