Fast feature selection method and system for maximum entropy modeling
Abstract
A method to select features for maximum entropy modeling in which the gains for all candidate features are determined during an initialization stage and gains for only top-ranked features are determined during each feature selection stage. The candidate features are ranked in an ordered list based on the determined gains, a top-ranked feature in the ordered list with a highest gain is selected, and the model is adjusted using the selected top-ranked feature.
18 Claims
1. A method to select features for maximum entropy modeling, the method comprising:
determining gains for candidate features during an initialization stage and for only top-ranked features during each feature selection stage;
ranking the candidate features in an ordered list based on the determined gains;
selecting a top-ranked feature in the ordered list with a highest gain; and
adjusting a model using the selected top-ranked feature.
Dependent claims: 2, 3, 4, 5, 6, 7, 11, 17
8. A method to select features for maximum entropy modeling, the method comprising:
(a) computing gains of candidate features using a uniform distribution;
(b) ordering the candidate features in an ordered list based on the computed gains;
(c) selecting a top-ranked feature with a highest gain in the ordered list;
(d) adjusting a model using the selected top-ranked feature;
(e) removing the top-ranked feature from the ordered list so that a next-ranked feature in the ordered list becomes the top-ranked feature;
(f) computing a gain of the top-ranked feature using the adjusted model;
(g) comparing the gain of the top-ranked feature with a gain of the next-ranked feature in the ordered list;
(h) if the gain of the top-ranked feature is less than the gain of the next-ranked feature, repositioning the top-ranked feature in the ordered list so that the next-ranked feature becomes the top-ranked feature and an order of the ordered list is maintained and repeating steps (f) through (g); and
(i) repeating steps (c) through (h) until one of a quantity of selected features exceeds a predefined value and a gain of a last-selected feature falls below a predefined value.
Dependent claims: 9, 10
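Steps (a) through (i) of claim 8 can be illustrated with a short Python sketch. It is not the patented implementation: the names `select_features`, `gain`, `max_features`, and `min_gain` are hypothetical, the gain computation is left to a caller-supplied function, and a binary heap stands in for the ordered list. The sketch also assumes that a stored gain of a lower-ranked feature is not exceeded once the model changes, which is what makes the comparison in step (h) a safe shortcut.

```python
import heapq
from typing import Callable, Hashable, List, Sequence

Feature = Hashable
# Hypothetical signature: gain of adding `feature` to a model that already
# contains the features in `selected`.
GainFn = Callable[[Feature, Sequence[Feature]], float]


def select_features(
    candidates: Sequence[Feature],
    gain: GainFn,
    max_features: int,
    min_gain: float = 0.0,
) -> List[Feature]:
    """Greedy selection that recomputes gains only for top-ranked features."""
    selected: List[Feature] = []

    # (a)-(b): compute every gain once against the empty model and order them.
    # heapq is a min-heap, so gains are negated. `stamp` records how many
    # features were selected when the stored gain was computed.
    heap = [(-gain(f, selected), i, f, 0) for i, f in enumerate(candidates)]
    heapq.heapify(heap)

    while heap and len(selected) < max_features:
        # (c)/(e): take the current top-ranked feature off the list.
        neg_g, i, f, stamp = heapq.heappop(heap)

        if stamp == len(selected):
            g = -neg_g  # stored gain already reflects the current model
        else:
            # (f): recompute the gain with the adjusted model.
            g = gain(f, selected)
            # (g)-(h): if it now ranks below the next feature, reposition it
            # and look at the new top-ranked feature instead.
            if heap and g < -heap[0][0]:
                heapq.heappush(heap, (-g, i, f, len(selected)))
                continue

        # (i): stop once the best available gain drops below the threshold.
        if g < min_gain:
            break

        # (d): "adjust the model" is represented here by recording the feature.
        selected.append(f)

    return selected
```

In a run where most features keep their rank, only a handful of `gain` calls are made per selection stage instead of one per remaining candidate.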
12. A processing arrangement system to perform maximum entropy modeling in which one or more candidate features derived from a corpus of data are incorporated into a model that predicts linguistic behavior, the system comprising:
a gain computation arrangement to determine gains for the candidate features during an initialization stage and to determine gains for only top-ranked features during a feature selection stage;
a feature ranking arrangement to rank features based on the determined gain;
a feature selection arrangement to select a feature with a highest gain; and
a model adjustment arrangement to adjust the model using the selected feature.
Dependent claims: 13, 14, 15, 16
18. A storage medium having a set of instructions executable by a processor to perform the following:
ordering candidate features based on gains computed on a uniform distribution to form an ordered list of candidate features;
selecting a top-ranked feature with a largest gain to form a model for a next stage;
removing the top-ranked feature from the ordered list of the candidate features;
computing a gain of the top-ranked feature based on a model formed in a previous stage;
comparing the gain of the top-ranked feature with gains of remaining candidate features in the ordered list;
including the top-ranked feature in the model if the gain of the top-ranked feature is greater than the gain of a next-ranked feature in the ordered list;
inserting the top-ranked feature in the ordered list so that the next-ranked feature becomes the top-ranked feature and an order of the ordered list is maintained, if the gain of the top-ranked feature is less than any of the gains of the next-ranked feature in the ordered list;
repeating the steps of computing the gain of the top-ranked feature, comparing the gains of the top-ranked and next-ranked features until the gain of the top-ranked feature exceeds the gains of ordered candidate features; and
terminating the method if one of a quantity of selected features reaches a pre-defined value and a gain of a last feature reaches a pre-defined value.
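The "inserting ... so that ... an order of the ordered list is maintained" step of claim 18 can be sketched with an explicitly sorted list. The following is only an illustration under assumptions: the helper name `reposition` and the `(negated gain, feature name)` entry layout are hypothetical, chosen so that Python's ascending `bisect` order corresponds to descending gain.

```python
import bisect
from typing import List, Tuple

# Hypothetical entry layout: (negated gain, feature name). Negating the gain
# lets an ascending sorted list keep the highest-gain feature at index 0.
Entry = Tuple[float, str]


def reposition(ordered: List[Entry], name: str, new_gain: float) -> None:
    """Insert a feature whose gain was recomputed, keeping the list sorted."""
    bisect.insort(ordered, (-new_gain, name))


# Feature "b" was removed from the top of the list and its gain recomputed
# as 2.0 under the current model, which is below the stored gain of "c".
ordered: List[Entry] = [(-3.0, "c"), (-1.0, "d")]
reposition(ordered, "b", 2.0)
top_feature = ordered[0][1]  # the next-ranked feature is now top-ranked
```

A sorted list makes the ordering visible at the cost of linear-time insertion; a heap gives the same top-ranked feature with cheaper updates.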
Specification