Discriminative feature selection for data sequences
Abstract
A discriminative feature selection method for selecting a set of features from a set of training data sequences is described. The training data sequences are generated by at least two data sources, and each data sequence consists of a sequence of data symbols taken from an alphabet. The method is performed by first building a suffix tree from the training data. The suffix tree contains only suffixes of the data sequences having an empirical probability of occurrence greater than a first predetermined threshold, from at least one of the sources. Next the suffix tree is pruned of all suffixes for which there exists in the suffix tree a shorter suffix having equivalent predictive capability, for all of the data sources.
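The two steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tree is held implicitly as a set of strings (the parent of a context is the same context with its oldest symbol dropped), the empirical probability of a context is taken as its count divided by the number of same-length windows in that source, and "equivalent predictive capability" is approximated by next-symbol distributions that agree within a tolerance `eps`. The names `collect_stats`, `select_features`, `p_min`, `max_len`, and `eps` are hypothetical.

```python
from collections import Counter, defaultdict


def collect_stats(sequences, max_len):
    """Count contexts of length 1..max_len and the symbols that follow them."""
    occ = Counter()                  # occ[s]: occurrences of context s
    windows = Counter()              # windows[n]: number of length-n windows
    followers = defaultdict(Counter) # followers[s][a]: times a follows s
    for seq in sequences:
        for i in range(len(seq)):
            for n in range(1, max_len + 1):
                if i + n > len(seq):
                    break
                s = seq[i:i + n]
                occ[s] += 1
                windows[n] += 1
                if i + n < len(seq):
                    followers[s][seq[i + n]] += 1
    return occ, windows, followers


def next_symbol_dist(followers, s):
    """Empirical next-symbol distribution after s, or None if s has no followers."""
    counts = followers.get(s)
    if not counts:
        return None
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def equivalent(p, q, eps):
    """Assumed criterion: distributions agree within eps on every symbol.
    A source with no data for a context is treated as showing no difference."""
    if p is None or q is None:
        return True
    return all(abs(p.get(a, 0.0) - q.get(a, 0.0)) <= eps for a in set(p) | set(q))


def select_features(data, p_min, max_len, eps=0.05):
    """data maps a source name to a list of sequences (strings)."""
    stats = {src: collect_stats(seqs, max_len) for src, seqs in data.items()}

    # Step 1: keep contexts whose empirical probability exceeds p_min
    # in at least one source.
    tree = set()
    for occ, windows, _ in stats.values():
        for s, c in occ.items():
            if c / windows[len(s)] > p_min:
                tree.add(s)

    # Step 2: drop a context if some shorter suffix already in the tree
    # predicts the next symbol equivalently for every source.
    kept = set()
    for s in tree:
        shorter = [s[k:] for k in range(1, len(s)) if s[k:] in tree]
        redundant = any(
            all(equivalent(next_symbol_dist(f, s), next_symbol_dist(f, t), eps)
                for _, _, f in stats.values())
            for t in shorter
        )
        if not redundant:
            kept.add(s)
    return kept
```

For example, with sources `{"A": ["aabaabaab"], "B": ["abababab"]}`, `p_min=0.1`, and `max_len=2`, the context `ab` is dropped because `b` alone predicts the next symbol equally well in both sources, while `aa` and `ba` are retained because they sharpen the prediction made by `a` in source A.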
15 Claims
1. A discriminative feature selection method for selecting a set of features from training data comprising a plurality of data sequences, said data sequences being generated from at least two data sources, and wherein each data sequence comprises a sequence of data symbols from an alphabet, said method comprising:
building a suffix tree from said training data, said suffix tree comprising suffixes of said data sequences having an empirical probability of occurrence from at least one of said sources greater than a first predetermined threshold; and
pruning from said suffix tree all suffixes for which there exists in said suffix tree a shorter suffix having equivalent predictive capability for all of said data sources.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A discriminative feature selector, for selecting a set of features from training data comprising a plurality of data sequences, said data sequences being generated from at least two data sources, and wherein each data sequence comprises a sequence of data symbols from an alphabet, the feature selector comprising:
a tree generator for building a suffix tree from said training data, said suffix tree comprising suffixes of said data sequences having a probability of occurrence from at least one of said sources greater than a first predetermined threshold; and
a pruner for pruning from said suffix tree all suffixes for which there exists in said suffix tree a shorter suffix having equivalent predictive capability.
Dependent claims: 11, 12, 13, 14, 15
Specification