Giorgos Borboudakis, University of Crete, Heraklion, Greece
Ioannis Tsamardinos, University of Crete, Heraklion, Greece, Gnosis Data Analysis (JADBio), Institute of Applied and Computational Mathematics, Foundation for Research and Technology, Hellas, Greece
Read more on SPRINGER: https://link.springer.com/article/10.1007/s10618-020-00731-7
Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.
The researchers present a novel strategy for extending feature selection algorithms to identify multiple statistically equivalent solutions, and prove under which conditions the algorithm identifies all solutions. They also extend the taxonomy of features proposed by John et al. (1994) to take multiplicity into account, and propose three definitions of statistical equivalence of solutions, together with methods for testing them. In experiments, they show that the proposed algorithm is significantly faster than TIE* (Statnikov et al. 2013), the only other method with the same theoretical guarantees, while returning similar solutions. The speed-up comes from directly reusing the computations performed during the search, which TIE* does not do. As presented, the strategy extends greedy algorithms that consist of a forward and a backward phase; similar ideas, however, could be applied to a more general class of methods that search the space of solutions by adding or removing one or more features at each iteration. This would allow the extension of methods such as recursive feature elimination, the lasso (Tibshirani 1996), and stepwise selection (Kutner et al. 2004; Weisberg 2005), to name a few.

A limitation of the experimental evaluation is that the algorithms were only evaluated on regression and binary classification tasks. A more extensive evaluation, covering more datasets and other common tasks such as multiclass classification, survival analysis, and time-course analysis, would be a valuable direction for future research.
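To make the idea concrete, here is a minimal sketch of a greedy forward phase that records, at each step, which candidate features are interchangeable with the one selected. This is an illustration, not the paper's algorithm: the paper tests statistical equivalence with conditional-independence-style tests, whereas the sketch uses a fixed score tolerance on a toy set-cover score, and the backward phase is omitted for brevity. All names (`forward_equivalents`, `all_solutions`, the `info` dictionary) are hypothetical.

```python
import itertools

def forward_equivalents(features, score, tol=1e-9):
    """Greedy forward selection that also records, at every step, the
    candidates interchangeable with the picked feature.  A candidate f is
    grouped with the pick if it scores the same and adds nothing once the
    pick is included.  (A fixed tolerance stands in for the paper's
    statistical equivalence tests; the backward phase is omitted.)"""
    selected, groups = [], []
    remaining = set(features)
    while remaining:
        base = score(selected)
        scored = {f: score(selected + [f]) for f in remaining}
        pick = max(sorted(scored), key=scored.get)  # deterministic tie-break
        if scored[pick] <= base + tol:
            break  # no remaining candidate improves the current solution
        group = {pick}
        for f in remaining - {pick}:
            if (abs(scored[f] - scored[pick]) <= tol
                    and score(selected + [pick, f]) <= scored[pick] + tol):
                group.add(f)  # f is redundant given pick, and vice versa
        groups.append(group)
        selected.append(pick)
        remaining -= group  # equivalents need not be re-examined
    return selected, groups

def all_solutions(groups):
    """Enumerate every equivalent solution: one choice per group."""
    return [list(c) for c in itertools.product(*(sorted(g) for g in groups))]

# Toy score: how many "information units" a feature subset covers.
# 'a' and 'b' are exact duplicates, 'c' is independently informative,
# 'd' is pure noise.
info = {'a': {1}, 'b': {1}, 'c': {2}, 'd': set()}
score = lambda S: len(set().union(*(info[f] for f in S))) if S else 0

selected, groups = forward_equivalents(['a', 'b', 'c', 'd'], score)
```

On this toy problem the search selects `['a', 'c']` with equivalence groups `[{'a', 'b'}, {'c'}]`, so `all_solutions(groups)` enumerates the two equivalent solutions `['a', 'c']` and `['b', 'c']` — the kind of multiplicity a single-solution method would silently collapse.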
#featureselection, #modeling, #predictivemodels