CASE STUDY

Feature Selection Algorithms

Extending Greedy Feature Selection Algorithms to Multiple Solutions

Giorgos Borboudakis, University of Crete, Heraklion, Greece
Ioannis Tsamardinos, University of Crete, Heraklion, Greece, Gnosis Data Analysis (JADBio), Institute of Applied and Computational Mathematics, Foundation for Research and Technology, Hellas, Greece

Abstract

Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.

Results

The researchers present a novel strategy for extending feature selection algorithms to identify multiple statistically equivalent solutions, and proved under which conditions the algorithm is able to identify all solutions. Furthermore, they extended the taxonomy of features proposed by John et al. (1994) to also take multiplicity into account. They also proposed three definitions of statistical equivalence of solutions, as well as methods for testing them. In experiments, they showed that the proposed algorithm is significantly faster than the TIE* algorithm (Statnikov et al. 2013), the only other method with the same theoretical guarantees, while returning similar solutions. This happens because their algorithm directly takes advantage of the computations performed during the search, while TIE* does not. As presented, the strategy can be used to extend greedy algorithms that consist of a forward and a backward phase. However, similar ideas could be used for a more general class of algorithms, namely methods that search in the space of solutions by adding or removing one or multiple features at each iteration. This would allow the extension of methods like recursive feature elimination, lasso (Tibshirani 1996), and stepwise selection (Kutner et al. 2004; Weisberg 2005) methods, to name a few. A limitation of the experimental evaluation is that the algorithms have only been evaluated on regression and binary classification tasks. A more extensive evaluation, including larger numbers of datasets and other common tasks, such as multiclass classification, survival analysis and time course analysis, would be a valuable future research direction.

RESEARCH PAPER

#featureselection, #modeling, #predictivemodels

OTHER

Do you have questions?

JADBio can meet your needs. Ask one of our experts for an interactive demo.

Stay connected to get our news first!

REQUEST A DEMO

STAY IN TOUCH

Join the JADai Community!

I consent to the use of following cookies:

Necessary

Marketing

Analytics

Preferences

Unclassified

Cookie Declaration About Cookies

Necessary (3) Marketing (0) Analytics (3) Preferences (0) Unclassified (9)

Necessary cookies help make a website usable by enabling basic functions like page navigation and access to secure areas of the website. The website cannot function properly without these cookies.

Name	Domain	Purpose	Expiry	Type
wpl_user_preference	jadbio.com	WP GDPR Cookie Consent Preferences	1 year	HTTP
__stripe_mid	app.jadbio.com	For processing payment and to aid in fraud detection.	1 year	HTTP
__stripe_sid	app.jadbio.com	Stripe Cookie to process payments	Session	HTTP

Name	Domain	Purpose	Expiry	Type
sp_t	spotify.com	---	1 year	---
sp_landing	spotify.com	---	1 days	---
muxData	open.spotify.com	---	20 years	---
_gcl_au	jadbio.com	---	3 months	---
_gat_UA-150261121-1	jadbio.com	---	Session	---
test_cookie	doubleclick.net	A generic test cookie set by a wide range of web platforms.	Session	HTTP
_lfa	jadbio.com	---	2 years	---
drift_campaign_refresh	jadbio.com	---	Session	---
m	m.stripe.com	---	2 years	---

CASE STUDY

Feature Selection Algorithms

Extending Greedy Feature Selection Algorithms to Multiple Solutions

Abstract

Results

OTHER

CASE STUDIES

Do you have questions?

Join the JADai Community!

US

GREECE

QUICK LINKS

FOLLOW US

CONTACT

LEGAL

Name	Domain	Purpose	Expiry	Type
_ga	jadbio.com	Google Universal Analytics long-time unique user tracking identifier.	2 years	HTTP
_gid	jadbio.com	Google Universal Analytics short-time unique user tracking identifier.	1 days	HTTP
IDE	doubleclick.net	Google advertising cookie used for user tracking and ad targeting purposes.	2 years	HTTP

CASE STUDY

Feature Selection Algorithms

Extending Greedy Feature Selection Algorithms to Multiple Solutions

Abstract

Results

OTHER

CASE STUDIES

Do you have questions?

Join the JADai Community!

Sign up with aFREE Basic plan!

US

GREECE

QUICK LINKS

FOLLOW US

CONTACT

LEGAL

Sign up with a
FREE Basic plan!