AutoML contributing to the diagnostic/predictive models for COVID-19

COVID-19 Research

While the battle against SARS-CoV-2 continues, we’re contributing to the research community. We’re putting our JADBio AutoML at the service of biologists, virologists and anyone who needs to discover knowledge fast. If you’re also working on SARS-CoV-2 research, let us know so we can include you in our Research Licensing Program today!

CASE STUDY

Automated Machine Learning in healthcare and medical diagnosis: COVID-19


A tremendous amount of scientific effort is currently under way to bring the rapidly evolving coronavirus pandemic under control. Although large volumes of data are being collected every day, there is still considerable debate about the optimal predictive models for patient stratification, the host-virus molecular footprints for drug treatment, and the successful management of the disease. Here, we investigated the performance of automated machine learning by using Just Add Data Bio (JADBio), a tool specifically designed for low-sample, high-dimensional biomedical datasets, to analyze publicly available COVID-19 datasets. In a fast, automated way and with minimal human effort, multiple new biosignatures were built with a reduced number of features and high predictive performance that remained high upon validation. The resulting models are readily available for translation into clinically relevant, cost-effective COVID-19 assays.

Authors: Georgios Papoutsoglou, Makrina Karaglani, Vincenzo Lagani, Naomi Thomson, Oluf Dimitri Røe, Ioannis Tsamardinos, Ekaterini Chatzaki

Fig. 1 Graphical summary of a COVID-19 dataset re-analyzed in this study (Mick et al., 2020). The task was to compare COVID-19 patients to those with another or no acute respiratory infection (ARI). To validate the predictive performance of AutoML, we performed stratified subsampling on the original data: 70% of the samples were assigned for model training and 30% for validation.
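The stratified 70/30 subsampling described in the caption can be illustrated with a minimal sketch. This is not the code used in the study: the function name `stratified_split`, the class labels and the class counts below are hypothetical, chosen only to show how each class is split separately so that training and validation sets keep roughly the same class proportions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/validation sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Take the same fraction from every class (up to rounding).
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical toy labels; these are NOT the class counts of Mick et al.
labels = ["covid"] * 30 + ["other_ari"] * 50 + ["no_ari"] * 20
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), "training samples,", len(valid_idx), "validation samples")
```

Splitting within each class, rather than over the pooled samples, is what keeps a rare class from ending up almost entirely in one of the two sets.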


Data	Analysis Methodology	FS option (JADBio)	FS algorithm	Modeling algorithm	#conf. tried	#models trained	exec. time	training estimate	validation estimate	#features selected	Link to results on the JADBio platform
subsampled	Mick et al.		lasso	random forest	5	25		0.957 (0.9 - 1)	0.944	26
	JADBio	non-aggressive	lasso	random forest	3017	60340	26min	0.937 [0.883 - 0.979]	0.943	24	link
	...	aggressive	SES	random forest	1393	41790	24min	0.918 [0.863 - 0.959]	0.923	25	link
original	Mick et al.		lasso	random forest	5	25		0.98 (0.951 - 1)	-	26
	JADBio	non-aggressive	lasso	random forest	3017	60340	41min	0.948 [0.908 - 0.979]	-	49	link
	...	aggressive	SES	random forest	1393	27860	22min	0.914 [0.865 - 0.955]	-	8 (2 equiv.)	link

Predictive Performance Estimates COVID-19 - JADBio AutoML

Fig. 2 Predictive performance estimates in terms of AUC as reported in the original dataset publication, by JADBio on the full set of available data (i.e., no samples are lost to estimation), and by JADBio on the training and validation sets (subsampled). Numbers in parentheses denote the range of the estimate, while numbers in brackets denote the 95% confidence intervals. The equivalences denote the number of equivalent signatures found by JADBio; e.g., “8 (2 equiv.)” means that JADBio discovered 2 equivalent signatures, each containing 8 biomarkers. Each link to the JADBio platform leads to a report with the complete list of AutoML results. JADBio does not overestimate performance when no samples are held out for estimation, confirms the predictive performance obtained in the original publication, and discovers novel signatures.
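For readers less familiar with the quantities in the table, the sketch below shows one common way to compute an AUC and a 95% confidence interval around it. It is an illustration under stated assumptions: the percentile bootstrap used here is a generic technique and is not claimed to be the procedure JADBio uses for its intervals, and the labels and scores are made-up toy values.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[round((alpha / 2) * n_boot)]
    high = stats[round((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical toy data: 1 = COVID-19, 0 = other or no infection.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
point = auc(y_true, scores)
low, high = bootstrap_ci(y_true, scores)
print(f"AUC = {point:.4f}, 95% CI = [{low:.3f} - {high:.3f}]")
```

With so few samples the interval is wide, which is one reason the table reports intervals alongside each point estimate.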

DISCUSSION

Can AutoML improve the diagnostic/predictive models for COVID-19?

In this study, we applied AutoML to obtain accurate diagnostic/predictive models for COVID-19, using available archived datasets. We asked: Can we improve on the predictive power of the models? Can we reduce the number of measurements required without sacrificing performance, in order to develop a cost-effective laboratory test? Can we obtain more accurate training estimates that better reflect the performance anticipated in a real-life setting? Most importantly, can AutoML improve on these aspects in a fully automated mode?
Using AutoML, we answered all of these research questions in the affirmative. That is, our approach was on par with or better than the published results and the results obtained by running the previously used code and methodology on our training sets. Importantly, the respective predictive performance estimates accurately reflect the performance obtained on the validation sets, so we argue that there is no need to lose samples to estimation. JADBio handles estimation internally, so the user does not have to worry about it: it performs cross-validation; repeats the cross-validation with different fold partitions when the sample size is low, to reduce the variance of estimation; stratifies the partitioning into folds, to reduce the variance of estimation and handle imbalanced data; corrects the performance estimate for the “winner’s curse” of trying multiple algorithms, using bootstrap bias correction for CV; and includes all steps of the analysis (e.g., feature selection) within the cross-validation, because performing them outside it leads to overestimation.
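The idea behind bootstrap bias correction for CV can be sketched as follows. This is a simplified illustration, not JADBio's implementation: it assumes the out-of-sample predictions of every tried configuration have already been pooled across the cross-validation folds, it uses accuracy instead of AUC to keep the code short, and the names `bbc_cv` and `oos_predictions` are hypothetical. In each bootstrap round the best configuration is selected on the resampled rows and then scored on the rows left out of the resample, so selection and evaluation never share samples.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bbc_cv(y_true, oos_predictions, metric=accuracy, n_boot=500, seed=0):
    """Bias-corrected estimate of the performance of the winning configuration.

    oos_predictions[c][i] is the pooled out-of-sample prediction of
    configuration c for sample i (each sample predicted while held out).
    """
    rng = random.Random(seed)
    n = len(y_true)

    def score(config, idx):
        return metric([y_true[i] for i in idx],
                      [oos_predictions[config][i] for i in idx])

    estimates = []
    while len(estimates) < n_boot:
        in_bag = [rng.randrange(n) for _ in range(n)]
        chosen = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in chosen]
        if not out_of_bag:
            continue  # every sample was drawn; nothing left to evaluate on
        # Select the winner on the bootstrap sample only ...
        best = max(range(len(oos_predictions)), key=lambda c: score(c, in_bag))
        # ... and evaluate it on samples that played no part in the selection.
        estimates.append(score(best, out_of_bag))
    return sum(estimates) / len(estimates)


# Hypothetical demo: 20 configurations that guess at random on 40 samples.
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(40)]
configs = [[rng.randint(0, 1) for _ in range(40)] for _ in range(20)]
naive_best = max(accuracy(y, p) for p in configs)  # optimistic "winner" score
corrected = bbc_cv(y, configs)
print(f"naive best CV accuracy: {naive_best:.3f}, corrected: {corrected:.3f}")
```

In the demo every configuration is pure noise, so the score of the best one is inflated only by having tried many of them; the corrected estimate is meant to remove that selection effect.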
Thus, we advocate the use of all data for training with JADBio. Of course, this claim comes with an important disclaimer: JADBio’s theoretical guarantees for the out-of-sample performance estimate hold only when the model is applied to data from the same distribution. If the models are applied in a clinical setting where measurements have batch effects, the population characteristics are different, or there are other systematic differences in the data, an external validation set from that operational environment is clearly required. An additional advantage of this AutoML approach is that it can work in two modes: with and without aggressive feature selection. The aggressive mode may give away some predictive power in order to produce models with multiple (in case of biological redundancy) equivalent biosignatures of selective predictive features, providing choices to the designers of diagnostic assays. Accordingly, we were able to deliver several highly diagnostic/prognostic biosignatures of minimal feature size from different types of COVID-19 data.

REFERENCES

Mick, E., Kamm, J., Pisco, A.O. et al. Upper airway gene expression reveals suppressed immune responses to SARS-CoV-2 compared with other respiratory viruses. Nat Commun 11, 5854 (2020). https://doi.org/10.1038/s41467-020-19587-y


JADBio AutoML for effortless machine learning models

Who is JADBio AutoML for?

JADBio stands for Just Add Data and aims to make machine learning accessible to all, regardless of expertise or programming skills. Whether you’re a bioinformatician, a data scientist, or a non-expert in data science who wants to get the most out of your data, JADBio’s robust AutoML automates the machine learning process, making it easy and affordable to discover knowledge while reducing time and effort. Focus on what matters: your data insights.


See JADBio in Action

Cancer Cell

Efficient prediction of TCGA cancer subtypes using compact multi-omic signatures for personalized medicine

Most methods for discovering cancer subtypes cannot label new samples from other studies or trials. This work overcomes that limitation by using five machine learning approaches on multi-omic data from ...


Hindawi - CIN

Feature Signature Discovery for Autism Detection: An Automated Machine Learning Based Feature Ranking Framework

This research work fuses the competence of AutoML and computational intelligence to discover highly predictive features for autism that would enable possible early detection of the disorder.


MDPI - cells

A Blood-Based Molecular Clock for Biological Age Estimation

Extensive efforts have been made to identify biomarkers of biological age. DNA methylation levels of ELOVL fatty acid elongase 2 (ELOVL2) and the signal joint T-cell receptor rearrangement excision circles ...


Crisis

Toward Automatic Risk Assessment to Support Suicide Prevention

Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Suicide risk has proven extremely difficult to assess for ...


NATURE

A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity

There is a need to identify at-risk individuals early that would benefit from timely medical interventions for COVID-19 disease. DNA methylation provides an opportunity to identify an epigenetic signature of ...



SPRINGER

Mass Spectrometry Proteomics analysis discovers biomarkers in serum months to years before non-small cell lung cancer: The HUNT study

Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities. A novel algorithm for dimensionality reduction called Pathway Activity Score Learning (PASL) is presented. The major novelty ...


