Standard machine learning analysis of proteomic and metabolomic data from COVID-19 patients has produced biosignatures that contain large numbers of predictors, hampering their clinical application. Moreover, their performance often drops significantly when validated in independent cohorts, which is expected given that sample sizes are often inevitably small. By applying automated machine learning (AutoML), we aimed to improve modeling and deliver models and signatures that can readily be translated into diagnostic assays to aid the fight against the pandemic.
Georgios Papoutsoglou, Computer Science Department, University of Crete
Makrina Karaglani, Laboratory of Pharmacology, Medical School, Democritus University of Thrace
Vincenzo Lagani, Institute of Chemical Biology, Ilia State University
Naomi Thomson, JADBio
Dimitri Oluf Røe, Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology & Clinical Cancer Research Center, Department of Clinical Medicine, Aalborg University Hospital
Ioannis Tsamardinos, JADBio & Institute of Applied and Computational Mathematics, Foundation for Research and Technology–Hellas
Chatzaki Ekaterini, Laboratory of Pharmacology, Medical School, Democritus University of Thrace & Institute of Agri-Food and Life Sciences, Hellenic Mediterranean University Research Centre
www.nature.com/articles/s41598-021-94501-0
The rapid outbreak of COVID-19 has placed intense pressure on healthcare systems, creating an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Despite the global scientific effort, efficient predictive models for patient stratification and successful management of the disease are still lacking. Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available COVID-19 datasets, including serum proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed. Analysis of a combined proteomic and metabolomic dataset produced ten equivalent signatures of two features each, with an AUC of 0.840 (CI 0.723–0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with an AUC of 0.914 (CI 0.865–0.955) in distinguishing COVID-19 patients from those with a different acute respiratory illness. A second transcriptomic dataset led to two equivalent signatures of nine features each, with an AUC of 0.967 (CI 0.899–0.996) in distinguishing COVID-19 patients from virus-free individuals. Multiple new features emerged, implicated in a wide range of pathways including viral mRNA translation, interferon gamma signaling and the innate immune system. In conclusion, AutoML built multiple biosignatures quickly and automatically, with a reduced number of features and high predictive performance that remained high upon validation. These favorable characteristics make the signatures strong candidates for the development of cost-effective clinical assays that could contribute to better disease management. Our results also highlight the value of revisiting precious, well-built datasets to extract as much insight as possible from a given experimental observation.
Funding Statement: No funding was received for this research.
Read more in the full paper: Automated Machine Learning Optimizes and Accelerates COVID-19 Predictive Modeling.
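The performance figures above are reported as an AUC with a confidence interval. As a rough illustration of what those numbers mean, the sketch below computes a ROC AUC from a model's predicted scores on labelled samples and attaches a percentile-bootstrap confidence interval. This is a generic, assumed approach for illustration only, not JADBio's internal estimation procedure; the function names `auc` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive (label 1) scores higher than a randomly chosen
    negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point AUC, lower bound, upper bound) using a percentile
    bootstrap: resample samples with replacement, recompute the AUC, and
    take the alpha/2 and 1 - alpha/2 quantiles. Hypothetical helper."""
    point = auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class (AUC is undefined there).
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi
```

For example, with toy data (invented, not from the study), `auc([0, 0, 0, 0, 1, 1, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.9, 0.6, 0.7, 0.3])` gives 11 winning pairs out of 16, i.e. `0.6875`. With only a handful of samples, the bootstrap interval from `bootstrap_auc_ci` is very wide, which echoes the point made above about small sample sizes and performance drops on independent validation.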
#COVID19, #automatedMachineLearning, #SARS-CoV-2, #modeling, #predictivemodels, #validation