Breast cancer (BrCa) is characterized by aberrant DNA methylation. We leveraged high-throughput methylation data from BrCa and normal breast tissues and identified 11,176 to 27,786 differentially methylated genes (DMGs) against clinically relevant end-points. Innovative automated machine learning was employed to construct three highly performing signatures for (1) the discrimination of BrCa patients from healthy individuals, (2) the identification of BrCa metastatic disease and (3) the early diagnosis of BrCa. Furthermore, functional analysis revealed that most genes selected in the signatures were associated with BrCa, with regulation of transcription being the main biological process, the nucleus being the main cellular component and transcription factor activity and sequence-specific DNA binding being the main molecular functions. Overall, revisiting methylome datasets led to three high-performance signatures that are readily available for improving BrCa precision management, and yielded significant knowledge about disease pathophysiology.
Maria Panagopoulou, Makrina Karaglani, Vangelis G. Manolopoulos, Laboratory of Pharmacology, Medical School, Democritus University of Thrace
Ioannis Iliopoulos, Department of Basic Sciences, School of Medicine, University of Crete
Ioannis Tsamardinos, JADBio & Institute of Applied and Computational Mathematics, Foundation for Research and Technology–Hellas
Ekaterini Chatzaki, Laboratory of Pharmacology, Medical School, Democritus University of Thrace & Institute of Agri-Food and Life Sciences, Hellenic Mediterranean University Research Centre
DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis of BrCa whole methylome datasets generated with the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built: (1) A 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982–1.000)). (2) A 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921–1.000)). (3) Six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920–1.000)). Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein-encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches clarified the contribution of gene methylation events to breast carcinogenesis.
JADBio AutoML was used to produce diagnostic and prognostic signatures/classifiers based on the β-values of the methylation data. JADBio is applicable to low-sample, high-dimensional -omics data and builds predictive models using standard, best-practice and state-of-the-art statistical and machine learning methods. Specifically, JADBio [14] has the following functionality and properties: (a) given a 2D matrix of data, it automatically produces predictive models for a categorical (classification), continuous (regression), or time-to-event (survival analysis) outcome, with no need to select the algorithms to apply or to tune their hyper-parameter values. The available classification algorithms are random forests, support vector machines (SVM), ridge logistic regression and classification decision trees; (b) it identifies multiple equivalent biosignatures; and (c) it produces conservative predictive performance estimates and corresponding confidence intervals. It reliably processes up to hundreds of thousands of features and sample sizes as low as a couple of dozen. JADBio also employs the recently developed Bootstrap Bias Corrected Cross-Validation (BBC-CV) protocol, which tunes the hyper-parameters of algorithms while estimating performance and adjusting for multiple tries. A description of the JADBio architecture can be found in Montesanto et al. [77].
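The idea behind bootstrap bias correction can be illustrated with a minimal sketch. This is not JADBio's implementation, which is not shown here. It assumes the out-of-sample predictions of every tried configuration have already been pooled from cross-validation, and it uses accuracy instead of AUC to stay short. The names `bbc_cv_estimate`, `subset_score` and `oos_preds` are hypothetical.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def subset_score(y_true, preds, idx):
    """Accuracy of one configuration restricted to the sample indices in idx."""
    return accuracy([y_true[i] for i in idx], [preds[i] for i in idx])

def bbc_cv_estimate(y_true, oos_preds, n_boot=500, seed=0):
    """Bias-corrected performance of the 'pick the best configuration' strategy.

    oos_preds[c][i] is the cross-validated (out-of-sample) prediction of
    configuration c for sample i. Hypothetical layout, chosen for illustration.
    """
    rng = random.Random(seed)
    n = len(y_true)
    oob_scores = []
    for _ in range(n_boot):
        # Resample the rows of the pooled prediction matrix with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        chosen = set(in_bag)
        out_bag = [i for i in range(n) if i not in chosen]
        if not out_bag:
            continue  # no left-out samples to score on; skip this resample
        # Select the winning configuration using the in-bag samples only ...
        best = max(range(len(oos_preds)),
                   key=lambda c: subset_score(y_true, oos_preds[c], in_bag))
        # ... and score that winner on samples it was not selected on.
        oob_scores.append(subset_score(y_true, oos_preds[best], out_bag))
    return sum(oob_scores) / len(oob_scores)

# Toy demo: 20 configurations that predict pure noise for 40 samples.
demo_rng = random.Random(1)
y = [demo_rng.randint(0, 1) for _ in range(40)]
configs = [[demo_rng.randint(0, 1) for _ in range(40)] for _ in range(20)]
naive = max(accuracy(y, p) for p in configs)   # optimistic: best of many tries
corrected = bbc_cv_estimate(y, configs)        # adjusted for the multiple tries
print(f"naive best-of-20 accuracy: {naive:.3f}")
print(f"bias-corrected estimate:   {corrected:.3f}")
```

In the toy demo no configuration carries real signal, so reporting the best of 20 cross-validated scores overstates performance. Scoring each bootstrap winner only on samples that played no part in selecting it is what pulls the estimate back toward chance.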
For all datasets, performance was evaluated by internal validation (BBC-CV within each dataset). Extensive tuning effort was used, and each dataset was automatically split by JADBio into training and validation groups in a 70/30 proportion.
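A 70/30 split and an AUC with a confidence interval can be sketched as follows. This is an illustration under stated assumptions, not a description of what JADBio does internally. The split is stratified by class, which is an assumption. The interval is a simple percentile bootstrap, swapped in for the BBC-CV-based intervals reported above. All function names are hypothetical.

```python
import random

def stratified_split(labels, valid_fraction=0.3, seed=0):
    """Return (train_idx, valid_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_valid = round(valid_fraction * len(members))
        valid_idx += members[:n_valid]
        train_idx += members[n_valid:]
    return sorted(train_idx), sorted(valid_idx)

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed, simple method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample holds a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy usage: 10 positive and 20 negative samples, split 70/30 within each class.
labels = [1] * 10 + [0] * 20
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

A model would be fitted on `train_idx` only. Its predicted scores for `valid_idx` would then be passed to `auc` and `bootstrap_auc_ci` to obtain a point estimate and an interval.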
View shared results on the JADBio platform
Classification analysis between BrCa (class 2) vs Normal tissue (class 1)
Classification analysis between primary BrCa (class 1) vs metastatic BrCa tissue (class 2)
Classification analysis between early-stage BrCa (class 1) vs Normal tissue (class 0)
Classification analysis between early (class 1) and advanced stage BrCa (class 2)