CASE STUDY

Assessing Suicide Probability with ML

Toward Automatic Risk Assessment to Support Suicide Prevention

Marios Adamou, South West Yorkshire Partnership NHS Foundation Trust, Wakefield, Department of Computer Science, University of Huddersfield; Grigoris Antoniou, University of Huddersfield; Vincenzo Lagani, JADBio Gnosis DA, Science and Technology Park of Crete, Institute of Chemical Biology, Ilia State University, Tbilisi; Elissavet Greasidou, Paulos Charonyktakis, JADBio Gnosis DA, Science and Technology Park of Crete; Ioannis Tsamardinos, Department of Computer Science, University of Huddersfield, JADBio Gnosis DA, Science and Technology Park of Crete, Computer Science Department, University of Crete; Michael Doyle, South West Yorkshire Partnership NHS Foundation Trust, Wakefield

Digital Library: https://doi.org/10.1027/0227-5910/a000561

Abstract

Background: Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Despite the prevention strategies applied, the rate of suicide has not changed substantially over the past decades. Suicide risk has proven extremely difficult for medical specialists to assess, and the traditional methodologies deployed have been ineffective. Advances in machine learning make it possible to attempt to predict suicide by analyzing relevant data, with the aim of informing clinical practice.

Aims: (a) test an artificial-intelligence-based, referral-centric methodology in the context of the National Health Service (NHS), (b) determine whether statistically relevant results can be derived from data related to previous suicides, and (c) develop ideas for various exploitation strategies.

Method: The analysis used data on patients who died by suicide between 2013 and 2016, including both structured data and free-text medical notes, which necessitated the deployment of state-of-the-art machine learning and text mining methods.

Limitations: Sample size is a limiting factor for this study, along with the absence of non-suicide cases. Specific analytical solutions were adopted for addressing both issues.

Results and Conclusion: The results of this pilot study indicate that machine learning shows promise for predicting, at the time of referral to a mental health service, which people are most at risk of taking their own life within a specified period.

How was JADBio used?

Text Mining: The clinical notes included unstructured free-text information that first had to be analyzed and converted into structured, measurable data, suitable for a machine learning analysis. After cleaning and filtering, the processed text was converted into a dataset of single words (bag-of-words) using the scikit-learn software (Pedregosa, Varoquaux et al. 2011).

Each predictive analysis produced a set of predictive variables, a predictive model, and a conservative estimate of the model's generalization performance together with its 95% confidence interval. Performance was reported as the Area Under the Receiver Operating Characteristic curve (AUC).
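As a rough illustration of how an AUC and a 95% confidence interval can be obtained, the sketch below applies a plain percentile bootstrap to synthetic labels and scores. This is an assumption made for illustration only: it is not the estimation procedure used by JADBio, and the function name bootstrap_auc_ci and all data are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Return (auc, lower, upper) using a percentile bootstrap."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when only one class is present
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), float(lower), float(upper)


# Synthetic example: scores are shifted upward for the positive class.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)
scores = 1.0 * y + rng.normal(size=300)

auc, lo, hi = bootstrap_auc_ci(y, scores)
print(f"AUC = {auc:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval whose lower bound stays above 0.5 suggests that the model ranks cases better than chance.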

Results for the predictive analyses for 𝑡 = 3 (months before suicide). N and M denote the number of samples and variables in the dataset, respectively. C is the best-performing configuration from which the final model is constructed. The best prediction performance is highlighted in bold (Table 2).

The best overall performing model is obtained when (a) both structured and textual variables are included, (b) there is no restriction on the tested configurations, and (c) the time-point 𝑡 is equal to three months. The AUC achieved in this case is 0.705. The 95% CI of the AUC does not include the value 0.5 (CI = [0.646, 0.760]), so the result is deemed statistically significant at the standard significance level of 5%.

For the complete (unrestricted) analysis, the predictive models derived from the datasets that include both the structured and the textual variables perform better than those derived from datasets that include only the structured variables, by an average of 0.05 AUC points. The difference in AUC was found to be significant, indicating that the free-text medical notes do contain predictive information. The statistical test used to compare the AUCs is described in (DeLong, DeLong et al. 1988) and was applied with a significance level of 0.05.
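For readers unfamiliar with the DeLong test, the following is a minimal NumPy sketch of the comparison of two correlated AUCs computed on the same cases. The function name delong_test and the synthetic data are hypothetical, and this is not the code used in the study.

```python
import math

import numpy as np


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test; returns (auc_a, auc_b, z, p_value)."""
    y_true = np.asarray(y_true).astype(bool)
    aucs, v10, v01 = [], [], []
    for scores in (scores_a, scores_b):
        scores = np.asarray(scores, dtype=float)
        pos, neg = scores[y_true], scores[~y_true]
        # psi[i, j] = 1 if positive i outranks negative j, 0.5 on ties.
        diff = pos[:, None] - neg[None, :]
        psi = (diff > 0) + 0.5 * (diff == 0)
        v10.append(psi.mean(axis=1))  # one component per positive case
        v01.append(psi.mean(axis=0))  # one component per negative case
        aucs.append(float(psi.mean()))
    m, n = len(v10[0]), len(v01[0])
    # 2x2 covariance matrix of the two AUC estimates.
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:  # e.g. identical score vectors
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return aucs[0], aucs[1], float(z), float(p_value)


# Synthetic example: model A is informative, model B is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
a = 1.5 * y + rng.normal(size=400)
b = rng.normal(size=400)

auc_a, auc_b, z, p = delong_test(y, a, b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.2f}, p = {p:.3g}")
```

A p-value below 0.05 would correspond to the significance level stated above.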

The best prediction model in this analysis had an AUC value of just over 0.7. While this result would be considered fair for engineering and some biomedical applications, in the context of mental health diagnoses, the results reported here would be a major step towards a more accurate assessment of suicide risk, if backed up by further studies.

Schematic representation of the JADBio data analysis pipeline: The tool determines the set of N configurations to try. Hyperparameters are depicted as tuning sliders. The complete dataset is partitioned into 𝐾 folds. Each fold is considered in turn as a test case for the models trained with every configuration on the union of the remaining folds. The best-performing configuration is selected on the basis of its average performance on the test folds. The final predictive model is trained with the best-performing configuration on the complete dataset. Finally, a bootstrap-based procedure is used to remove the optimism from the cross-validated performance estimate (Figure 1).
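The pipeline described above can be sketched with scikit-learn on synthetic data. The three configurations, the value K = 5, and in particular the form of the bootstrap correction (reselecting the winning configuration on each bootstrap sample of the pooled out-of-fold predictions and scoring it on the left-out cases) are assumptions made for illustration; the text does not specify how JADBio implements these steps.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Hypothetical configurations (algorithm plus hyperparameter values).
configs = {
    "logreg_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "logreg_C=10": LogisticRegression(C=10, max_iter=1000),
    "forest_depth=3": RandomForestClassifier(n_estimators=50, max_depth=3,
                                             random_state=0),
}

# K-fold cross-validation: every configuration is trained on K-1 folds
# and evaluated on the held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = {name: np.zeros(len(y)) for name in configs}  # out-of-fold scores
fold_auc = {name: [] for name in configs}
for train, test in cv.split(X, y):
    for name, estimator in configs.items():
        model = clone(estimator).fit(X[train], y[train])
        proba = model.predict_proba(X[test])[:, 1]
        oof[name][test] = proba
        fold_auc[name].append(roc_auc_score(y[test], proba))

# Select the configuration with the best average test-fold AUC,
# then train the final model on the complete dataset.
mean_auc = {name: float(np.mean(v)) for name, v in fold_auc.items()}
best = max(mean_auc, key=mean_auc.get)
final_model = clone(configs[best]).fit(X, y)

# Assumed bootstrap correction: repeat the selection on resampled
# out-of-fold predictions and score the winner on the left-out cases.
rng = np.random.default_rng(0)
P = np.column_stack([oof[name] for name in configs])
corrected = []
for _ in range(500):
    idx = rng.integers(0, len(y), size=len(y))
    out = np.setdiff1d(np.arange(len(y)), idx)
    if np.unique(y[idx]).size < 2 or np.unique(y[out]).size < 2:
        continue
    winner = int(np.argmax([roc_auc_score(y[idx], P[idx, j])
                            for j in range(P.shape[1])]))
    corrected.append(roc_auc_score(y[out], P[out, winner]))
corrected_auc = float(np.mean(corrected))

print(f"best: {best}, CV AUC = {mean_auc[best]:.3f}, "
      f"corrected AUC = {corrected_auc:.3f}")
```

Because the winning configuration is chosen using the same folds that produce its score, the raw cross-validated AUC tends to be optimistic; scoring the reselected winner on cases left out of each bootstrap sample is one way to obtain a more conservative estimate.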

Read more at Crisis – The Journal of Crisis Intervention and Suicide Prevention: Toward Automatic Risk Assessment to Support Suicide Prevention
