Depression and suicide risk prediction models using blood-derived multi-omics data

Bhak, Youngjune; Jeong, Hyoung-oh; Cho, Yun Sung; Jeon, Sungwon; Cho, Juok; Gim, Jeong-An; Jeon, Yeonsu; Blazyte, Asta; Park, Seung Gu; Kim, Hak-Min; Shin, Eun-Seok; Paik, Jong-Woo; Lee, Hae-Woo; Kang, Wooyoung; Kim, Aram; Kim, Yumi; Kim, Byung Chul; Ham, Byung-Joo; Bhak, Jong; Lee, Semin

doi:10.1038/s41398-019-0595-2

Published: 17 October 2019

Translational Psychiatry volume 9, Article number: 262 (2019)

Abstract

More than 300 million people worldwide experience depression, and ~800,000 people die by suicide each year. Conventional interview-based diagnosis is insufficient to accurately predict psychiatric status. We developed machine learning models to predict depression and suicide risk using blood methylome and transcriptome data from 56 suicide attempters (SAs), 39 patients with major depressive disorder (MDD), and 87 healthy controls. Our random forest classifiers showed accuracies of 92.6% in distinguishing SAs from MDD patients, 87.3% in distinguishing MDD patients from controls, and 86.7% in distinguishing SAs from controls. We also developed regression models for predicting psychiatric scales, with R² values of 0.961 and 0.943 for the Hamilton Rating Scale for Depression–17 and the Scale for Suicide Ideation, respectively. These psychiatric status prediction models built from blood-derived multi-omics data may help improve mental health treatment.

Introduction

Suicide and depression are major health hazards, with suicide causing one death every 40 s globally^1,2. They are complex and intertwined phenomena: ~4% of individuals diagnosed with depression die by suicide, and more than half of the people who attempt suicide meet the criteria for depression³. The suicide rate in South Korea (25.8 deaths per 100,000 persons) is among the highest worldwide and is 2.30 times the average of the Organization for Economic Co-operation and Development (OECD) countries (11.2 deaths per 100,000 persons); South Korea ranks second among the OECD countries in suicide rate. Notably, the suicide rate for women in South Korea (14.7 deaths per 100,000 women) is the highest among the OECD countries (average 4.86 deaths per 100,000 women)⁴. Predicting depression and suicide risk is therefore a global problem of exceptional importance in South Korea, and effective models for predicting depression and suicidality may help guide the development of better treatments.

Current depression and suicide prediction methods rely on self-reported measures such as questionnaires and interviews, which can be subjective, and people with depression or suicidal ideation may not express their thoughts honestly⁵. Machine learning techniques applied to health records or neural representations have therefore been adopted to predict the risk of depression and suicide^6,7. Highly accurate biomarkers would be another ideal solution and would also give insight into the biology of depression and suicide. Because the brain is the target organ in psychiatry, brain-based biomarkers have been studied extensively⁸. However, brain biopsy is invasive and potentially dangerous, so biomarkers obtained from peripheral blood are a practical alternative. Previous studies confirmed meaningful correlations of methylation and expression profiles between blood and brain^9,10,11, and several studies identified blood methylation or gene expression biomarkers for depression and suicide risk^12,13,14,15. However, none of them systematically combined multi-omics data to develop models for depression and suicide risk prediction, although applying machine learning to combine different types of omics data may improve prediction accuracy^16,17,18. Here, we present machine learning and statistical models for depression and suicide risk prediction using blood-derived multi-omics data (Fig. 1a).

**Fig. 1: The study workflow and performance of the models.**

Results and discussion

Baseline sample characteristics

We recruited three cohorts (age range: 19–46 years; mean: 28.6 ± 8.98 years) through the Korea University Medical Center: (i) 56 suicide attempters (SAs) diagnosed with major depressive disorder; (ii) 39 non-suicide attempters diagnosed with major depressive disorder (MDD); and (iii) 87 healthy individuals (controls). Importantly, most of the SA participants (51 of 56, 91.1%) were recurrent SAs, who are at risk of attempting suicide again in the future¹⁹, and 48 of the 56 SA participants had a history of MDD (Tables 1, S1). We collected the following data from the participants: (i) questionnaires about their history of suicide or depression; (ii) psychiatric scales, including the Hamilton Rating Scale for Depression-17 (HAM17) and the Scale for Suicide Ideation (SSI); and (iii) peripheral blood samples for methylome and transcriptome sequencing.

Table 1 Baseline sample characteristics

Building the psychiatric status classification and regression models

To build the label classification and psychiatric scale regression models, we identified differentially methylated sites (DMSs; β-value difference >1% and Benjamini–Hochberg adjusted P < 0.05) from Methyl-seq data and differentially expressed genes (DEGs; fold change >1.2 and Benjamini–Hochberg adjusted P < 0.05) from whole-transcriptome sequencing data. We then performed feature selection to further improve model performance. For the model differentiating SAs from MDD patients (SA vs. MDD classifier), 7353 DMSs were initially selected, but no DEGs were identified. After feature selection, 69 DMSs remained (Table S2), and the classifier achieved 92.6% accuracy in leave-one-out cross-validation (Fig. 1b). For the MDD vs. control and SA vs. control classifiers, we initially selected 12,633 and 10,412 DMSs (16 and 154 DEGs), respectively. After feature selection, 80 and 95 DMSs (0 and 7 DEGs) remained as input features for the MDD vs. control and SA vs. control classifiers, respectively (Tables S3 and S4). The overall accuracies were 87.3% and 86.7% for the MDD vs. control and SA vs. control classifiers, respectively (Fig. 1b). However, as expected, their sensitivities were lower (59.0% and 67.9%, respectively), plausibly because cases were outnumbered by controls in both comparisons (39 vs. 87 and 56 vs. 87). None of the input features overlapped between the classifier models.
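The DMS and DEG criteria above combine an effect-size cutoff with a Benjamini–Hochberg (BH) false discovery rate adjustment. The Python sketch below illustrates only that filtering step. It assumes that raw per-site and per-gene P values and effect sizes have already been computed (in this study by methylKit and DESeq2, whose own adjusted P values would normally be used directly). The helper names (`benjamini_hochberg`, `select_dms`, `select_degs`), the tuple input format, and the treatment of both cutoffs as absolute (two-sided) values are illustrative assumptions, not the authors' code.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, taking a running minimum
    # of p * m / rank so the adjusted values are monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dms(sites, min_beta_diff=0.01, alpha=0.05):
    """sites: list of (site_id, mean_beta_difference, raw_p) tuples (hypothetical format)."""
    adjusted = benjamini_hochberg([p for _, _, p in sites])
    return [site_id for (site_id, diff, _), q in zip(sites, adjusted)
            if abs(diff) > min_beta_diff and q < alpha]


def select_degs(genes, min_fold_change=1.2, alpha=0.05):
    """genes: list of (gene_id, log2_fold_change, raw_p) tuples (hypothetical format)."""
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    min_log2fc = math.log2(min_fold_change)  # fold change > 1.2 in either direction
    return [gene_id for (gene_id, lfc, _), q in zip(genes, adjusted)
            if abs(lfc) > min_log2fc and q < alpha]
```

For example, `select_dms([("cg1", 0.05, 0.001), ("cg2", 0.005, 0.001), ("cg3", -0.03, 0.002), ("cg4", 0.2, 0.6)])` keeps `["cg1", "cg3"]`: the hypothetical site cg2 fails the effect-size cutoff and cg4 fails the adjusted-P cutoff.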

To construct the psychiatric scale regression models, we used the DMSs and DEGs that were significantly correlated (Spearman's rho > 0.2, P < 0.05) with the HAM17 or SSI scores. For the HAM17 regression model, 2150 DMSs and 80 DEGs were selected; for SSI, 1273 DMSs and 82 DEGs were selected. After feature selection, 810 and 467 DMSs (48 and 51 DEGs) remained for the HAM17 and SSI regression models, respectively (Tables S5 and S6), with 139 markers shared between the two models. The R² values were 0.961 for HAM17 and 0.943 for SSI (Fig. 1c, d). For classifying MDD patients versus controls, the area under the receiver operating characteristic curve (AUC) was 0.993 for the measured HAM17 and 0.999 for the predicted HAM17 (Fig. 2a). For classifying SAs versus controls, the AUC was 0.951 for the measured SSI and 0.976 for the predicted SSI (Fig. 2b). The high AUCs of the predicted HAM17 and SSI scores may compensate for the low sensitivity of the MDD vs. control and SA vs. control classifiers.
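When a continuous score such as the measured or predicted HAM17 is used as a classifier, the ROC AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). The Python sketch below computes the AUC that way. The paper does not state which software produced Fig. 2, and the function name and example scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """ROC AUC of a score, as P(case > control) with ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

With hypothetical scores for three cases `[20, 18, 9]` and four controls `[3, 5, 9, 2]`, 11.5 of the 12 case–control pairs favor the case (the tied pair at 9 counts as 0.5), so the AUC is 11.5 / 12 ≈ 0.958. The double loop is O(n·m), which is negligible for cohorts of this size; a rank-based computation scales better for large samples.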

**Fig. 2: Receiver operating characteristic curves (ROC curve).**

Investigations of the model input features

Because the input features were derived from the DEGs and DMSs between groups, investigating them could give insight into biomarkers significantly associated with depression and suicide attempts. Most of the model input features were methylation markers, which may be because the initial feature selection yielded many more methylation markers (DMSs) than gene expression markers (DEGs). Interestingly, gene expression markers ranked significantly higher than methylation markers in feature importance, but only in the regression models (Wilcoxon signed-rank test P = 2.3e-05 for the HAM17 regression model and P = 0.020 for the SSI regression model). Hence, the proportion of marker types in the initial step may not have been the sole determinant of the marker types in the final models. This may be due to the more dynamic nature of gene expression levels compared with methylation²⁰. Because the psychiatric assessment and blood sample collection were performed on the same day in this study, gene expression markers may more effectively represent the participants' current emotional state. By contrast, the dominance of methylation markers in the classifier models might reflect methylation changes related to traumatic experiences, as reported previously²¹.

Next, we conducted a functional enrichment test using DAVID (Database for Annotation, Visualization and Integrated Discovery)²² to investigate the biological functions and pathways associated with the models' input features (Tables 2, S7). No biological function or pathway was significantly enriched (Benjamini–Hochberg adjusted P < 0.05) among the SA vs. MDD classifier input features. However, this feature set included a site in the ARHGAP39 gene (Rho GTPase Activating Protein 39, chr8:145809066, Fig. 3a), a previously reported methylation marker for suicide risk²³ (Table S2).

Table 2 Enrichment analysis results for the models' markers

**Fig. 3: Methylation box plots of the model features.**

We repeatedly observed the protocadherin (PCDH) gene family among the enriched biological terms for the feature sets of the SA vs. control classifier and the HAM17 and SSI regression models (Tables 2, S4–S7). PCDH genes are involved in neuronal and synaptic functions, and their methylation can be altered in response to early-life stress^24,25,26,27. A peripheral blood methylation study reported significant intra-pair methylation differences in PCDH genes in monozygotic twin pairs concordant and discordant for MDD²⁸.

The Hippo signaling pathway was significantly enriched in the MDD vs. control classifier feature set. This pathway includes PRKCZ (Protein Kinase C Zeta, chr1:2010660, Fig. 3b) and FZD7 (Frizzled Class Receptor 7, chr2:202900702, Fig. 3c), which are known to be related to antidepressant response^29,30 (Table S3). Although this may be because most (94.9%) of the patients with MDD in this study were taking antidepressants, it might still suggest antidepressant response as a possible predictor of MDD. This should be validated separately in a larger and more diverse cohort.

Here, we present machine learning and statistical models to predict depression and suicide risk using blood-derived multi-omics data. Our classifier models predicted the correct labels for patients with MDD, SAs, and healthy controls with accuracies of 86.7–92.6% (Fig. 1b). Psychiatric scales such as HAM17 and SSI were also successfully predicted by our regression models (Fig. 1c, d), and the predicted scales classified participants marginally better than the measured scores (Fig. 2a, b). Although the performance of our models is not guaranteed to generalize to independent cohorts³¹, our methodology may help fill gaps in our understanding of the pathogenesis and treatment of psychiatric disorders.

Methods

Participant recruitment, diagnostic assessment, and blood sampling

The data in this study were obtained from three cohorts: (i) 56 suicide attempters (SA); (ii) 39 patients diagnosed with major depressive disorder (MDD); and (iii) 87 healthy controls (Tables 1, S1).

A total of 95 depressed patients, with or without suicide attempts, were recruited prospectively through the outpatient psychiatric clinic of Korea University Anam Hospital in Seoul, Republic of Korea, from April 2015 to August 2017. The patients were then assigned to the SA or MDD group according to whether they had attempted suicide (56 suicide attempters and 39 non-suicide attempters). The diagnosis of major depressive disorder was confirmed by board-certified psychiatrists (Ham BJ, Baek JW and Lee HW) using the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Basic demographic (e.g., age, sex, education level) and clinical (e.g., antidepressant use, clinical history) information was collected during the diagnostic assessment. Current clinical status was measured with two psychiatric scales: the Hamilton Rating Scale for Depression-17 (HAM17)³², which rates the severity of depressive symptoms, and the 19-item Beck Scale for Suicide Ideation (SSI)³³.

Ten SAs were recognized as acute depressive patients based on the following criteria: (1) a current HAM17 score over 14; and (2) either an interval of <3 months between the current and past suicide attempts, for those with recurrent suicide attempts (2 of the 10 SAs), or a current attempt that was their first (8 of the 10 SAs).

Healthy controls aged 19–65 years were recruited from the community through advertisements, and a total of 87 people volunteered to participate in the study. They underwent the same psychiatric diagnostic assessment as the patient groups and were determined to have no past or present psychiatric disorder.

The diagnostic assessment and blood sampling were performed on the same day, and participants' IDs were de-identified afterwards. In accordance with the Declaration of Helsinki, all 182 participants signed informed consent forms describing the research goals and procedures. All participants were informed of their right to withdraw from the study at any stage (no participant withdrew). The study protocol was approved by the Institutional Review Board of Korea University Anam Hospital (IRB No: ED15006) and by the Institutional Review Board of Ulsan National Institute of Science and Technology (UNISTIRB-15-11-C).

Methyl-seq

Genomic DNA was isolated from blood using the DNeasy Blood & Tissue Kit (Qiagen, Germany) according to the manufacturer's protocol. Extracted DNA was quantified with the Quant-iT BR assay kit (Invitrogen). Genomic libraries were prepared using the SureSelectXT Methyl-Seq Target Enrichment System for Illumina Multiplexed Sequencing (Agilent Technologies). Briefly, 2 μg of genomic DNA per sample was randomly sheared by ultrasonication, and DNA fragments between 150 and 200 bp were extracted. Sample DNA then underwent end repair, adapter ligation, hybridization to the SureSelectXT Methyl-Seq Capture Library, streptavidin bead enrichment, bisulfite conversion, and PCR amplification, and each sample was uniquely indexed with a 6-nucleotide sequencing tag following the manufacturer's protocol. The sample libraries were then pooled, multiplexed in four separate lanes, and sequenced as 100 bp paired-end reads on an Illumina NovaSeq6000 S4 flow cell.

RNA-seq

Total RNA was extracted using the PAXgene Blood RNA Kit (Qiagen, Germany) according to the manufacturer's recommendations. RNA quality was assessed by running 1 μl on the Bioanalyzer system (Agilent, CA, USA) to check the RNA integrity number (RIN) and rRNA ratio. Sequencing libraries were prepared from 100 ng of total RNA from each participant using the TruSeq RNA sample preparation kit (Illumina, CA, USA). The quality of these cDNA libraries was evaluated with the Agilent 2100 Bioanalyzer (Agilent, CA, USA), and they were quantified with the KAPA library quantification kit (Kapa Biosystems, MA, USA) according to the manufacturer's library quantification protocol. Following cluster amplification of denatured templates, paired-end sequencing (2 × 100 bp) was performed on the Illumina NovaSeq6000 S4 platform.

Bioinformatic analysis

Methyl-seq and RNA-seq reads were discarded if fewer than 70% of their bases had a quality score of at least Q20, using the IlluQC_PRLL.pl script of NGS QC Toolkit (ver 2.3.3)³⁴. The filtered Methyl-seq reads were mapped to the hg19 human genome assembly using Bismark (ver 0.14.5)³⁵, and methylation information was extracted using MethylExtract (ver 1.9.1)³⁶. The methylation information was summarized as beta values, the proportion of methylated bases at each locus. Only CpG sites with a minimum depth of 10 in at least 75% of samples within each batch and each cohort were used. Beta values were adjusted for batch, age, and sex using ComBat from the sva package (ver 3.24.4) in R (ver 3.4.0)³⁷, and the adjusted beta values were used for further analyses. Differentially methylated site analysis was conducted using the methylKit package (ver 1.5.0) in R³⁸. Each methylation site was annotated with its positionally related genes (including 5 kb upstream and downstream of the gene region). The filtered RNA-seq reads were mapped to the hg19 human genome assembly using MapSplice (ver 2.1.8)³⁹, and gene expression was quantified using RSEM (ver 1.9.1)⁴⁰. Transcripts per million (TPM) values were adjusted for batch, age, and sex using ComBat from the sva package (ver 3.24.4) in R (ver 3.4.0)³⁷. Differentially expressed genes (DEGs) were identified using DESeq2⁴¹.
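The beta-value definition and the CpG depth filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MethylExtract or authors' implementation: it assumes per-sample methylated and unmethylated read counts are available for each CpG, and it interprets "for both batch and cohort" as requiring the ≥75% coverage criterion to hold within every batch and within every cohort. The function names and example depths are hypothetical.

```python
def beta_value(methylated, unmethylated):
    """Fraction of methylated reads at one CpG in one sample; None if uncovered."""
    depth = methylated + unmethylated
    return methylated / depth if depth > 0 else None


def passes_depth_filter(depths_by_group, min_depth=10, min_fraction=0.75):
    """Return True if every group has >= min_fraction of samples with depth >= min_depth.

    depths_by_group maps a group label (a batch or a cohort) to the list of
    per-sample read depths observed at a single CpG site.
    """
    for depths in depths_by_group.values():
        covered = sum(1 for d in depths if d >= min_depth)
        if covered < min_fraction * len(depths):
            return False
    return True


# Hypothetical depths for the same 8 samples at one CpG, grouped two ways;
# the site is kept only if it passes when grouped by batch AND by cohort.
by_batch = {"batch1": [12, 15, 10, 30], "batch2": [11, 9, 20, 25]}
by_cohort = {"SA": [12, 15, 11], "MDD": [10, 9], "control": [30, 20, 25]}
keep_site = passes_depth_filter(by_batch) and passes_depth_filter(by_cohort)
```

In this example the site passes when grouped by batch (4/4 and 3/4 samples covered) but fails in the MDD cohort (1/2 covered), so `keep_site` is `False`.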

Classifier and regression model construction

The three binary classification models (SA vs. MDD, MDD vs. control, and SA vs. control) were constructed using RandomForestClassifier in scikit-learn (ver 0.19.1)⁴². The first step was feature construction based on the statistical significance of DMSs and DEGs in each comparison: DMSs with a beta-value difference >0.01 and Benjamini–Hochberg adjusted P < 0.05, and DEGs with a fold change >1.2 and Benjamini–Hochberg adjusted P < 0.05, were selected as features for each comparison (SA vs. MDD, MDD vs. control, and SA vs. control). The selected features were then filtered by feature selection, a step that eliminates irrelevant features acting as noise, to improve prediction accuracy. For this, we used a tree-based feature selection algorithm that calculates the importance of each feature from its contribution to model performance during training; a feature was removed if its importance derived from the random forest during training was zero. During training, the number of trees and the maximum number of features were adjusted until the out-of-bag (OOB) error rate stabilized. Model performance was evaluated using leave-one-out cross-validation.

Two psychiatric scale regression models, for HAM17 and SSI, were built using LinearRegression in scikit-learn (ver 0.19.1)⁴². DMSs and DEGs from each comparison (SA vs. MDD, MDD vs. control, and SA vs. control) were selected as features if they were significantly correlated with HAM17 or SSI (Spearman's rho > 0.2 and P < 0.05). We used SelectFromModel in scikit-learn for feature selection.
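The classifier workflow described above (random forest, removal of zero-importance features, leave-one-out evaluation) can be sketched with scikit-learn as follows. This is a minimal illustration under assumptions, not the authors' code: the hyperparameters are fixed rather than tuned against the OOB error, the data are synthetic, and the zero-importance pruning is repeated inside each leave-one-out fold so that the held-out sample never influences which features are kept (the paper does not state whether pruning was done within folds). The function names are hypothetical; `RandomForestClassifier`, its `feature_importances_` attribute, and `LeaveOneOut` are standard scikit-learn APIs. The regression models (`LinearRegression` with `SelectFromModel`) are not covered here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut


def fit_pruned_forest(X_train, y_train, n_estimators=100, random_state=0):
    """Fit a forest, drop zero-importance features, and refit on the rest.

    Returns the refit model and the boolean mask of kept features.
    """
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X_train, y_train)
    keep = forest.feature_importances_ > 0  # impurity-based importances
    if not keep.any():  # degenerate case: keep all features
        keep[:] = True
    pruned = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    pruned.fit(X_train[:, keep], y_train)
    return pruned, keep


def loocv_predict(X, y, **forest_kwargs):
    """Leave-one-out predictions, with feature pruning redone inside every fold."""
    predictions = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model, keep = fit_pruned_forest(X[train_idx], y[train_idx], **forest_kwargs)
        predictions[test_idx] = model.predict(X[test_idx][:, keep])
    return predictions


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Overall accuracy, and sensitivity (recall) for the `positive` class."""
    accuracy = float(np.mean(y_true == y_pred))
    is_case = y_true == positive
    sensitivity = float(np.mean(y_pred[is_case] == positive))
    return accuracy, sensitivity


if __name__ == "__main__":
    # Synthetic stand-in for a case/control feature matrix: 20 controls (0),
    # 20 cases (1), 10 features of which the first 3 carry signal.
    rng = np.random.default_rng(0)
    y = np.array([0] * 20 + [1] * 20)
    X = rng.normal(size=(40, 10))
    X[:, :3] += 2.5 * y[:, None]
    pred = loocv_predict(X, y, n_estimators=50)
    print("accuracy=%.3f sensitivity=%.3f" % accuracy_and_sensitivity(y, pred))
```

Sensitivity is reported separately from accuracy because, with unequal group sizes, a classifier can reach high overall accuracy while still missing a sizable fraction of cases.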

Functional enrichment and pathway analysis

We conducted a functional enrichment test using DAVID²² with default parameters. DEGs and genes positionally related to DMSs among the models' input features were used for the test. Only input features (significant DMSs and DEGs) with nonzero feature importance during model training were included.
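Enrichment tools such as DAVID test whether a term's genes are over-represented in an input gene list relative to a background. DAVID reports the EASE score, a more conservative modification of Fisher's exact test; the Python sketch below shows the unmodified one-sided test (the hypergeometric upper tail) for a single term. The function name and the example counts are hypothetical, and in practice the P values across all tested terms would then be Benjamini–Hochberg adjusted.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided hypergeometric (Fisher's exact) P value for over-representation.

    hits: input-list genes annotated to the term
    list_size: genes in the input list
    term_size: background genes annotated to the term
    background_size: genes in the background
    Returns P(X >= hits) for X ~ Hypergeometric(background_size, term_size, list_size).
    """
    total = comb(background_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, if all 4 genes in an input list fall in a term that annotates 5 of 20 background genes, `enrichment_p_value(4, 4, 5, 20)` returns 5/4845 ≈ 0.00103.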

Data availability

All sequencing files are available from the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession SRP200298.

Code availability

Requests for the computer code should be addressed to the corresponding authors.

References

1. World Health Organization (WHO). “Depression” (22 March 2018); http://www.who.int/mediacentre/factsheets/fs369/en/.
2. World Health Organization (WHO). “Suicide” (2018); http://www.who.int/mediacentre/factsheets/fs398/en/.
3. Hawton, K. & van Heeringen, K. Suicide. Lancet 373, 1372–1381 (2009).
4. Organisation for Economic Co-operation and Development (OECD). “Suicide rates (indicator)” (2019); https://data.oecd.org/healthstat/suicide-rates.htm.
5. Busch, K. A., Fawcett, J. & Jacobs, D. G. Clinical correlates of inpatient suicide. J. Clin. Psychiatry 64, 14–19 (2003).
6. Just, M. A. et al. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nat. Hum. Behav. 1, 911 (2017).
7. Walsh, C. G., Ribeiro, J. D. & Franklin, J. C. Predicting risk of suicide attempts over time through machine learning. Clin. Psychological Sci. 5, 457–469 (2017).
8. Nagy, C. et al. Astrocytic abnormalities and global DNA methylation patterns in depression and suicide. Mol. Psychiatry 20, 320 (2015).
9. Witt, S. H. et al. Comparison of gene expression profiles in the blood, hippocampus and prefrontal cortex of rats. In Silico Pharmacol. 1, 15 (2013).
10. Walton, E. et al. Correspondence of DNA methylation between blood and brain tissue and its application to schizophrenia research. Schizophr. Bull. 42, 406–414 (2015).
11. Sullivan, P. F., Fan, C. & Perou, C. M. Evaluating the comparability of gene expression in blood and brain. Am. J. Med. Genet. B Neuropsychiatr. Genet. 141, 261–268 (2006).
12. Le-Niculescu, H. et al. Discovery and validation of blood biomarkers for suicidality. Mol. Psychiatry 18, 1249 (2013).
13. Davies, M. N. et al. Hypermethylation in the ZBTB20 gene is associated with major depressive disorder. Genome Biol. 15, R56 (2014).
14. Guintivano, J. et al. Identification and replication of a combined epigenetic and genetic biomarker predicting suicide and suicidal behaviors. Am. J. Psychiatry 171, 1287–1296 (2014).
15. Spijker, S. et al. Stimulated gene expression profiles as a blood marker of major depressive disorder. Biol. Psychiatry 68, 179–186 (2010).
16. Zhang, L. et al. Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front. Genet. 9, 477 (2018).
17. Chaudhary, K., Poirion, O. B., Lu, L. & Garmire, L. X. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin. Cancer Res. 24, 1248–1259 (2018).
18. Chung, N. C. et al. Unsupervised classification of multi-omics data during cardiac remodeling using deep learning. Methods 166, 66–73 (2019).
19. Beghi, M., Rosenbaum, J. F., Cerri, C. & Cornaggia, C. M. Risk factors for fatal and nonfatal repetition of suicide attempts: a literature review. Neuropsychiatr. Dis. Treat. 9, 1725 (2013).
20. Furukawa, R. et al. Intraindividual dynamics of transcriptome and genome-wide stability of DNA methylation. Sci. Rep. 6, 26424 (2016).
21. El Hajj, N., Dittrich, M. & Haaf, T. Epigenetic dysregulation of protocadherins in human disease. Semin. Cell Dev. Biol. 69, 172–182 (2017).
22. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44 (2008).
23. Clive, M. L. et al. Discovery and replication of a peripheral tissue DNA methylation biosignature to augment a suicide prediction model. Clin. Epigenetics 8, 113 (2016).
24. McGowan, P. O. et al. Broad epigenetic signature of maternal care in the brain of adult rats. PLoS One 6, e14739 (2011).
25. Leung, L. C. et al. Coupling of NF-protocadherin signaling to axon guidance by cue-induced translation. Nat. Neurosci. 16, 166 (2013).
26. Kim, S. et al. The expression of non-clustered protocadherins in adult rat hippocampal formation and the connecting brain regions. Neuroscience 170, 189–199 (2010).
27. Anitha, A. et al. Protocadherin α (PCDHA) as a novel susceptibility gene for autism. J. Psychiatry Neurosci. 38, 192 (2013).
28. Córdova-Palomera, A. et al. Genome-wide methylation study on depression: differential methylation and variable methylation in monozygotic twins. Transl. Psychiatry 5, e557 (2015).
29. Breitfeld, J., Scholl, C., Steffens, M., Laje, G. & Stingl, J. Gene expression and proliferation biomarkers for antidepressant treatment resistance. Transl. Psychiatry 7, e1061 (2017).
30. Shi, Y. et al. Genetic variation in the calcium/calmodulin-dependent protein kinase (CaMK) pathway is associated with antidepressant response in females. J. Affect. Disord. 136, 558–566 (2012).
31. Mullins, N. et al. Investigation of blood mRNA biomarkers for suicidality in an independent sample. Transl. Psychiatry 4, e474 (2014).
32. Hamilton, M. Assessment of Depression (Springer, 1986).
33. Beck, A. T., Kovacs, M. & Weissman, A. Assessment of suicidal intention: the Scale for Suicide Ideation. J. Consult. Clin. Psychol. 47, 343 (1979).
34. Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).
35. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
36. Barturen, G., Rueda, A., Oliver, J. L. & Hackenberg, M. MethylExtract: high-quality methylation maps and SNV calling from whole genome bisulfite sequencing data. F1000Research 2, 217 (2013).
37. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
38. Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).
39. Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).
40. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
41. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
42. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Acknowledgements

We thank Prof. Yoon-Kyung Cho for supporting this project. We also thank Korea University Anam Hospital members for helping source blood and information of participants. Korea Institute of Science and Technology Information (KISTI) provided us with the Korea Research Environment Open NETwork (KREONET). This work was supported by the Civil-Military Dual-Use Technology Development Program (14-BR-SS-03) through the Agency for Defense Development; U-K BRAND Research Fund (1.190007.01) of UNIST; Research Project Funded by Ulsan City Research Fund (1.190033.01) of UNIST; the Next-Generation Information Computing Development Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2016M3C4A7952635).

Author information

These authors contributed equally: Youngjune Bhak, Hyoung-oh Jeong

Authors and Affiliations

Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
Youngjune Bhak, Hyoung-oh Jeong, Sungwon Jeon, Juok Cho, Yeonsu Jeon, Asta Blazyte, Seung Gu Park, Hak-Min Kim, Jong Bhak & Semin Lee
Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan, 44919, Republic of Korea
Youngjune Bhak, Hyoung-oh Jeong, Sungwon Jeon, Juok Cho, Yeonsu Jeon, Hak-Min Kim, Jong Bhak & Semin Lee
Clinomics Inc., Ulsan, 44919, Republic of Korea
Youngjune Bhak, Yun Sung Cho, Hak-Min Kim, Yumi Kim, Byung Chul Kim & Jong Bhak
Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Suwon, 16229, Republic of Korea
Jeong-An Gim
Division of Cardiology, Department of Internal Medicine, Ulsan Medical Center, Ulsan, Republic of Korea
Eun-Seok Shin
Department of Neuropsychiatry, College of Medicine, Kyung Hee University, Seoul, Republic of Korea
Jong-Woo Paik
Department of Psychiatry, Seoul Medical Center, Seoul, Republic of Korea
Hae-Woo Lee
Department of Biomedical Sciences, Korea University College of Medicine, Seoul, Republic of Korea
Wooyoung Kang, Aram Kim & Byung-Joo Ham
Department of Psychiatry, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Republic of Korea
Byung-Joo Ham
Brain Convergence Research Center, Korea University Anam Hospital, Seoul, Republic of Korea
Byung-Joo Ham
Personal Genomics Institute, Genome Research Foundation, Cheongju, 28160, Republic of Korea
Jong Bhak

Contributions

Y.S.C., B.C.K., J.B. and S.L. designed and supervised the research and acquired the funds. A.K., Y.K., J.W.P., H.W.L., W.K. and B.J.H. managed patients' blood samples and diagnostic assessment information. H.J., Y.B., H.M.K. and J.C. constructed the features from the raw sequencing data and the participant information. H.J. selected the features and developed the models. H.J., Y.B., S.J., J.A.G., Y.J., A.B., S.G.P. and E.S.S. interpreted the results. Y.B., H.J., J.B. and S.L. wrote the paper with significant contributions from all authors. All authors discussed the methods and results.

Corresponding authors

Correspondence to Jong Bhak or Semin Lee.

Ethics declarations

Conflict of interest

H.J., Y.B., Y.S.C., B.C.K., J.B. and S.L. are listed as inventors on a patent application related to the work. Y.B., H.M.K., Y.S.C. and Y.K. are employees, E.S.S. is a chief medical officer, J.B. and B.C.K. are chief executive officers, and S.L. is on the scientific advisory board of Clinomics Inc. H.M.K., Y.S.C., J.B. and B.C.K. have an equity interest in the company. These interests do not alter our adherence to Translational Psychiatry policies on sharing data and materials.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary tables

Table S1. Baseline sample characteristics (sample by sample)

Table S2. List of selected markers for SA vs MDD classifier model

Table S3. List of selected markers for MDD vs control classifier model

Table S4. List of selected markers for SA vs. control classifier model

Table S5. List of selected markers for HAM17 regression model

Table S6. List of selected markers for SSI regression model

Table S7. Enrichment analysis results for the models' markers

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bhak, Y., Jeong, Ho., Cho, Y.S. et al. Depression and suicide risk prediction models using blood-derived multi-omics data. Transl Psychiatry 9, 262 (2019). https://doi.org/10.1038/s41398-019-0595-2

Received: 09 June 2019
Revised: 09 September 2019
Accepted: 24 September 2019
Published: 17 October 2019
DOI: https://doi.org/10.1038/s41398-019-0595-2
