Holdsworth-Carson SJ, Fung JN, Luong HT, Sapkota Y, Bowdler LM, Wallace L, Teh WT, Powell JE, Girling JE, Healey M, Montgomery GW, Rogers PA. Hum Reprod.
Hypertension reduces soluble guanylyl cyclase expression in the mouse aorta via the Notch signaling pathway. The training and test datasets for the convolutional neural network training were created analogously to the approach used in Basset (Kelley et al., 2016). Seven out of 19 ML models for post-GWAS prioritization curated in this review (Table 1) are ensemble models, namely random forests and gradient boosting. An optimal model also hinges on data size and quality for reliability and performance, with studies varying in data size and choice of features – from using hundreds of selected features (Isakov et al., 2017) to others exploring tens of thousands (Deo et al., 2014).
Increased amyloid beta-peptide deposition in cerebral cortex as a consequence of apolipoprotein E genotype in late-onset Alzheimer disease. GWAS never directly link variants to regulatory mechanisms. Epub 2016 Mar 22. Hierarchical bayes prioritization of marker associations from a genome-wide association scan for further investigation. By jointly analyzing SNPs, DeepWAS considers the correlation of each SNP with the phenotype, conditional on all other relevant SNP within an FU. For example, 23% of the 53 MS-specific dSNPs were previously identified in the ISMGC MS GWAS including more than 135,000 individuals (n>47,000 MS cases). (B) MDD-specific three-way QTL interaction network generated by using a graph database and highlighting only the dSNPs with eQTL and meQTL effects that also harbor an eQTM. For example, several studies have reported significant enrichment of T2D GWAS variants within pancreatic islet enhancer regions (Parker et al., 2013; Pasquali et al., 2014), with that enrichment particularly concentrated in subsets of islet enhancers characterised by open chromatin and hypomethylation (Thurner et al., 2018), and clustered in 3D enhancer hub structures (Miguel-Escalada et al., 2018). Deep Learning Specialization on Coursera. Code used to generate the results of this study is available at https://github.com/agawes/islet_CNN. Both strategies prevent the invasion of the other through interference competition, creating evolutionary bi-stability. -, Tak YG, Farnham PJ. The gene MAZ on chromosome 16, for example, has been previously identified as a genome-wide significant GWAS locus for MS [14]. We fully acknowledge reviewers comment that pancreatic islets are indeed a mixture of cell types, and while function of the insulin-producing beta cells is imperative to blood glucose control and T2D aetiology, there is emerging evidence that other cell types may contribute as well. They found that integration of eQTL data with GWAS data provided an overlap of information between the two that strengthened model performance. (2014) used known causal genes as their training examples for cardiomyopathies, unlike the use of GWAS associated genes in the training data for other phenotypes, implying the benefit of using well-curated input data.
The varying performance of SVM also highlights the importance of input data, as Kafaie et al.
In fact, in all three DeepWAS, dSNPs were identified in cell types and enhancers previously shown to be relevant for the tested phenotype. doi: 10.1038/s41435-019-0059-y, Giri, A., Hellwege, J. N., Keaton, J. M., Park, J., Qiu, C., Warren, H. R., et al. How that data is collected and recorded then also affects the reliability of ML methods and comparison of model performances. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ENCODE cell type information was downloaded from https://genome.ucsc.edu/encode/cellTypes.html and tissue categories were extracted from the column “tissue”. 01GI0916 and 01GI0917). The stability selection method provides a robust feature selection by taking the uncertainty of feature selection into account, using subsets or bootstrapped subsamples of data sets. Genotype data was generated for each cohort individually, see Table 1.
Supervision, Using hyperSMURF they prioritized thousands of GWAS variants annotated with 1,842 features. [36], relating MEF2 to activity-dependent dendritic spine growth and suggesting that this TF may suppress memory formation.
Recruitment strategies and further characterization have been described previously [16]. For example, 47% of the MS-specific dSNPs (n = 35) affected the binding of chromatin features in hematopoetic tissue, and another 30% affected chromatin features in brain tissue or spinal cord (n = 16; Fig 3B).
For a subset of the MDDC cohort (n = 166 recMDD cases), genomic DNA was extracted from whole blood using the Gentra Puregene Blood Kit (QIAGEN). This was performed on a tissue-specific basis of over 200 tissues (Zhou et al., 2018), providing hundreds of features for the model to process. Finally, we reasoned that variants predicted to affect function of specific regulatory elements would be more likely to reside within them (e.g.
Research can develop models aiming to be applied across diseases, and re-used by other researchers, with consideration for the size of present GWAS data, varying datatypes, and feature importance. ChromHMM used five core marks (H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3) from each of the 111 reference epigenomes and learned a set of 15 chromatin state definitions per genomic segment.
QTL network analyses helped us to identify the SNPs that showed joint effects on epigenetic and transcriptomic levels, i.e., dSNP = eSNP = meSNP where the dSNP harbors an eQTM. Dots represent KKNMS GWAS p-values and the diamond shows the IMSGC GWAS signal p-value.
p3@iastate.edu (2017). For example, random forests provide feature importance measures and have been investigated by Szymczak et al. The heatmaps show the number of selected chromatin features vs. cell line for (A) MDD and (B) height dSNPs. In comparison, unlike other computational methods, the choices ML models make for prioritization are not always clearly available to be understood by the user. SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS. For more information about PLOS Subject Areas, click The dSNP rs11000015 was correlated with expression levels of the Prosaposin (PSAP) gene in multiple tissues, whole blood gene expression levels of PSAP are shown in Fig 5C. We next wanted to illustrate how DeepWAS can accelerate the discovery of disease mechanisms.
MS cohorts, referred to as DE1 and DE2, were analyzed.
In comparison to the published GWAS-based results, DeepWAS adds the novel and testable hypothesis that the TFs MafF and MafK contribute to MS susceptibility. Features marked as ‘a_”, ‘b_”,”e_”, and ‘aci_”, were assayed in FACS-sorted cell populations rather than whole pancreatic islets, and correspond to alpha cells, beta cells, exocrine cells, and acinar cells, respectively (Bramswig et al., 2013; Ackermann et al., 2016). Promoter regions, DNA accessibility and binding of specific transcription factors proved the easiest to predict, as these features are characterized by very distinct sequence motifs that can be identified by the network’s convolutional filters. This resulted in 505,273 genomic intervals of 1000 bp length with assigned presence/absence of the 30 islet epigenomic features, with an average of 2.62 chromatin features per interval.
While we tested DeepWAS in small and medium-size samples and observed a potential increase in power in detecting phenotype-relevant functional SNPs, applying this method to very large data sets will be even more informative. Rep. 8:15050. doi: 10.1038/s41598-018-33420-z, Breiman, L. (2001). On prioritization they found highly ranked variants were also most likely to be replicated across GWAS. In-depth investigation of the wealth of additional regulatory capacities of dSNPs were carried out by generating QTL networks that combine all pairwise links of meQTL (SNP-CpG), eQTL (SNP-gene), eQTM (CpG-gene), and dSNP-FU information. NeuroCure Clinical Research Center, Department of Neurology, and Experimental and Clinical Research Center, Max Delbrück Center for Molecular Medicine, and Charitϩ –Universitätsmedizin Berlin, Berlin, Germany, Affiliations GWAS never directly link variants to regulatory mechanisms.
This prioritization has aligned with experimental work recently focusing on GSDMB in IBDs, finding an increase in the gene’s expression may have a developmental role for IBDs (Rana and Pizarro, 2019).
Indeed, we observed that the enrichment of islet CNN-regulatory variants was more marked within the top ranks of insulin secretion signals. bioRxiv[Preprint], Wang, Y., Goh, W., Wong, L., Montana, G., and Alzheimer’s Disease Neuroimaging Initiative, (2013). Combining multiple hypothesis testing with machine learning increases the statistical power of genome-wide association studies. While DeepWAS can be used to predict the phenotype from the genotype, it is also interesting to annotate the relationship of FUs to a disease or trait. Also known as the true negative rate or selectivity. “stabsel” function from the stabs R package was used to identify significant trait associations (dSNPs). However, going from pure association to mechanistic insight has been a much more challenging task.
Data analysis was performed using MatrixEQTL and significant cis-eQTLs were filtered at an FDR of 5%. Given there is an abundance of 'pure' cell types with this sort of data available (for example, in ENCODE), it would have utility to 1) run in that setting first to demonstrate the optimal power of this approach for a relevant disease with an abundance of GWAS hits, and then 2) understand what the 'cost' is if you subsequently run in a mixed cell setting like the one they delineate.
Sphingolipids are the main components of nervous tissue and have been previously linked to MS [33].
The KORA study was initiated and financed by the Helmholtz Zentrum München-German Research Center for Environmental Health, which is funded by the BMBF and by the State of Bavaria. USA.gov. (B): DeepWAS was applied to 36,409 regulatory SNPs that were retained after filtering for allele-specific effects in any given FU. There is mounting evidence that disease-associated variants are likely to perturb genes and regulatory modules that are of specific importance within disease-relevant cell types or tissues (Marbach et al., 2016; Battle et al., 2017). Writing – review & editing, * E-mail: nikola.mueller@helmholtz-muenchen.de. Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany,
The convolutional filters in the first CNN layer detect repeatedly occurring local sequence patterns, which increase the prediction accuracy. 01GI0916 and 01GI0917). Bioinformatics 32, 542–548. At this signal, both variants were scored as potentially regulatory by CNNs with q < 0.05, but the q-value at rs17712208 was much lower (q = 1.69e-160) highlighting this variant as the more likely regulatory candidate in the islets. They found 63 out of 82 experimentally tested variants had a significant splicing impact in multiple cell lines (Lin et al., 2019), suggesting further directions for functional study and validating the RegSNPs-Intron’s prioritization. The genomic sequences of the intervals were extracted from the hg19 human reference genome and encoded as one-hot code matrix, mapping the sequences into a 4-row binary matrix corresponding to the four DNA nucleotides at each position. GWAS never directly link variants to regulatory mechanisms. Fig 2. Mol. 03ZIK012). All rights reserved. Overall, we found that 28.8% of variants with gPPAs > = 0.8 had predicted regulatory effects with q < 0.05. We characterize the effects of interdomain flexibility on the promotion of DPO4–DNA (un)binding, which probably contributes to the ability of DPO4 to bypass DNA lesions, which is a known biological role of Y-family polymerases.