Post-GWAS Bioinformatics and Functional Analysis of Disease Susceptibility Loci

UoM administered thesis: PhD


Genome-wide association studies (GWAS) have been tremendously successful in identifying genetic variants associated with complex diseases, such as rheumatoid arthritis (RA). However, the majority of these associations lie outside traditional protein-coding regions, and the associated variants are not necessarily the causal ones. The post-GWAS challenges are therefore to identify causal variants, link them to target genes and explore the functional mechanisms involved in disease. The aim of the work presented here is to use high-level bioinformatics to help address these challenges.

Several large consortia are now generating an increasing amount of experimental data that characterise the non-coding regions of the human genome, and these data can be used to refine and prioritise genetic associations. However, although the data are publicly available, mining them manually and using them to full effect can be prohibitive. I developed an automated tool, ASSIMILATOR, which facilitates rapid mining and interpretation of these data, inferring the likely functional consequence of variants and informing further investigation. The tool was used in a large extended GWAS in RA to assess the functional impact of associated variants at the 22q12 locus, showing evidence that they could affect gene regulation.

Environmental factors, such as vitamin D, can also affect gene regulation and increase the risk of disease, but are generally not incorporated into GWAS. Vitamin D deficiency is common in RA, and vitamin D can regulate genes through vitamin D response elements (VDREs). I interrogated a large, publicly available VDRE ChIP-Seq dataset using a permutation testing approach to test for VDRE enrichment in RA loci. This study was the first comprehensive analysis of VDREs and RA-associated variants and showed that RA loci are enriched for VDREs, suggesting an involvement of vitamin D in RA.

Indeed, evidence suggests that disease-associated variants affect gene regulation through enhancer elements, which can act over large distances through physical interactions. A newly developed technique, Capture Hi-C, was used to identify regions of the genome that physically interact with associated variants for four autoimmune diseases. This study showed the complex physical interactions between genetic elements, which could be mediated by regions associated with disease. This work is pivotal in fully characterising genetic associations and determining their effect on disease. Further work has re-defined the 6q23 locus, a region associated with multiple diseases, resulting in a major re-evaluation of the likely causal gene in RA from TNFAIP3 to IL20RA, a druggable target, illustrating the huge potential of this research. Capture Hi-C has also been used to study the genetic associations unique to multiple sclerosis in the same region, showing chromatin interactions that support previously implicated genes and identify novel candidates. This could help improve our understanding and treatment of the disease.

Bioinformatics is fundamental to fully exploiting new and existing datasets and has made many positive impacts on our understanding of complex disease. It empowers researchers to explore disease aetiology fully and to further the discovery of new therapies.
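
The permutation testing approach mentioned above for VDRE enrichment can be illustrated with a minimal sketch. It is not the method used in the thesis, whose null model is not described here. It assumes, purely for illustration, that loci and ChIP-Seq peaks are simple (start, end) intervals, that the null distribution is built by moving each locus to a uniformly random position on its own chromosome, and that the statistic is the number of loci overlapping at least one peak. All names and toy coordinates (permutation_enrichment, count_overlapping, chrom_sizes and the intervals) are hypothetical.

```python
import random


def count_overlapping(loci, peaks_by_chrom):
    """Count loci (chrom, start, end) that overlap at least one peak."""
    hits = 0
    for chrom, start, end in loci:
        peaks = peaks_by_chrom.get(chrom, [])
        # Half-open interval overlap test against every peak on the chromosome.
        if any(p_start < end and start < p_end for p_start, p_end in peaks):
            hits += 1
    return hits


def permutation_enrichment(loci, peaks_by_chrom, chrom_sizes, n_perm=1000, seed=0):
    """Return (observed overlap count, empirical one-sided p-value).

    Assumed null model: each locus keeps its length and chromosome but is
    placed at a uniformly random position.
    """
    rng = random.Random(seed)
    observed = count_overlapping(loci, peaks_by_chrom)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for chrom, start, end in loci:
            length = end - start
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            shuffled.append((chrom, new_start, new_start + length))
        if count_overlapping(shuffled, peaks_by_chrom) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical coordinates, not real RA loci or VDRE peaks).
chrom_sizes = {"chr1": 1_000_000}
peaks_by_chrom = {"chr1": [(100_000, 100_500), (400_000, 400_500), (700_000, 700_500)]}
loci = [
    ("chr1", 99_900, 100_100),
    ("chr1", 400_200, 400_700),
    ("chr1", 699_000, 700_100),
    ("chr1", 850_000, 851_000),
]

observed, p_value = permutation_enrichment(loci, peaks_by_chrom, chrom_sizes)
print(f"loci overlapping a peak: {observed}, empirical p-value: {p_value:.4f}")
```

A real analysis would likely need a more careful null, for example one matched on properties such as gene density or allele frequency, since uniformly random placement can overstate enrichment.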


Original language: English
Awarding Institution
Award date: 1 Aug 2017