Bioinformatics Approaches for the Post-GWAS Analysis of Disease Susceptibility Loci

UoM administered thesis: Master of Philosophy

Abstract

Introduction: Genome-wide association studies (GWAS) have been used extensively to identify common variants associated with disease and have enormous potential to identify key pathways responsible for disease pathogenesis. This will in turn lead to insight into common and disease-specific processes and could determine why disease pathogenesis and treatment response differ between patient subgroups. Since this field is still in its infancy and has no clear validated workflow, the aim of the project was to produce a post-GWAS workflow which can be applied to known associations to implicate novel pathways and genes. This involved assessing the available bioinformatic pathway tools, designing an automated workflow to select candidate genes in associated regions, and testing its effectiveness using rheumatoid arthritis (RA) as a model disease.
Methods: Using the Taverna workbench, a robust workflow has been developed to define the region represented by an associated SNP by utilising existing knowledge of the region, such as linkage disequilibrium (LD) and recombination hotspots. Using RA as a model, the workflow was used to identify the full extent of the associated regions and all the genes implicated in these regions. Pathway enrichment and protein-protein interaction analyses were performed to identify potential pathways or interactions associated with the pathogenesis of the disease.
Results: Of the 58 SNPs associated with RA, the workflow successfully defined associated regions for 55 SNPs and identified a total of 436 genes representing 54 associated loci. All regions identified by the workflow contained the most biologically plausible genes, with the exception of those for five SNPs. The pathway enrichment analyses identified many immunological pathways, including antigen processing and presentation, immune regulation and signalling. Protein-protein interaction analyses identified genes acting as hubs of interaction, implicating many additional genes.
Discussion: The Taverna workflow provides researchers with a simple, unbiased and robust tool to assign genes to SNP association signals. Although many of the genes identified by the workflow were those originally assigned by researchers, it also identified potentially interesting candidates, such as PTPN11, in an unbiased manner. Additionally, there is now evidence which implicates new loci, such as the IL6ST and ICOS genes. The pathway analyses highlighted multiple pathways which confirm the involvement of existing loci and explain many aspects of RA aetiology. Pathway and protein-protein interaction analyses emphasise the importance of many molecules that are central to the immune system and may well therefore be involved in disease. It is apparent that no one pathway database is the ideal source and that results must be combined to produce an accurate picture of the pathways involved in the disease. Additionally, while further refinement and validation are necessary, this approach has identified novel pathways and implicated additional genes which may contribute to RA susceptibility or provide therapeutic targets.
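
The region-definition step described in the Methods can be illustrated with a minimal Python sketch. This is not the Taverna workflow itself, and the abstract does not state the exact rules used. The sketch assumes that a region spans every proxy SNP in LD with the associated SNP above an r² threshold, and that it is then extended outwards to the nearest flanking recombination hotspot on each side. The function names, the threshold of 0.8, the gene labels and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str   # hypothetical label, not a real gene symbol
    start: int  # start coordinate (bp)
    end: int    # end coordinate (bp)


def define_region(snp_pos, proxies, hotspots, r2_min=0.8):
    """Return (start, end) of the region represented by an associated SNP.

    proxies  : list of (position, r2) pairs for SNPs near the associated SNP
    hotspots : list of recombination hotspot positions
    r2_min   : assumed LD threshold; the thesis abstract does not give one
    """
    # Keep only proxies in sufficiently strong LD with the associated SNP.
    in_ld = [pos for pos, r2 in proxies if r2 >= r2_min]
    left = min(in_ld + [snp_pos])
    right = max(in_ld + [snp_pos])

    # Extend each boundary to the nearest flanking hotspot, if one exists.
    left_hotspots = [h for h in hotspots if h <= left]
    right_hotspots = [h for h in hotspots if h >= right]
    start = max(left_hotspots) if left_hotspots else left
    end = min(right_hotspots) if right_hotspots else right
    return start, end


def genes_in_region(region, genes):
    """Return the names of genes that overlap the region at any point."""
    start, end = region
    return [g.name for g in genes if g.start <= end and g.end >= start]


# Toy data: one associated SNP, three nearby SNPs, three hotspots.
region = define_region(
    snp_pos=1000,
    proxies=[(900, 0.90), (1200, 0.85), (2000, 0.30)],
    hotspots=[500, 1500, 2500],
)
genes = [Gene("GENE_A", 100, 600), Gene("GENE_B", 1600, 1700), Gene("GENE_C", 1400, 1800)]
candidates = genes_in_region(region, genes)
```

With this toy input, the SNP at position 2000 is excluded because its r² falls below the threshold, so the region is bounded by the hotspots at 500 and 1500, and only the genes that overlap that interval are retained as candidates.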

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 31 Dec 2011