Characterising viral populations with genome sequencing data

UoM administered thesis: PhD

  • Authors:
  • Bede Constantinides

Abstract

Viral populations exhibit extensive genomic diversity that may be profiled with contemporary nucleotide sequencing instruments, offering insight into virus transmission and the evolution of clinically relevant phenotypes within intrapatient infections. Emerging understanding of the role of the microbiome in human health and disease has pushed routine study of viruses beyond that of specific pathogen populations to include prospecting entire viral communities. In the current absence of high-fidelity long-read sequencing, analyses of such data must begin with reconstructing the short sequence fragments generated by sequencing instruments through either alignment to an appropriate reference sequence or de novo fragment assembly. As a consequence of their error-prone polymerases, RNA viruses in particular can present intrapopulation diversity to an extent that is problematic for the application of either of these approaches. In this thesis I present novel computational methodologies for robust de novo and reference-guided inference of viral consensus sequences, together with methods for characterising both interspecific and intraspecific genomic variation within viral populations. Through development and application of a wavelet-based sequence comparison approach to simulated viral populations, I demonstrate that seeding alignments using reduced dimensionality sequence representations provides accuracy broadly comparable with, and in some circumstances exceeding, that of the most accurate available aligners. Separately, I present a methodology that identifies sample-specific preprocessing and assembly parameters for optimal reconstruction of specific viruses, together with a companion tool that enables visual exploration of metagenome assemblies with taxonomic annotations retrieved using web services.
I also present Kindel, an efficient approach for inferring population consensus sequences from read alignments of arbitrary quality, which, unlike existing approaches, is able both to reconcile aligned insertions/deletions and to leverage unaligned sequence context to locally reassemble discordant alignment regions of up to 1.5x read length. I apply Kindel to the characterisation of highly diverse and difficult-to-assemble intrapatient hepatitis C virus populations, and propose a feature extraction and decomposition methodology for monitoring temporal shifts in the state of specific viral populations. Used in conjunction with the taxonomic annotation and filtering tool Tictax, these approaches represent efficient and composable tools for the reconstruction and exploration of viral genomic and metagenomic communities using a portable computer. In the final chapter I describe the application of these methodologies to profiling the nasopharyngeal viromes of a cohort of asthmatic children and healthy controls, identifying differences in viral composition indicative of microbial dysbiosis within the asthmatic virome.

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2018