SCALABLE GAUSSIAN PROCESS METHODS FOR SINGLE-CELL DATA

UoM administered thesis: PhD

  • Author: Sumon Ahmed

Abstract

The analysis of single-cell data creates the opportunity to examine the temporal dynamics of complex biological processes for which time course experiments are challenging or technically impossible to generate. One popular approach is to learn a lower-dimensional manifold or trajectory through the data that captures its major sources of variation. Gene expression patterns can then be aligned through different lineages in the trajectory as smooth functions of pseudotime, which promises to facilitate the identification of differentially expressed (DE) genes across trajectories. We briefly review some popular trajectory inference and downstream analysis methods along with their strengths and assumptions. We provide a brief overview of Gaussian process (GP) inference and describe how GPs can be used for dimensionality reduction and data association, which later facilitate probabilistic pseudotime estimation and downstream analysis to infer DE genes and branching times. We present a scalable implementation of the Gaussian process latent variable model (GPLVM) and develop a pseudotime estimation method that scales to large-volume droplet-based single-cell datasets and can be extended to higher-dimensional latent spaces to capture other sources of variation such as branching dynamics. The model's efficacy is evaluated on a number of datasets from different organisms collected using different protocols. The model converges significantly faster than existing methods whilst achieving comparable estimation accuracy. We reimplement an existing downstream analysis method for identifying branching dynamics from bulk time series data and apply it to single-cell data after pseudotime inference, extending the models to handle count data. We also present the limitations of a recent approach to inferring branching dynamics in single-cell data and extend the model to mitigate them. Our downstream analysis models are shown to successfully identify branching locations for individual genes when applied to simulated data and to single-cell data from mouse haematopoietic stem cells (HSCs).

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 3 Jan 2020