PhD In Statistics: Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis

   College of Science and Engineering

This project is no longer listed and may not be available.

Supervisors: Dr M Gupta, Dr Thomas Otto, Dr S Siebert
Application status: No more applications being accepted
Funding: Competition Funded PhD Project (UK Students Only)

Glasgow, United Kingdom
Bioinformatics, Data Analysis, Evolution, Genetics, Computer Science, Software Engineering, Statistics

About the Project

In recent years, many computational methods have been established to analyse different types of biological data, including DNA (genomics), RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics), the last of which captures more dynamic events. These methods have been refined with the advent of single-cell technology: it is now possible to capture the transcriptomic profile of single cells, and the spatial arrangement of cells from flow methods or from imaging methods such as functional magnetic resonance imaging. At the same time, these OMICS data can be complemented with clinical data, that is, measurements on patients such as age, smoking status, disease phenotype or drug treatment. How to combine data from different “modalities” (such as transcriptome data with clinical or imaging data) in a statistically valid way, so as to compare different datasets and make justifiable statistical inferences, is an interesting and important open statistical question.

In this PhD project, jointly supervised by Dr. Mayetri Gupta (Statistics), Dr. Thomas Otto and Prof. Stefan Siebert (Institute of Infection, Immunity & Inflammation), you will explore how to combine different datasets using Bayesian latent variable modelling, focusing on clinical datasets from Rheumatoid Arthritis. Single-cell data have been generated from synovial and blood samples of rheumatoid arthritis patients[1]. These will be combined with rich clinical datasets from cohorts such as SERA[2], RA-MAP[3] and others, all of which have transcriptomics data (bulk RNA-Seq from blood and some tissues). All these datasets are already curated and stored in a tranSMART database. Across the datasets, patients were treated with different drugs, and their response, or lack of it, was recorded over time.

Our overall aim is to build a Bayesian statistical framework and methodology that can combine these different data types in a latent space in a statistically justifiable way, with the goal of predicting clinical outcomes more accurately than can be achieved with a single dataset, or with fewer types of data, alone. The secondary aims are to develop robust and efficient Bayesian computational methodologies for fitting these models to ultra-high-dimensional, complex datasets and making valid inferences; to build user-friendly, publicly available computational software (in R) implementing these methods; and to compare them with other currently available computational tools on both simulated and real datasets.
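
As a purely illustrative sketch of what combining data types in a latent space could look like (an assumed, generic formulation, not a model the project commits to), each patient $i$ could be given a shared $K$-dimensional latent vector $z_i$ that generates the observation $x_i^{(m)}$ in every modality $m$ and also drives a binary clinical outcome $y_i$, such as response to treatment. All symbols here ($z_i$, $W^{(m)}$, $\sigma_m^2$, $\beta_0$, $\beta$, $K$, $p_m$, $M$) are introduced for illustration only.

```latex
\begin{align*}
z_i &\sim \mathcal{N}(0, I_K), \\
x_i^{(m)} \mid z_i &\sim \mathcal{N}\!\left(W^{(m)} z_i,\; \sigma_m^2 I_{p_m}\right), \quad m = 1, \dots, M, \\
y_i \mid z_i &\sim \mathrm{Bernoulli}\!\left(\operatorname{logit}^{-1}\!\left(\beta_0 + \beta^\top z_i\right)\right).
\end{align*}
```

Under these assumptions, priors on the loadings $W^{(m)}$, the noise variances $\sigma_m^2$ and the coefficients $(\beta_0, \beta)$ would complete the Bayesian specification, and the outcome for a new patient would be predicted through the posterior of $z_i$ given whichever modalities were observed for that patient.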

Some questions of interest are: (1) determining whether the different phenotypes in the clinical data (active RA, remission) can be differentiated from the single-cell data; (2) exploring whether the different modalities can be combined in the latent space when further datasets from the IMID-Bio-UK dataset, as well as imaging data, are included; (3) applying our methods to Rheumatoid Arthritis together with Psoriatic Arthritis, two immune-mediated inflammatory diseases with distinct pathways but also similarities: can our proposed methods (a) confirm existing findings and (b) highlight novel shared signatures between the two diseases?

Eligibility: First/Upper 2nd Class undergraduate Honours degree in Statistics, Computer Science, or a related field.

The successful candidate should have strong training and background in theoretical, methodological and applied Statistics, expert skills in relevant statistical software or programming languages (such as R, Python/C/C++, or MATLAB), and a deep interest in developing knowledge of cross-disciplinary topics in genomics, sequencing technology and inflammatory disease during the PhD. The candidate will be expected to consolidate and master an extensive range of topics in modern Statistical theory and applications during their PhD, including advanced Bayesian modelling and computation, latent variable models, machine learning, and methods for Big Data. The candidate is also expected to have excellent interpersonal and communication skills (oral and written) and to be enthusiastic and comfortable interacting and communicating with researchers in other disciplines, especially biology and medicine.

How to Apply

Please refer to the following website for details on how to apply:

Although you will need to apply through the university portal, please also send your application (including cover letter and CV) to [Email Address Removed]

Funding Notes

Funding is available to cover domestic tuition fees for 4 years, as well as a stipend at the Research Council rate (estimated £15,609 for Session 2021-22).


1. Alivernini S, MacDonald L, Elmesmari A, Finlay S, Tolusso B, Gigante MR, Petricca L, Di Mario C, Bui L, Perniola S et al: Distinct synovial tissue macrophage subsets regulate inflammation and remission in rheumatoid arthritis. Nat Med 2020, 26(8):1295-1306.
2. Dale J, Paterson C, Tierney A, Ralston SH, Reid DM, Basu N, Harvie J, McKay ND, Saunders S, Wilson H et al: The Scottish Early Rheumatoid Arthritis (SERA) Study: an inception cohort and biobank. BMC Musculoskelet Disord 2016, 17(1):461.
3. Cope AP, Barnes MR, Belson A, Binks M, Brockbank S, Bonachela-Capdevila F, Carini C, Fisher BA, Goodyear CS, Emery P et al: The RA-MAP Consortium: a working model for academia-industry collaboration. Nat Rev Rheumatol 2018, 14(1):53-60.