Genome-wide association studies (GWAS) identify genotype–phenotype associations by testing large numbers of genetic variants across the genomes of individuals with a given phenotype, and have revolutionized the field of complex trait genetics. These studies have predominantly focused on common variants, many of which have small effects on traits and are confounded by the correlation structure among variants. Rare variants can have larger effects and are less confounded by correlation structure, and so have the potential to identify causal genes more readily. However, current GWAS methodology is unsuited to detecting the effects of rare variants, and the generalizability, and sometimes the validity, of results is affected by both local and global ancestry. Taken together, these factors limit our understanding of how genetic changes affect trait differences.
A recent breakthrough provides us with the tools to readily identify rare causal variants while taking into account local ancestry. The “succinct tree sequence” data structure is a transformative new technology that has led to performance increases of multiple orders of magnitude in genome simulation, data processing and ancestry inference. This revolutionary approach encodes genetic data in terms of the underlying genealogical trees, simultaneously capturing biological signal of unprecedented richness and providing an extremely efficient computational substrate.
In this project, we will use inferred genealogical trees for large datasets such as UK Biobank to: (1) develop statistical methodology and software for detecting signals of negative selection from these trees; (2) use these methods to find variants that are under negative selection and have large effects on circulating metabolites in UK Biobank; and (3) apply the methods to publicly available data to build a high-resolution genome-wide atlas of negative selection.
Attributes of suitable applicants:
Ideal candidates for this PhD studentship will have a 1st or 2.1 BSc in statistics, mathematics, computer science or biology, but candidates with any STEM degree will be considered. The project will entail a significant computational component, and therefore some experience with a programming language such as Python or R is required. Strong communication and collaboration skills and a willingness to work within a team of scientists are essential.
Note:
This project is supported through the Oxford Interdisciplinary Bioscience Doctoral Training Partnership (DTP) studentship programme. The student recruited to this project will join a cohort of students enrolled in the DTP’s interdisciplinary training programme, and will participate in the training and networking opportunities available through the DTP. For further details, please visit www.biodtp.ox.ac.uk. The DTP and its associated partner organisations aim to create a community that is innovative, inclusive and collaborative, in which everyone feels valued, respected, and supported, and we encourage applications from a diverse range of qualified applicants.