Using machine learning to predict cell-type specific effects of genetic variants which influence genome regulation

   Department of Brain Sciences

   Applications accepted all year round  Funded PhD Project (UK Students Only)

About the Project

UK Dementia Research Institute (UK DRI) Centre at Imperial

PhD Studentship in Machine Learning for Genomics

Imperial College London, White City Campus, London, UK

Applications are invited for a fully-funded (at home/UK fee rate) 3.5-year PhD studentship in the research group of Dr Nathan Skene at the UK Dementia Research Institute, commencing during 2024 for the project:

'Using machine learning to predict cell-type specific effects of genetic variants which influence genome regulation'

The project is based at Imperial’s White City Campus, the Hub for convergent research. The candidate will be part of the UK DRI’s Neurogenomics Lab under the supervision of the Principal Investigator (PI), Dr Nathan Skene.

Funding will include a tax-free bursary (currently £21,000 per year and rising annually) and tuition fees at the home/UK rate (currently £7,030 per year).

Dementia is the greatest health challenge of our century. 

To date, there is no way to prevent it or even slow its progression, and there is an urgent need to fill the knowledge gap in our basic understanding of the diseases that cause it. The UK Dementia Research Institute (UK DRI) is the biggest UK initiative driving forward research to fill this gap. We are a globally leading multidisciplinary research institute investigating the spectrum of neurodegenerative disorders causing dementia, with research groups located at University College London, the University of Cambridge, Cardiff University, Edinburgh University, Imperial College London and King’s College London.

The laboratory

The Neurogenomics Lab, led by Dr Nathan Skene, focuses on identifying the cell types, time points and regulatory mechanisms acted on by genetic variants associated with neurodegenerative diseases. The lab develops statistical methods to integrate single-cell genomic data with genome-wide datasets on the genetic associations with brain disorders.


Alzheimer’s has a twin heritability of 79%, indicating that genetics plays a significant role in the disorder. A major challenge in biology is to understand how genetic risk factors drive disease: because of the large number of genes now understood to be involved, neurodegenerative diseases are now considered ‘complex traits’. While genetic studies have identified only 29 variants with a genome-wide significant association with the disorder, it is now recognised that almost all variants in the genome will affect disease risk to some degree (given sufficient power). To understand complex diseases, we therefore need good predictions of the functional effects of millions of genetic variants on hundreds of regulatory factors (e.g., transcription factors and histone modifiers). Machine learning techniques, such as long short-term memory recurrent neural networks and decision trees, have shown promising results in their ability to do this.

The Project:

Using machine learning to predict cell-type specific effects of genetic variants which influence genome regulation. This PhD project is focused on using machine learning techniques to develop novel classifiers for predicting how changes in DNA sequences alter genomic regulatory features. Many regulatory proteins recognise particular DNA sequences known as motifs; for instance, EcoRI only binds to GAATTC. DNA sequences can be converted into a machine-interpretable format using one-hot encoding. The candidate will use publicly available and in-house datasets of genomic regulatory features to train models. Machine learning techniques will be used to predict the cell-type specific regulatory effects of genetic variants. We will provide several true-positive datasets, in which the effect of genetic mutations on particular regulatory features has been measured. These will form validation datasets to evaluate how well the trained classifier works. We are interested in how improvements in the machine learning approach (e.g., use of transfer learning, recurrent attentional networks, or graph convolutional networks) can improve upon existing methods. The candidate will use these techniques to identify causal pathways and candidate drug targets for neurodegenerative diseases.
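
For applicants unfamiliar with the terms above, the following is a minimal illustrative sketch of one-hot encoding and of how a single-base variant can change a match to the GAATTC motif. It is not the lab's method: the function names (one_hot, motif_score, variant_effect) and the example sequence are hypothetical, and counting matching positions is a toy stand-in for the trained models the project would actually develop.

```python
# Illustrative sketch only; all names and the example sequence are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            # Ambiguous bases such as N become an all-zero row.
            rows.append([0, 0, 0, 0])
        else:
            rows.append([1 if base == b else 0 for b in BASES])
    return rows


def motif_score(encoded, motif_encoded):
    """Return the highest count of matching positions over all windows."""
    width = len(motif_encoded)
    best = 0
    for start in range(len(encoded) - width + 1):
        matches = sum(
            sum(x * m for x, m in zip(encoded[start + i], motif_encoded[i]))
            for i in range(width)
        )
        best = max(best, matches)
    return best


def variant_effect(ref_seq, pos, alt_base, motif="GAATTC"):
    """Change in best motif score when ref_seq[pos] is replaced by alt_base."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    motif_enc = one_hot(motif)
    ref_score = motif_score(one_hot(ref_seq), motif_enc)
    alt_score = motif_score(one_hot(alt_seq), motif_enc)
    return alt_score - ref_score


if __name__ == "__main__":
    ref = "TTGAATTCAA"  # contains GAATTC starting at index 2
    print(one_hot("ACGT"))
    print(variant_effect(ref, 4, "C"))  # A>C substitution inside the motif
```

A negative value from variant_effect means the substitution weakens the best motif match. In a real model, a learned predictor would take the place of motif_score, and the same reference-versus-alternate comparison would give a predicted variant effect.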

The candidate will be encouraged to participate in the Turing Institute’s enrichment scheme and to build collaborations through the DEMON (Deep Dementia Phenotyping) network.

Studentship details and application process

Applicants should hold (or have attained by January 2024) a First Class or an Upper Second Class degree (or equivalent overseas qualification) in a quantitative discipline, such as mathematics, statistics, computer science or engineering. Imperial would normally expect successful applicants to hold or achieve a Master's degree in a related field. Prior experience with programming is essential, but no experience with biology is necessary. Experience using machine learning methods will be beneficial. Applicants interested in this topic who wish to apply for the Doctoral Training Centre scholarships in AI4Health or MultiSci, or who have qualifications suitable for the President’s Scholarship, are encouraged to contact Dr Nathan Skene.

For informal enquiries please contact Dr Nathan Skene (). For application, please send a full CV (including confirmation of home fee status), and contact details for two academic referees to .

We regret that due to the large volume of applications received, we are only able to notify those shortlisted for interview. Applications will be considered from January 2024.


