
  Machine learning for tracing pathogens in the food chain

   Bristol Veterinary School

This project is no longer listed and may not be available.

  Prof Kristen Reyher, Dr Sion Bayliss  Applications accepted all year round  Self-Funded PhD Students Only

About the Project

Salmonella is the most important food chain pathogen globally[1], causing infection and mortality in farmed animals and gastrointestinal infection in humans via consumption of contaminated foodstuffs. Local outbreak investigations are confounded by the complexity of food production and trade networks, which distribute foodborne pathogens around the globe. Global food safety and public health agencies have begun to routinely utilise whole genome sequencing (WGS) for pathogen surveillance. WGS provides a high-resolution measure of genetic relatedness between disease isolates, allowing the identification of clusters of infections arising from a common source. Traditional approaches for inferring geographical source from WGS require significant bioinformatic skills and scale poorly. We recently established that machine learning (ML) is an effective tool for the geographical source attribution of Salmonella Enteritidis[2]. The project will build upon this work by utilising WGS surveillance datasets from across the food chain to build ML models for rapid and accurate source attribution of a range of Salmonella species. Improved source attribution provides scientists with a better understanding of the transmission networks and virulence potential of persistent outbreaks. Furthermore, these models will enhance responses to farm- and hospital-associated Salmonella outbreaks, facilitating rapid and effective disease management across the breadth of the food chain.

Aims and objectives

a) Optimisation of ML approaches for bacterial genomics data

Identify effective ML methodologies for application to the known complexities[3] of bacterial genomics by comparing various state-of-the-art approaches.

b) Train source attribution models for Salmonella

Use datasets contributed by public health surveillance programmes to build ML models for prediction of the host and geographical source of Salmonella spp. using the findings from (a).

c) Generating actionable outputs for epidemiologists

Co-produce knowledge with field epidemiologists from the US Centers for Disease Control and Prevention (CDC) and the UK Health Security Agency (UKHSA) to translate the outcomes of (b) into human-interpretable outputs using explainable ML methodologies.
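
As an illustration of the hierarchical source attribution targeted in aim (b), the sketch below predicts a coarse label first and then a finer label within it. It is a minimal sketch under stated assumptions, not the method of the cited work[2]: the two-level region-then-country hierarchy, the function names, the unitig identifiers and the toy isolates are all hypothetical, and a simple nearest-neighbour rule on Jaccard similarity stands in for a trained classifier.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of unitig identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_label(query, training):
    """Return the label of the most similar training isolate.

    `training` is a list of (unitig_set, label) pairs. This 1-nearest-neighbour
    rule is only a placeholder for a trained classifier (assumption).
    """
    return max(training, key=lambda item: jaccard(query, item[0]))[1]


def predict_source(query, isolates):
    """Two-level prediction: region first, then country within that region.

    `isolates` is a list of (unitig_set, region, country) tuples.
    """
    # Level 1: choose the region using every training isolate.
    region = nearest_label(query, [(u, r) for u, r, _ in isolates])
    # Level 2: choose the country using only isolates from that region.
    within_region = [(u, c) for u, r, c in isolates if r == region]
    country = nearest_label(query, within_region)
    return region, country


# Hypothetical training isolates; unitig names are made up for illustration.
training = [
    ({"u1", "u2", "u3"}, "Europe", "Spain"),
    ({"u1", "u2", "u4"}, "Europe", "Poland"),
    ({"u5", "u6", "u7"}, "Africa", "Ghana"),
    ({"u5", "u6", "u8"}, "Africa", "Kenya"),
]
prediction = predict_source({"u1", "u2", "u4", "u9"}, training)
```

With a nearest-neighbour stand-in, the two levels agree with a flat prediction. The hierarchy becomes useful when each level has its own trained model, so that a confident coarse prediction can still be reported when the finer one is uncertain.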


The project will build on and refine recent work[2] to identify optimal ML frameworks for bacterial genomic data, optimise a hierarchical ML model to predict the source of outbreak isolates and generate human-readable outputs for epidemiologists. This will be achieved by processing raw WGS data into unitigs, a compact representation of bacterial genomes[4], followed by phylogenetically-aware resampling to address class imbalance. These data will then be used as inputs to compare the classification accuracy of a range of ML (random forest, elastic net, support vector machine) and deep learning (neural network) classifiers. The student will develop interdisciplinary skills in key areas that are currently underserved, including data science, machine learning and genomic bioinformatics.
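
One way to picture the phylogenetically-aware resampling step is sketched below. The listing does not specify the procedure, so this is an assumption: each isolate is taken to carry a precomputed clade (phylogenetic cluster) label, and each class is downsampled by drawing round-robin across its clades so that a single large clonal cluster cannot dominate the training set. The function name `resample_by_clade`, the `per_class` parameter and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def resample_by_clade(isolates, per_class, seed=0):
    """Downsample each class to at most `per_class` isolates,
    drawing round-robin across clades so no single clade dominates.

    `isolates` is a list of (isolate_id, label, clade) tuples, where
    `clade` is an assumed, precomputed phylogenetic cluster assignment.
    """
    rng = random.Random(seed)
    # Group isolate ids by class label, then by clade within each class.
    by_class = defaultdict(lambda: defaultdict(list))
    for isolate_id, label, clade in isolates:
        by_class[label][clade].append(isolate_id)

    selected = {}
    for label, clades in by_class.items():
        # Shuffle within each clade so the draw is random but reproducible.
        pools = []
        for clade in sorted(clades):
            members = sorted(clades[clade])
            rng.shuffle(members)
            pools.append(members)
        chosen = []
        # Take one isolate per clade in turn until the quota is met
        # or every clade is exhausted.
        while len(chosen) < per_class and any(pools):
            for pool in pools:
                if pool and len(chosen) < per_class:
                    chosen.append(pool.pop())
        selected[label] = chosen
    return selected


# Toy data: the "UK" class is dominated by one large clonal clade (c1).
isolates = (
    [(f"uk_{i}", "UK", "c1") for i in range(8)]
    + [("uk_8", "UK", "c2"), ("uk_9", "UK", "c3")]
    + [("fr_0", "France", "c4"), ("fr_1", "France", "c5"), ("fr_2", "France", "c5")]
)
balanced = resample_by_clade(isolates, per_class=3)
```

In this toy example both classes end up with three isolates, and the UK sample contains one isolate from each of its three clades rather than three near-identical isolates from the dominant clade.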

Building on longstanding collaborations, the successful candidate will intern with epidemiologists at the UKHSA and the CDC to better understand the needs of public health experts. This will facilitate the development of project outputs into functional tools that provide actionable information for epidemiologists and support public health decision making. Researchers with data science and ML expertise are exceedingly rare and will be essential for the future of public and animal health research, spearheading data-driven analytics for large-scale 'omics-based health surveillance.

Apply for this project

This project will be based in Bristol Veterinary School.

Please contact [Email Address Removed] for further details on how to apply.




References

1) WHO (2022) Factsheet: Non-typhoidal Salmonella.
2) SC Bayliss, RK Locke, C Jenkins, MA Chattaway, TJ Dallman, LA Cowley (2022) Hierarchical machine learning predicts geographical origin of Salmonella within four minutes of sequencing. medRxiv. doi: 10.1101/2022.08.23.22279111
3) NE Wheeler (2019) Tracing outbreaks with machine learning. Nature Reviews Microbiology. 17, 269
4) M Jaillard, L Lima, M Tournoud, P Mahé, A van Belkum, V Lacroix, L Jacob (2018) A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between k-mers and genetic events. PLOS Genetics. 14, e1007758
