Predicting zoonotic disease dynamics from digital archive records

   Cardiff School of Computer Science & Informatics

About the Project

Over 60% of human diseases have their origin traced to wildlife. These zoonotic diseases represent a significant threat to global human health, wildlife health, food security and economic growth, and understanding where, when and why they emerge is a crucial aspect of disease control. Our understanding of these dynamics and the mechanisms underlying their emergence, however, is focused on a handful of newly emerged zoonoses, mostly due to data paucity. 

Historic and recent records of biological specimens constitute an immense and, to a large extent, unexploited source of knowledge of species occurrences dating back more than two hundred years. The Global Biodiversity Information Facility (GBIF), for example, has nearly two billion digital records that include species names (including those of parasites), the date on which the specimen was recorded, descriptions of its location and, in some cases, habitat notes on the associated flora, fauna and environment. Recent developments in Natural Language Processing (NLP) have made it possible to develop automated methods to georeference the textual location descriptions and to detect references to disease or disease symptoms and to specific habitat descriptors.

The studentship will use a computer science approach to horizon scanning for zoonotic disease by applying NLP models and AI methods to written historic and current records such as those maintained by the Natural History Museum, London. We will assess the feasibility of georeferencing disease in species and examine the environmental correlates associated with disease. Historical data will allow us to assess whether emerging zoonoses are indeed novel or well established, whether disease has been lost over time, or whether other dynamics occur.

Brief methods: For georeferencing simpler location descriptions, this studentship will aim to apply existing techniques for detecting place names and geocoding them (converting them to coordinates), which requires AI-based disambiguation methods. For location descriptions that employ complex phrasing, possibly with multiple spatial relationships (e.g. near, north of, X distance from, on the edge of), we will collaborate with the New Zealand Biowhere project, which is developing methods for interpreting such descriptions. To exploit habitat descriptions in understanding environmental drivers, we will analyse the evidence of individual descriptive texts and of co-existing species, and apply machine learning methods to train a classifier that detects particular types of habitat from mixed forms of textual description. To ground-truth the model, recent data will be used, including the digital UK Landcover Maps from the Centre for Ecology and Hydrology. Our approach is analogous to previous methods demonstrating that textual sources can be used to infer aspects of the natural environment. Such machine learning approaches have been shown to benefit from supplementing the textual sources with conventional environmental datasets (e.g. soil type, elevation), provided they relate to the relevant time period. We also expect to improve predictions based on the textual inputs by applying deep learning methods, such as BERT-based and related state-of-the-art transformer models.
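As a rough illustration of the georeferencing step, the sketch below handles the two simplest cases mentioned in the methods: a bare place name, and a single "X km <direction> of <place>" relationship. It is not the project's method. The tiny gazetteer, the regular expression and the function names are all assumptions made for the example, and a real pipeline would rely on a full gazetteer, learned place-name detection and disambiguation.

```python
import math
import re

# Hypothetical mini-gazetteer (assumption for this sketch):
# lower-case place name -> (latitude, longitude) in decimal degrees.
GAZETTEER = {
    "cardiff": (51.4816, -3.1791),
    "london": (51.5074, -0.1278),
}

# Compass bearings in degrees for the four directions this sketch understands.
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}
EARTH_RADIUS_KM = 6371.0

# Matches phrases such as "10 km north of Cardiff".
OFFSET_PATTERN = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<dir>north|east|south|west)\s+of\s+(?P<place>[A-Za-z ]+)",
    re.IGNORECASE,
)


def offset_point(lat, lon, bearing_deg, distance_km):
    """Destination point on a sphere from a start point, bearing and distance."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lam2)


def lookup(text):
    """Return coordinates of the first gazetteer name that starts the text."""
    cleaned = text.strip().lower()
    for name, coords in GAZETTEER.items():
        if cleaned.startswith(name):
            return coords
    return None


def georeference(description):
    """Return (lat, lon) for a location description, or None if unresolved."""
    match = OFFSET_PATTERN.search(description)
    if match:
        anchor = lookup(match.group("place"))
        if anchor is None:
            return None
        bearing = BEARINGS[match.group("dir").lower()]
        return offset_point(anchor[0], anchor[1], bearing, float(match.group("dist")))
    # Fallback: plain place-name lookup anywhere in the description.
    lowered = description.lower()
    for name, coords in GAZETTEER.items():
        if name in lowered:
            return coords
    return None


print(georeference("Collected 10 km north of Cardiff, in woodland"))
```

Descriptions with several chained relationships, vague terms such as "near" or "on the edge of", or ambiguous place names fall outside this sketch; those are the cases the methods assign to disambiguation models and to the collaboration with the Biowhere project.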

Summary aims: 

  • Identify the occurrence of disease in animals, and the habitat types in which it occurs, through written records such as the digital collections of museums.
  • Georeference disease occurrence and, if possible, its associated habitat.
  • Determine correlates between disease emergence and habitat type/change.  
  • Reconstruct the dynamics of present-day wildlife diseases. 

The interdisciplinary supervision team:

Prof. Christopher B. Jones is a Professor of Geographical Information Systems in the Cardiff School of Computer Science and Informatics. He specialises in geographical information retrieval and works on machine learning and natural language processing methods for georeferencing and environmental information extraction from text descriptions of location and habitat.  

Dr. Sarah Perkins is a Reader in the ecology of wildlife diseases and brings expertise on ecological drivers of infectious disease and spatial and temporal analysis of disease dynamics. Perkins will provide training in ecological analysis (statistics).

Dr. James Rudge is Associate Professor in Infectious Disease Epidemiology at LSHTM, and will provide inputs on mathematical modelling and other approaches in the study of multi-host disease dynamics (including macro-parasites, e.g. schistosomes) and their ecological drivers.

Laurence Livermore is innovations manager at The Natural History Museum (NHM) with specialisation in digitisation of specimen collection records and biodiversity informatics.

Studentship information 

The studentship will commence in October 2023 and will cover tuition fees (for both UK and international applicants) as well as providing a maintenance grant. In 2023 the maintenance grant for full-time students was a minimum stipend of £17,668 per annum. As well as tuition fees and a maintenance grant, all students receive access to OneZoo training and to additional courses offered by the University’s Doctoral Academy, of which they become members.

A very high standard of applications is typically received, so the successful applicant is likely to have a very good first degree (a First or Upper Second class BSc Honours or equivalent) and/or be distinguished by having relevant research experience.

We welcome applicants with strong Computer Science skills. Ecological knowledge is advantageous, although not essential. The studentship will be based across both the School of Computer Science and the School of Biosciences at Cardiff University, with research visits to the Natural History Museum, London.

How to apply:

You can apply online - consideration is automatic on applying for a PhD with an October 2023 start date.

Please use our online application service at:

and specify in the funding section that you wish to be considered for UKRI OneZoo funding.

Please specify that you are applying for this particular project and name the supervisor.

Information on the application process can be found here

If you are not shortlisted for this particular studentship, you could be considered for other studentships within the OneZoo program. Please see the full list here:

Funding Notes

This studentship is open to Home, EU or international students. The award offered will cover Home, EU or international fees and a maintenance stipend. International/EU candidates are welcomed.
Please note that only 6 studentships covering full fees are available for international/EU applicants.
Number of Studentships:
Approximately 20 across the OneZoo program, with one studentship per individual project.
Studentship Duration:
3.5 years full time or up to 7 years part time funding.


Application deadline
1 May 2023, with interviews (either in person or online) held at or around the end of May and decisions made by June 2023 for a 1 October 2023 start.
By 1 May 2023 you must also send the following to (the title of the email must include the name of the host institution to which you are applying and the surname of the principal supervisor, e.g. "Cardiff_Cable"):
Completed CDT application Form - available to download here
Completed Equal Opportunities Form - available to download here
2 page CV
Copy of passport photo page
