
EASTBIO Mining for the best side of bioscience data, with machine learning

   School of Biological Sciences

This project is no longer listed and may not be available.

Prof A J Millar, Dr I Simpson, Dr J Swedlow
No more applications being accepted
Competition Funded PhD Project (Students Worldwide)

About the Project

Biology is being transformed by large-scale data sets, but how do you find the right one? The data needs a ‘dating profile’: sounds easy but it turns out to be a bottleneck that is holding back research progress and research culture. This project tackles that metadata bottleneck. We aim to help researchers to show the best side of their data.

Data-intensive bioscience depends upon online repositories that share the “Big Data”. There’s little value in sharing data if you can’t tell which organism, sample, or conditions it came from, so the databases also need descriptions of the data, termed metadata. Top-tier repositories pay professional data curators to deal with their metadata, but many other repositories cannot do so. Even curators can’t invent metadata; the original researchers have to describe their research for the curators.

This project first aims to understand current data descriptions in research data repositories, using text mining and machine learning, in particular named entity recognition in free-text descriptions. Based on this evidence, you will research the simplest ways to improve the descriptions in future. The project will test real-time feedback that encourages researchers to provide better descriptions, for example using controlled vocabularies. You will work with software developers to test and evaluate simple feedback processes, in practice, for biological data repositories. By then, you will also be an expert data steward.
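As a rough illustration of the kind of real-time feedback described above, the sketch below checks a free-text data description against a small controlled vocabulary and reports which metadata categories are missing. It is not the project's method: the vocabulary `VOCAB`, the function names `find_terms` and `feedback`, and the categories are all hypothetical, and simple dictionary matching stands in for the machine-learning named entity recognition that the project proposes.

```python
import re

# Hypothetical controlled vocabulary (an assumption for illustration only):
# category -> {preferred term: [synonyms]}
VOCAB = {
    "organism": {
        "Arabidopsis thaliana": ["arabidopsis", "a. thaliana", "thale cress"],
        "Mus musculus": ["mouse", "mice", "m. musculus"],
    },
    "sample": {
        "leaf": ["leaf", "leaves"],
        "liver": ["liver", "hepatic tissue"],
    },
}


def find_terms(description):
    """Return {category: set of preferred terms} matched in the free text."""
    text = description.lower()
    found = {category: set() for category in VOCAB}
    for category, terms in VOCAB.items():
        for preferred, synonyms in terms.items():
            for candidate in [preferred.lower(), *synonyms]:
                # Require non-word characters on both sides, so that
                # "liver" does not match inside "delivery".
                pattern = r"(?<!\w)" + re.escape(candidate) + r"(?!\w)"
                if re.search(pattern, text):
                    found[category].add(preferred)
                    break
    return found


def feedback(description):
    """Return one feedback message per metadata category."""
    messages = []
    for category, terms in find_terms(description).items():
        if terms:
            messages.append(f"{category}: recognised {', '.join(sorted(terms))}")
        else:
            messages.append(
                f"{category}: not found; consider adding a term from the vocabulary"
            )
    return messages


if __name__ == "__main__":
    for line in feedback("Time series of leaves under short days"):
        print(line)
```

A repository submission form could run a check like this as the researcher types and prompt for the missing category before the data are deposited. A trained named entity recogniser would replace the dictionary lookup where descriptions use terms outside the vocabulary.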

Improving data descriptions will accelerate data-intensive bioscience across many research fields, as this bottleneck applies to many repositories and even electronic lab notebooks. Making the data easier to re-use will also reward the researchers who share their data, supporting the “Open Science” aspect of the new research culture.

The team: Andrew Millar (Edinburgh) and Jason Swedlow (Dundee) are biologists who also develop and run data repositories and help researchers to manage their data. We have access to internationally adopted repositories, to their metadata, and to their software developers, who can help to implement feedback processes.

Ian Simpson (Edinburgh Informatics) applies natural language processing (text mining) software tools to analyse bioscience literature, and has worked on several “Big Data” bioscience projects, including with Andrew Millar.

The project is based with the BioRDM team in the C.H. Waddington building, at the focal point of SynthSys, the interdisciplinary biology research centre at the University of Edinburgh, where many labs generate, analyse and model large-scale biological data. More information is available in the individual profile and on the BioRDM team’s public wiki.

The School of Biological Sciences is committed to Equality & Diversity.

How to Apply

The “Institution Website” button will take you to our online Application Checklist. From here you can formally apply online. This checklist also provides a link to the EASTBIO “How to Apply” web page. You must follow the Application Checklist and EASTBIO guidance carefully, in particular ensuring that you complete all the EASTBIO requirements and use/upload the relevant EASTBIO forms to your online application.

Funding Notes

This 4-year PhD project is part of a competition funded by the EASTBIO BBSRC Doctoral Training Partnership.
This opportunity is open to UK and International students and provides funding to cover a stipend at the UKRI standard rate (£17,668 annually in 2022) and UK-level tuition fees. The fee difference will be covered by the University of Edinburgh for successful international applicants; however, any visa or health insurance costs are not covered.
