Biomedical knowledge discovery and triangulation using natural language processing


Bristol Medical School

Applications accepted all year round
Self-Funded PhD Students Only

About the Project

We are offering an exciting data science project for potential PhD students interested in developing and applying artificial intelligence (AI) in the context of population health.

You will use natural language processing (NLP) methods to extract knowledge from the biomedical literature that identifies and/or quantifies the relationships between risk factors, intermediate traits and disease outcomes. You will then integrate this knowledge with relationships estimated using epidemiological approaches such as Mendelian randomization (MR), observational association studies (e.g. UK Biobank) and randomized controlled trials (RCTs), and with information from biological pathway databases.

Over 30 million publications are indexed in the PubMed database, representing a wealth of scientific knowledge on human health and disease. However, this largely unstructured data resource is difficult to exploit systematically, leading to potential duplication of effort and missed opportunities. Recent developments in NLP are transforming our ability to extract knowledge from text using deep neural networks. In previous work we implemented a platform for identification of mechanistic intermediates between risk factors and diseases: http://www.melodi.biocompute.org.uk

Our approach attempts to find semantic triples (knowledge about the relationships between two entities, for example a risk factor and disease), and connect these into mechanistic pathways. However, there is substantial scope to expand this work using new methods and data.
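As a rough illustration of the idea, the sketch below stores semantic triples as (subject, predicate, object) tuples and links them into two-step pathways from a risk factor to a disease through a shared intermediate. It is a minimal sketch, not the method used by the platform above: the function name `find_intermediates`, the predicate names and all entity labels are hypothetical placeholders and do not represent real findings.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
# All labels are placeholders for illustration, not real biomedical findings.
TRIPLES = [
    ("risk_factor_A", "INCREASES", "intermediate_X"),
    ("risk_factor_A", "INCREASES", "intermediate_Y"),
    ("intermediate_X", "CAUSES", "disease_D"),
    ("intermediate_Z", "CAUSES", "disease_D"),
]


def find_intermediates(triples, source, target):
    """Return two-step paths of the form source -> intermediate -> target."""
    # Index triples by subject so each lookup is a dictionary access.
    outgoing = defaultdict(list)
    for subj, pred, obj in triples:
        outgoing[subj].append((pred, obj))

    paths = []
    for pred1, mid in outgoing.get(source, []):
        for pred2, obj in outgoing.get(mid, []):
            if obj == target:
                paths.append((source, pred1, mid, pred2, target))
    return paths


paths = find_intermediates(TRIPLES, "risk_factor_A", "disease_D")
for path in paths:
    print(" -> ".join(path))
```

Only `intermediate_X` is reported here, because it is the only entity that appears both as an object of the risk factor and as a subject linked to the disease. In practice the triples would come from text mining rather than a hand-written list.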

The objectives of this project include:

1. Develop and apply new approaches to identifying risk factors, diseases and the relationship between them from biomedical text;
2. Integrate newly derived knowledge with information from existing methods and other evidence to validate the approach;
3. Develop methods to triangulate evidence from the literature with information from pathway databases and causal analysis approaches.

You will use NLP toolkits such as NLTK and Google BERT and will have access to high performance computing facilities at the University of Bristol, including a dedicated GPU cluster hosted by the MRC Integrative Epidemiology Unit.

This PhD project will be based in Prof Tom Gaunt’s group (http://www.biocompute.org.uk) in the MRC Integrative Epidemiology Unit (http://www.bristol.ac.uk/integrative-epidemiology/).

Your work will contribute to the EpiGraphDB project: http://www.epigraphdb.org

Applications and enquiries are welcome at any time.

FindAPhD. Copyright 2005-2020
All rights reserved.