We are offering an exciting data science project for potential PhD students interested in developing and applying artificial intelligence (AI) in the context of population health.
You will use natural language processing (NLP) methods to extract knowledge from the biomedical literature that identifies and/or quantifies the relationships between risk factors, intermediate traits and disease outcomes. You will then integrate this knowledge with relationships estimated using epidemiological approaches such as Mendelian randomization (MR), observational association studies (e.g. UK Biobank) and randomized controlled trials (RCTs), and with information from biological pathway databases.
Over 30 million publications are indexed in the PubMed database, representing a wealth of scientific knowledge on human health and disease. However, this largely unstructured data resource is difficult to exploit systematically, leading to potential duplication of effort and missed opportunities. Recent developments in NLP using deep neural networks are transforming our ability to extract knowledge from text. In previous work we implemented a platform for identifying mechanistic intermediates between risk factors and diseases: http://www.melodi.biocompute.org.uk
Our approach attempts to find semantic triples (knowledge about the relationships between two entities, for example a risk factor and disease), and connect these into mechanistic pathways. However, there is substantial scope to expand this work using new methods and data.
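To make the idea of connecting triples into pathways concrete, here is a minimal Python sketch. It is an illustration only, not the implementation used in the platform above: the triple layout, the predicate names, the function name find_intermediates and all entity labels are hypothetical placeholders, and the data are invented rather than drawn from the literature.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
# Placeholder labels only; these are not real biomedical findings.
TRIPLES = [
    ("risk_factor_A", "INCREASES", "trait_X"),
    ("trait_X", "CAUSES", "disease_B"),
    ("risk_factor_A", "AFFECTS", "trait_Y"),
    ("trait_Z", "CAUSES", "disease_B"),
]


def find_intermediates(triples, exposure, outcome):
    """Return sorted intermediates M with triples exposure -> M and M -> outcome."""
    # Index the objects reachable in one step from each subject.
    objects_of = defaultdict(set)
    for subj, _pred, obj in triples:
        objects_of[subj].add(obj)

    candidates = objects_of.get(exposure, set())
    return sorted(
        m for m in candidates if outcome in objects_of.get(m, set())
    )


print(find_intermediates(TRIPLES, "risk_factor_A", "disease_B"))
```

In this toy example only trait_X is linked to both the exposure and the outcome, so it is the single candidate intermediate. A realistic system would also need to normalise entity names, weigh predicates differently and score the supporting evidence.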
The objectives of this project include:
1. Develop and apply new approaches to identifying risk factors, diseases and the relationships between them from biomedical text;
2. Integrate newly derived knowledge with information from existing methods and other evidence to validate the approach;
3. Develop methods to triangulate evidence from the literature with information from pathway databases and causal analysis approaches.
You will use NLP tools such as NLTK and Google's BERT and will have access to high performance computing facilities at the University of Bristol, including a dedicated GPU cluster hosted by the MRC Integrative Epidemiology Unit.
This PhD project will be based in Prof Tom Gaunt’s group http://www.biocompute.org.uk in the MRC Integrative Epidemiology Unit http://www.bristol.ac.uk/integrative-epidemiology/
Your work will contribute to the EpiGraphDB project: http://www.epigraphdb.org
Applications and enquiries are welcome at any time.