Knowledge synthesis can be a slow and cumbersome process, but it is an essential tool for medical and public health policy-makers. Formal systematic reviews require rigid protocols and extensive effort from trained professionals, while a massive volume of research evidence emerges every year. Recent advances in artificial intelligence (AI) and natural language processing (NLP), such as named entity recognition and text summarization powered by pretrained language models (e.g. BERT) and large language models (LLMs, e.g. GPT-4), have substantially improved the efficiency and accuracy of information extraction at scale. These methods offer the opportunity to complement manual systematic reviews with automated knowledge synthesis reports that are updated in real time as new literature emerges. This could improve the efficiency and responsiveness of decision-making by systematic reviewers, public health professionals and clinicians.
This project aims to substantially improve the rapid review of literature by developing methods to automate knowledge synthesis from published research articles. The student will explore the use of language models and NLP methods to automate: 1) the identification and extraction of key research information, and 2) the assessment of research quality and risk of bias, from published and preprint articles.
Objective 1: To investigate the performance of various automated approaches to text summarization. The student will evaluate and compare several NLP methods, including rule-based approaches, small-scale language models and LLMs, for summarizing the overall research objectives, key findings and other content of articles available from literature databases and online sources (e.g. PubMed and preprint servers such as medRxiv). The student will also evaluate the functionality, performance, bias and potential environmental impact of each approach. This objective will familiarize the student with widely used current methods and the underlying biomedical literature data, and will inform how best to combine and balance multiple methods in the subsequent objectives. The student will be able to define their own evaluation criteria after consultation with stakeholders and experts.
Objective 2: To develop efficient and robust methods to extract and harmonize structured information from the literature text. The student will develop methods to identify and extract key information about the research and its findings from scientific text, such as the objects or subjects involved, research methods and study designs, and quantitative information such as effect sizes. In addition, the student will integrate rule-based approaches and LLMs to coherently harmonize the extracted information across heterogeneous study types (such as randomized trials, non-randomized intervention studies and observational studies), enabling comprehensive triangulation of evidence. The student will have the opportunity to decide on the details of information extraction and method integration.
Objective 3: To develop a novel framework to automate the assessment of risk of bias from research articles. Risk of bias refers to the risk that the findings reported by a study are inaccurate due to limitations of the study, such as selection bias, measurement error or missing information. The student will build on the methods developed in Objective 2 to robustly assess a study's risk of bias, automate this assessment for published and preprint articles, and contrast and validate the results against established tools and frameworks (e.g. RobotReviewer and ROBINS-E). The student will steer the choice of methods, which could involve developing new components or repurposing existing ones.
The student will be based primarily in Programme 3, Data Mining Epidemiological Relationships (DMER), of the MRC Integrative Epidemiology Unit (IEU) at Bristol Medical School, University of Bristol, and will also have cross-institutional support from the CardiffNLP lab at Cardiff University and from AMPLIFY. The MRC IEU is a leading centre for research into methods for causal inference and evidence triangulation, and for the application of causal methods to investigate disease in populations. Research in DMER focuses on addressing epidemiological research questions through computational approaches in health data science, including systematic causal analysis, machine learning and natural language processing. As part of the CardiffNLP lab, the student will have access to a vibrant environment that excels in NLP and organizes weekly seminars and other activities such as hackathons and a summer NLP workshop. In addition to research and training support from Bristol Medical School and the School of Computer Science and Informatics at Cardiff, the student will have the opportunity of an industry placement at AMPLIFY, gaining training and experience in machine learning, natural language processing and AI.
Supervisors:
Dr Yi Liu, Bristol Medical School, University of Bristol
Dr Louise Millard, Bristol Medical School, University of Bristol
Dr Luis Espinosa-Anke, School of Computer Science and Informatics, Cardiff University
Professor Tom Gaunt, Bristol Medical School, University of Bristol
How to apply:
Please read the full application guidelines before applying.
This project is part of the GW4 BioMed2 MRC DTP projects.
Please complete an application to the GW4 BioMed2 MRC DTP for an 'offer of funding'. If successful, you will also need to make an application for an 'offer to study' to your chosen institution.
Please complete the online application form linked from our website by 5.00pm on Wednesday, 1st November 2023. If you are shortlisted for interview, you will be notified from Tuesday 19th December 2023. Interviews will be held virtually on 24th and 25th January 2024. Studentships will start on 1st October 2024.
Enquiries:
For enquiries regarding the application procedures please contact [Email Address Removed].
For enquiries specifically on this project please contact Dr Yi Liu yi6240.liu<at>bristol.ac.uk